nnethercote / dhat-rs

Heap profiling and ad hoc profiling for Rust programs.
Apache License 2.0
713 stars 36 forks source link

RFC on changes for 0.3 #17

Closed nnethercote closed 2 years ago

nnethercote commented 2 years ago

dhat 0.3 will have several major changes compared to 0.2, including API changes. This issue will serve as a place for discussion on these changes, including suggestions for improvements.

You can try the 0.3.0-pre.1 release by adding this dependency to your Cargo.toml:

dhat = "0.3.0-pre"

The rust docs can be seen here.

Heap profiling basics

With 0.2, a simple heap profiling example looks like this:

use dhat::{Dhat, DhatAlloc};

#[global_allocator]
static ALLOC: DhatAlloc = DhatAlloc;

fn main() {
    let _dhat = Dhat::start_heap_profiling();
    println!("Hello, world!");
}

With 0.3, it looks like this:

#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn main() {
    let _profiler = dhat::Profiler::new_heap();
    println!("Hello, world!");
}

Explicit qualification is now preferred to use statements. This avoids the extra use line, which helps avoid warnings (or errors) if you are switching between profiling and non-profiling by commenting/uncommenting the #[global_allocator] declaration and/or the let _profiler line. It's also helpful for the newly added assertions, see below. Dhat is removed from type names to help with this, e.g. DhatAlloc is renamed Alloc, because writing dhat::DhatAlloc is silly.

Dhat is renamed Profiler. This name better reflects what it is; Dhat was meaningless. This also works better with the new ProfilerBuilder type, described below. And the new_heap() function name is a more typical name for a constructor.

Ad hoc profiling basics

With 0.2, a simple ad hoc profiling example looks like this:

use dhat::Dhat;

fn main() {
    let _dhat = Dhat::start_ad_hoc_profiling();
    dhat::ad_hoc_event(100);
}

With 0.3, it looks like this:

fn main() {
    let _profiler = dhat::Profiler::new_ad_hoc();
    dhat::ad_hoc_event(100);
}

Stats

In 0.2, the Stats struct was only intended for internal testing purposes and contained an optional HeapStats value, which was an awkward way of handling the heap/ad hoc split. Some of the field names made sense for heap profiling but not for ad hoc profiling. There was a top-level get_stats() function to get stats.

In 0.3 there is a HeapStats struct and an AdHocStats struct. There are HeapStats::get() and AdHocStats::get() functions for getting stats. The field names all make sense. The fields in these structs are all public, which is arguably bad, but changes are unlikely because that would also require changes to DHAT in Valgrind and the DHAT viewer, and the file format shared by all.

ProfilerBuilder

There are now more configurations for a Profiler, so ProfilerBuilder exists to specify these. (You can use Profiler::new_{heap,ad_hoc} for the most basic cases.)

ProfilerBuilder is a pretty standard use of the builder pattern. Some of the methods are argument-free and set a flag internally, e.g. ad_hoc() and testing(). One might argue that they should have arguments, something like:

But the alternatives (mode(ProfilerKind::Heap) and testing(false)) would never occur in practice because they are the defaults. So it would be unnecessary generality that hurts conciseness.

There are also a couple of hidden methods that are only useful for testing purposes.

Heap usage testing

0.2 could be used in one way: for profiling.

0.3 adds a new use: heap usage testing. You can write a test for your library crate that executes some code, and then write assertions about how much memory was allocated, such as exactly M allocations occurred, or fewer than N bytes were allocated. (Ad hoc usage testing is also possible, but it's less likely to be used.)

Integration tests must be used, with one test per file. This is because there is global state involved, and multiple tests in the same process would interfere with each other. Testing mode must be turned on via ProfilerBuilder::testing(). If that's done, the following occurs.

An example, which should be put within tests/:

#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

#[test]
fn test1() {
    let _profiler = dhat::ProfilerBuilder::new().testing().build();

    let _v1 = vec![1, 2, 3, 4];
    let _v2 = vec![5, 6, 7, 8];

    let stats = dhat::HeapStats::get();
    dhat::assert_eq!(stats.curr_blocks, 2);
    dhat::assert_eq!(stats.curr_bytes, 32);
}

Custom assertions are required because they behave differently to std assertions: on failure, they save data to file before panicking with an error message. The newly established preference for avoiding use statements for dhat identifiers means that these will be explicitly qualified, and are unlikely to be confused with the std equivalents.

Panics

0.3 panics in a number of situations that 0.2 did not.

These are scenarios that only occur if the dhat user has done something wrong, and panicking is generally easier—either for dhat itself, or for the user, or both—than trying to handle it gracefully (e.g. ignoring actions, or return an Option).

Backtrace trimming

0.2 would always get full length backtraces. These can be large, e.g. 100+ frames in big programs. Processing these causes a lot of overhead.

0.3 limits the length of backtraces. By default it's 10 "real" frames, but once inline frames get added it's often 20 or more. This is almost always more than enough to work out what's going on. The ProfilerBuilder::trim() method can be used to adjust the length limit, and "unlimited" is a possibility.

It would be better if the length limit was precise, but the getting of the real frames happens when the raw backtrace is obtained, and the inline frame addition happens later when aggregated backtraces are "resolved" using debug info. If the backtraces were resolved upfront the lengths could be precise, but this would greatly increase the amount of resolving that happens, which is very expensive.

There's also an improved mechanism for trimming uninteresting frames at the top and bottom of backtraces. (This is separate to the length limit, and these trimmed frames don't count towards the limit.) This can also be turned off with ProfilerBuilder::trim(), though it's unlikely to be of interest except for dhat testing/development purposes.

Conclusion

I think this is a solid re-design of the APIs, and the heap usage testing is a genuinely useful capability that isn't available in any other way.

I would like to hear feedback about the APIs and the implementation. Reports from people who have tried it out on real-world code are particularly appreciated. Thanks!

Once a certain amount of time has passed (probably 2 weeks) I will make adjustments based on feedback. If there aren't any major issues I will do a 0.3.0 release shortly after that.

matklad commented 2 years ago

(no strong opinion on any of the points)

To try it, add this dependency

consider publishing 0.3.0-pre.1, it’ll make the API even easier to try

Explicit qualification is now preferred to use statements

Love this design!

Integration tests must be used, with one test per file. This is because there is global state involved, and multiple tests in the same process would interfere with each other.

Curious, would it be possible to have some kind of internal lock for this in dhat::Alloc, or would this slow everything else down too much?

ankitson commented 2 years ago

Explicit qualification is now preferred to use statements. This avoids the extra use line, which helps avoid warnings (or errors) if you are switching between profiling and non-profiling by commenting/uncommenting the #[global_allocator] declaration and/or the let _profiler line. It's also helpful for the newly added assertions, see below. Dhat is removed from type names to help with this, e.g. DhatAlloc is renamed Alloc, because writing dhat::DhatAlloc is silly.

Hi, thank you for making this library.

Since you brought up switching between profiling/non-profiling modes:

I have been using this pattern for painlessly switching between profiling & non-profiling:

#[cfg_attr(feature = "heapprofile", global_allocator)]
#[cfg(feature = "heapprofile")]
static ALLOCATOR: dhat::DhatAlloc = dhat::DhatAlloc;

pub fn main() {
    #[cfg(feature = "heapprofile")]
    let _dhat = dhat::Dhat::start_heap_profiling();
    ...

Now cargo run --features heapprofile will run a profile build, cargo run will run a regular build.

I like the new naming and explicit qualification regardless, but what do you think about encoding this pattern either in the library or the docs?

nnethercote commented 2 years ago

consider publishing 0.3.0-pre.1, it’ll make the API even easier to try

Good suggestion. Looks like Cargo will just "do the right thing" with that version string? In that case, I can merge my branch to master and publish 0.3.0-pre.1 and remove my own hosted version of the docs because docs.rs will get the latest docs?

Integration tests must be used, with one test per file. This is because there is global state involved, and multiple tests in the same process would interfere with each other.

Curious, would it be possible to have some kind of internal lock for this in dhat::Alloc, or would this slow everything else down too much?

There is already an internal lock, that's not the issue. Imagine you have two tests.

These two tests run in parallel. Currently, because you can't ever create more than one Profiler, whichever test runs second will just panic with "dhat: profiling started a second time". If that panic was removed, when the second test runs the global allocation count will be at 300 bytes, and its assertion will fail. Because of the parallelism, which test runs second is unpredictable.

It might be possible to make unit tests work with the following changes.

Having to rely on serial_test feels clumsy, but I can see this would be useful, because unit tests are more flexible than integration tests: they can access private stuff, and also work with binaries as well as libraries. Or can you think of a better alternative?

Stats might want to be #[non_exhaustive] just in case.

Ok.

Rust doc shows lifetime parameter for both Profiler and ProfilerBuilder, which I think was intended to be some private detail for testing the crate itself?

Yes, there's a save_to_memory feature specified here and here. It basically says "save the profile to a memory buffer instead of a file when the Profiler is dropped". And the memory buffer comes from outside the Profiler so that the data lives on after the drop, hence the lifetime.

I can see this is confusing to the reader. But it is also very convenient for a couple of the tests. I could just write to file and then have the tests read the file back in, and then delete the file, but that's a bit clumsy.

Is there a way to hide that lifetime in rustdoc? Probably not... but I think I can see a way to restructure things so that the lifetime is avoided.

Consider dhat::Profiler::builder() instead of/in addition to dhat::ProfilerBuilder::new()

Ok. Instead of seems reasonable, the former is shorter so the latter doesn't seem necessary.

trim is a bit ambiguous name, you need docs to learn what is trimmed

True, I'll change it to trim_backtraces.

Thank you for the excellent comments! Some good improvements will come from them.

nnethercote commented 2 years ago

I have been using this pattern for painlessly switching between profiling & non-profiling:

#[cfg_attr(feature = "heapprofile", global_allocator)]
#[cfg(feature = "heapprofile")]
static ALLOCATOR: dhat::DhatAlloc = dhat::DhatAlloc;

pub fn main() {
    #[cfg(feature = "heapprofile")]
    let _dhat = dhat::Dhat::start_heap_profiling();
    ...

Now cargo run --features heapprofile will run a profile build, cargo run will run a regular build.

Nice idea! Thank you for the suggestion. I will add something about it to the docs.

Just to clarify: you have to add [features] heapprofile = [] to your Cargo.toml too, right?

Also I think you can change this part:


#[cfg_attr(feature = "heapprofile", global_allocator)]
#[cfg(feature = "heapprofile")]
static ALLOCATOR: dhat::DhatAlloc = dhat::DhatAlloc;

to something slightly simpler:


#[cfg(feature = "heapprofile")]
#[global_allocator]
static ALLOCATOR: dhat::DhatAlloc = dhat::DhatAlloc;

I just tried that and it works.

nnethercote commented 2 years ago

consider publishing 0.3.0-pre.1, it’ll make the API even easier to try

I have done this:

I have also updated the instructions at the top of this PR accordingly.

matklad commented 2 years ago

Currently, because you can't ever create more than one Profiler, whichever test runs second will just panic with "dhat: profiling started a second time". If that panic was removed, when the second test runs the global allocation count will be at 300 bytes, and its assertion will fail.

Can we replace the will just panic bit with "will block until the first profiler is dropped"? Ah.... even if we can, it requires that both tests create Profile. What we can't help with is the case where one case has a Profiler, and the other has not. We can't have something like "block in alloc, while the other Profile is alive", because, from a different thread, we can't tell if the Profile is genuinely other.

Yeah, I don't see a solution here after all.

matklad commented 2 years ago

but I think I can see a way to restructure things so that the lifetime is avoided.

A nuclear option is to store Arc<Mutex<String>> rather &mut String.

roblabla commented 2 years ago

There is already an internal lock, that's not the issue. Imagine you have two tests.

Test A allocates 100 bytes and asserts this amount.
Test B allocates 200 bytes and asserts this amount.

Couldn't an alternative be to store stats per-thread? This obviously won't work for tests using global thread pools, but it could be enough to test algorithms running on a single thread or spawning their own dedicated threads.

nnethercote commented 2 years ago

Couldn't an alternative be to store stats per-thread? This obviously won't work for tests using global thread pools, but it could be enough to test algorithms running on a single thread or spawning their own dedicated threads.

Interesting idea, but I can't see how it would work.

So I think the notion of per-thread stats isn't workable.

Time-based splitting (i.e. serialization) of profiling tests seems the only way forward. serial_test is one way. @matklad's "will block until the first profiler is dropped" suggestion is another. Neither are fool-proof; they both require the user to write their tests in a particular way to avoid unpredictable test failures. But "you can write multiple units tests with some care" is better than "you can't write unit tests at all", so I would like to achieve something here.

nnethercote commented 2 years ago

Progress update: I will be working through the suggested changes. The ones relating to non-exhaustiveness, naming, lifetimes, and feature flags are simple. The one about enabling units tests will take some more thought. There will be a 0.3.0-pre.2 release at some point incorporating these changes, but it's likely to be in early 2022 after the holiday period.

nnethercote commented 2 years ago

Time-based splitting (i.e. serialization) of profiling tests seems the only way forward. serial_test is one way. @matklad's "will block until the first profiler is dropped" suggestion is another. Neither are fool-proof

https://github.com/palfrey/serial_test/issues/42 is interesting. It suggests using an RwLock in tests. Heap usage tests would get a write lock, and thus run in serial. Non heap usage tests would get a read lock, and so can run together but not at the same time as the heap usage tests.

nnethercote commented 2 years ago

I tried doing unit tests. Unfortunately, making unit test execution serial (e.g. with RwLock or the serial_test crate) isn't enough, because run_tests (the unit test running code) is doing stuff in parallel with the unit tests. I guess it probably does its own stuff on one thread, and then spins off a thread for each unit test. And the stuff it's doing in parallel includes allocations, so run_test's allocations get mixed in with each unit test's allocations, which means the counts are all wrong.

If you run cargo test -- --test-threads 1 then everything is single threaded and unit tests will work straightforwardly, without any need for explicit unit test serialization. That's the only way I can see that unit tests can be supported.

I guess if you really want unit tests, you can add a dhat-heap feature to your crate, mark all your heap usage unit tests with cfg(dhat-heap), mark all other unit tests with cfg(not(dhat-heap)), and then you can run the former with cargo test --features dhat-heap -- --test-threads 1 and the latter with cargo test.

nnethercote commented 2 years ago

I just published 0.3.0-pre.2, which incorporates the suggestions from @matklad and @ankitson.

nnethercote commented 2 years ago

I have published 0.3.0! Thanks to everyone for the feedback.