Closed nnethercote closed 2 years ago
(no strong opinion on any of the points)
To try it, add this dependency
consider publishing 0.3.0-pre.1
, it’ll make the API even easier to try
Explicit qualification is now preferred to use statements
Love this design!
Integration tests must be used, with one test per file. This is because there is global state involved, and multiple tests in the same process would interfere with each other.
Curious, would it be possible to have some kind of internal lock for this in dhat::Alloc
, or would this slow everything else down too much?
Stats
might want to be #[non_exhaustive]
just in case.dhat::Profiler::builder()
instead of/in addition to dhat::ProfilerBuilder::new()
trim
is a bit ambiguous name, you need docs to learn what is trimmedExplicit qualification is now preferred to use statements. This avoids the extra use line, which helps avoid warnings (or errors) if you are switching between profiling and non-profiling by commenting/uncommenting the #[global_allocator] declaration and/or the let _profiler line. It's also helpful for the newly added assertions, see below. Dhat is removed from type names to help with this, e.g. DhatAlloc is renamed Alloc, because writing dhat::DhatAlloc is silly.
Hi, thank you for making this library.
Since you brought up switching between profiling/non-profiling modes:
I have been using this pattern for painlessly switching between profiling & non-profiling:
#[cfg_attr(feature = "heapprofile", global_allocator)]
#[cfg(feature = "heapprofile")]
static ALLOCATOR: dhat::DhatAlloc = dhat::DhatAlloc;
pub fn main() {
#[cfg(feature = "heapprofile")]
let _dhat = dhat::Dhat::start_heap_profiling();
...
Now cargo run --features heapprofile
will run a profile build, cargo run
will run a regular build.
I like the new naming and explicit qualification regardless, but what do you think about encoding this pattern either in the library or the docs?
consider publishing
0.3.0-pre.1
, it’ll make the API even easier to try
Good suggestion. Looks like Cargo will just "do the right thing" with that version string? In that case, I can merge my branch to master
and publish 0.3.0-pre.1
and remove my own hosted version of the docs because docs.rs
will get the latest docs?
Integration tests must be used, with one test per file. This is because there is global state involved, and multiple tests in the same process would interfere with each other.
Curious, would it be possible to have some kind of internal lock for this in
dhat::Alloc
, or would this slow everything else down too much?
There is already an internal lock, that's not the issue. Imagine you have two tests.
These two tests run in parallel. Currently, because you can't ever create more than one Profiler
, whichever test runs second will just panic with "dhat: profiling started a second time". If that panic was removed, when the second test runs the global allocation count will be at 300 bytes, and its assertion will fail. Because of the parallelism, which test runs second is unpredictable.
It might be possible to make unit tests work with the following changes.
serial_test
crate and mark each unit test with #[serial]
to enforce serialization.Profiler
is permitted and resets the global stats. The first Profiler
would have to have been dropped; I would still disallow having two Profiler
s alive at once. And maybe only allow this in testing mode.Having to rely on serial_test
feels clumsy, but I can see this would be useful, because unit tests are more flexible than integration tests: they can access private stuff, and also work with binaries as well as libraries. Or can you think of a better alternative?
Stats
might want to be#[non_exhaustive]
just in case.
Ok.
Rust doc shows lifetime parameter for both Profiler and ProfilerBuilder, which I think was intended to be some private detail for testing the crate itself?
Yes, there's a save_to_memory
feature specified here and here. It basically says "save the profile to a memory buffer instead of a file when the Profiler
is dropped". And the memory buffer comes from outside the Profiler
so that the data lives on after the drop, hence the lifetime.
I can see this is confusing to the reader. But it is also very convenient for a couple of the tests. I could just write to file and then have the tests read the file back in, and then delete the file, but that's a bit clumsy.
Is there a way to hide that lifetime in rustdoc? Probably not... but I think I can see a way to restructure things so that the lifetime is avoided.
Consider
dhat::Profiler::builder()
instead of/in addition todhat::ProfilerBuilder::new()
Ok. Instead of seems reasonable, the former is shorter so the latter doesn't seem necessary.
trim
is a bit ambiguous name, you need docs to learn what is trimmed
True, I'll change it to trim_backtraces
.
Thank you for the excellent comments! Some good improvements will come from them.
I have been using this pattern for painlessly switching between profiling & non-profiling:
#[cfg_attr(feature = "heapprofile", global_allocator)] #[cfg(feature = "heapprofile")] static ALLOCATOR: dhat::DhatAlloc = dhat::DhatAlloc; pub fn main() { #[cfg(feature = "heapprofile")] let _dhat = dhat::Dhat::start_heap_profiling(); ...
Now
cargo run --features heapprofile
will run a profile build,cargo run
will run a regular build.
Nice idea! Thank you for the suggestion. I will add something about it to the docs.
Just to clarify: you have to add [features] heapprofile = []
to your Cargo.toml
too, right?
Also I think you can change this part:
#[cfg_attr(feature = "heapprofile", global_allocator)] #[cfg(feature = "heapprofile")] static ALLOCATOR: dhat::DhatAlloc = dhat::DhatAlloc;
to something slightly simpler:
#[cfg(feature = "heapprofile")] #[global_allocator] static ALLOCATOR: dhat::DhatAlloc = dhat::DhatAlloc;
I just tried that and it works.
consider publishing 0.3.0-pre.1, it’ll make the API even easier to try
I have done this:
I have also updated the instructions at the top of this PR accordingly.
Currently, because you can't ever create more than one Profiler, whichever test runs second will just panic with "dhat: profiling started a second time". If that panic was removed, when the second test runs the global allocation count will be at 300 bytes, and its assertion will fail.
Can we replace the will just panic
bit with "will block until the first profiler is dropped"? Ah.... even if we can, it requires that both tests create Profile. What we can't help with is the case where one case has a Profiler
, and the other has not. We can't have something like "block in alloc, while the other Profile is alive", because, from a different thread, we can't tell if the Profile is genuinely other.
Yeah, I don't see a solution here after all.
but I think I can see a way to restructure things so that the lifetime is avoided.
A nuclear option is to store Arc<Mutex<String>>
rather &mut String
.
There is already an internal lock, that's not the issue. Imagine you have two tests.
Test A allocates 100 bytes and asserts this amount. Test B allocates 200 bytes and asserts this amount.
Couldn't an alternative be to store stats per-thread? This obviously won't work for tests using global thread pools, but it could be enough to test algorithms running on a single thread or spawning their own dedicated threads.
Couldn't an alternative be to store stats per-thread? This obviously won't work for tests using global thread pools, but it could be enough to test algorithms running on a single thread or spawning their own dedicated threads.
Interesting idea, but I can't see how it would work.
So I think the notion of per-thread stats isn't workable.
Time-based splitting (i.e. serialization) of profiling tests seems the only way forward. serial_test
is one way. @matklad's "will block until the first profiler is dropped" suggestion is another. Neither are fool-proof; they both require the user to write their tests in a particular way to avoid unpredictable test failures. But "you can write multiple units tests with some care" is better than "you can't write unit tests at all", so I would like to achieve something here.
Progress update: I will be working through the suggested changes. The ones relating to non-exhaustiveness, naming, lifetimes, and feature flags are simple. The one about enabling units tests will take some more thought. There will be a 0.3.0-pre.2 release at some point incorporating these changes, but it's likely to be in early 2022 after the holiday period.
Time-based splitting (i.e. serialization) of profiling tests seems the only way forward. serial_test is one way. @matklad's "will block until the first profiler is dropped" suggestion is another. Neither are fool-proof
https://github.com/palfrey/serial_test/issues/42 is interesting. It suggests using an RwLock
in tests. Heap usage tests would get a write lock, and thus run in serial. Non heap usage tests would get a read lock, and so can run together but not at the same time as the heap usage tests.
I tried doing unit tests. Unfortunately, making unit test execution serial (e.g. with RwLock
or the serial_test
crate) isn't enough, because run_tests
(the unit test running code) is doing stuff in parallel with the unit tests. I guess it probably does its own stuff on one thread, and then spins off a thread for each unit test. And the stuff it's doing in parallel includes allocations, so run_test
's allocations get mixed in with each unit test's allocations, which means the counts are all wrong.
If you run cargo test -- --test-threads 1
then everything is single threaded and unit tests will work straightforwardly, without any need for explicit unit test serialization. That's the only way I can see that unit tests can be supported.
I guess if you really want unit tests, you can add a dhat-heap
feature to your crate, mark all your heap usage unit tests with cfg(dhat-heap)
, mark all other unit tests with cfg(not(dhat-heap))
, and then you can run the former with cargo test --features dhat-heap -- --test-threads 1
and the latter with cargo test
.
I just published 0.3.0-pre.2, which incorporates the suggestions from @matklad and @ankitson.
I have published 0.3.0! Thanks to everyone for the feedback.
dhat
0.3 will have several major changes compared to 0.2, including API changes. This issue will serve as a place for discussion on these changes, including suggestions for improvements.You can try the
0.3.0-pre.1
release by adding this dependency to yourCargo.toml
:The rust docs can be seen here.
Heap profiling basics
With 0.2, a simple heap profiling example looks like this:
With 0.3, it looks like this:
Explicit qualification is now preferred to
use
statements. This avoids the extrause
line, which helps avoid warnings (or errors) if you are switching between profiling and non-profiling by commenting/uncommenting the#[global_allocator]
declaration and/or thelet _profiler
line. It's also helpful for the newly added assertions, see below.Dhat
is removed from type names to help with this, e.g.DhatAlloc
is renamedAlloc
, because writingdhat::DhatAlloc
is silly.Dhat
is renamedProfiler
. This name better reflects what it is;Dhat
was meaningless. This also works better with the newProfilerBuilder
type, described below. And thenew_heap()
function name is a more typical name for a constructor.Ad hoc profiling basics
With 0.2, a simple ad hoc profiling example looks like this:
With 0.3, it looks like this:
Stats
In 0.2, the
Stats
struct was only intended for internal testing purposes and contained an optionalHeapStats
value, which was an awkward way of handling the heap/ad hoc split. Some of the field names made sense for heap profiling but not for ad hoc profiling. There was a top-levelget_stats()
function to get stats.In 0.3 there is a
HeapStats
struct and anAdHocStats
struct. There areHeapStats::get()
andAdHocStats::get()
functions for getting stats. The field names all make sense. The fields in these structs are all public, which is arguably bad, but changes are unlikely because that would also require changes to DHAT in Valgrind and the DHAT viewer, and the file format shared by all.ProfilerBuilder
There are now more configurations for a
Profiler
, soProfilerBuilder
exists to specify these. (You can useProfiler::new_{heap,ad_hoc}
for the most basic cases.)ProfilerBuilder
is a pretty standard use of the builder pattern. Some of the methods are argument-free and set a flag internally, e.g.ad_hoc()
andtesting()
. One might argue that they should have arguments, something like:kind(ProfilerKind::AdHoc)
instead ofad_hoc()
testing(true)
instead oftesting()
But the alternatives (
mode(ProfilerKind::Heap)
andtesting(false)
) would never occur in practice because they are the defaults. So it would be unnecessary generality that hurts conciseness.There are also a couple of hidden methods that are only useful for testing purposes.
Heap usage testing
0.2 could be used in one way: for profiling.
0.3 adds a new use: heap usage testing. You can write a test for your library crate that executes some code, and then write assertions about how much memory was allocated, such as exactly M allocations occurred, or fewer than N bytes were allocated. (Ad hoc usage testing is also possible, but it's less likely to be used.)
Integration tests must be used, with one test per file. This is because there is global state involved, and multiple tests in the same process would interfere with each other. Testing mode must be turned on via
ProfilerBuilder::testing()
. If that's done, the following occurs.Profiler
drop is disabled.dhat::assert!
,dhat::assert_eq!
, anddhat::assert_ne!
become available for use. (They panic immediately if used while not in testing mode.)An example, which should be put within
tests/
:Custom assertions are required because they behave differently to
std
assertions: on failure, they save data to file before panicking with an error message. The newly established preference for avoidinguse
statements fordhat
identifiers means that these will be explicitly qualified, and are unlikely to be confused with thestd
equivalents.Panics
0.3 panics in a number of situations that 0.2 did not.
Profiler
is created. (Previously it was ignored.)Profiler
isn't running. (Previously it was ignored.)dhat
assertion is called while aProfiler
isn't running, or if the runningProfiler
isn't in test mode. (These scenarios didn't exist in 0.2.)These are scenarios that only occur if the
dhat
user has done something wrong, and panicking is generally easier—either fordhat
itself, or for the user, or both—than trying to handle it gracefully (e.g. ignoring actions, or return anOption
).Backtrace trimming
0.2 would always get full length backtraces. These can be large, e.g. 100+ frames in big programs. Processing these causes a lot of overhead.
0.3 limits the length of backtraces. By default it's 10 "real" frames, but once inline frames get added it's often 20 or more. This is almost always more than enough to work out what's going on. The
ProfilerBuilder::trim()
method can be used to adjust the length limit, and "unlimited" is a possibility.It would be better if the length limit was precise, but the getting of the real frames happens when the raw backtrace is obtained, and the inline frame addition happens later when aggregated backtraces are "resolved" using debug info. If the backtraces were resolved upfront the lengths could be precise, but this would greatly increase the amount of resolving that happens, which is very expensive.
There's also an improved mechanism for trimming uninteresting frames at the top and bottom of backtraces. (This is separate to the length limit, and these trimmed frames don't count towards the limit.) This can also be turned off with
ProfilerBuilder::trim()
, though it's unlikely to be of interest except fordhat
testing/development purposes.Conclusion
I think this is a solid re-design of the APIs, and the heap usage testing is a genuinely useful capability that isn't available in any other way.
I would like to hear feedback about the APIs and the implementation. Reports from people who have tried it out on real-world code are particularly appreciated. Thanks!
Once a certain amount of time has passed (probably 2 weeks) I will make adjustments based on feedback. If there aren't any major issues I will do a 0.3.0 release shortly after that.