Feature: intrinsics to access allocation info for self-test

Diggsey commented 3 years ago

Miri already tracks allocations and deallocations. It would be convenient if, from a test, I could make assertions about those allocations.

For example, I might want to assert that a particular block of code performs no allocations. Or that it performs one allocation of a specific size and alignment.

RalfJung commented 3 years ago

Given that Miri now supports concurrency, this is actually non-trivial -- I think we would have to remember which thread allocates which memory, and then also take some form of snapshot at the beginning of the relevant block of code to see what changed at the end of it. Or what kind of implementation strategy did you have in mind?

Also this seems like something you could test without Miri without too much hassle, by using a custom allocator?

Diggsey commented 3 years ago

@RalfJung this is exactly what I did: https://docs.rs/mockalloc/0.1.0/mockalloc/

The problem is that I have to disable these checks when the tests are run under miri because they can't be used in conjunction. It would be simpler if I could just directly access this information from within the test and then always use miri.

The simplest way would be to offer something similar to the record_allocs function from that crate, which just works on the current thread.
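For readers unfamiliar with the approach, here is a minimal sketch of a per-thread counting allocator in the spirit of record_allocs. It is not mockalloc's actual implementation or API: the names `CountingAlloc` and `count_allocs` are hypothetical, and it uses only `std::alloc`.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    // Const-initialized so that touching the counter never allocates.
    static ALLOCS: Cell<usize> = const { Cell::new(0) };
}

/// Hypothetical wrapper: forwards to the system allocator and counts
/// allocation requests made by the current thread.
pub struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // try_with: silently skip counting if the thread-local is unavailable.
        let _ = ALLOCS.try_with(|c| c.set(c.get() + 1));
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // The default `realloc` and `alloc_zeroed` call `alloc` above,
    // so growing a buffer is counted as a new allocation.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns its result together with the number of
/// allocations the current thread performed while it ran.
pub fn count_allocs<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let before = ALLOCS.with(|c| c.get());
    let result = f();
    let after = ALLOCS.with(|c| c.get());
    (result, after - before)
}
```

A test could then call `count_allocs(|| Vec::<u8>::with_capacity(0))` and assert that the returned count is zero. Because the counter is thread-local, allocations made by other threads do not affect the result.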

oli-obk commented 3 years ago

The reason you need to disable those tests is because https://github.com/rust-lang/miri/issues/1207 is not implemented yet, so mockalloc won't work with miri?

Diggsey commented 3 years ago

Yeah, but I figure if miri is tracking this information anyway, it's redundant to track it twice.

RalfJung commented 3 years ago

I don't think Miri is tracking the right information for you though. Sure, it has a list of all allocations globally, but it has no way to tell where they have been created, and no way to check if a piece of code added more of them. Also notice that "allocation" in Miri includes "stack variable", so just counting how many allocations there are will not be useful.

The bugs listed in mockalloc, however, should all be detected by Miri already.

Basically, there is an interesting jump here:

> It would be convenient if, from a test, I could make assertions about those allocations.

So this would be things like "total number of allocations", "total size", right? That would be easy, but I do not see how it would be useful. But given a use case I'd not be opposed to adding something like this.

> For example, I might want to assert that a particular block of code performs no allocations.

This is not an assertion about the allocations, it is an assertion about the code! Could you describe what exactly you'd like to query from Miri to enable such a check?

Diggsey commented 3 years ago

> This is not an assertion about the allocations, it is an assertion about the code!

It's an assertion about what allocations the code performs, but I think you get the idea 😛

> Also notice that "allocation" in Miri includes "stack variable"

Ah OK.

> Could you describe what exactly you'd like to query from Miri to enable such a check?

Some tests I'd like to write (I'll use Vec as an example since it's similar to my type):

- That Vec::with_capacity(0) does not perform any allocations.
- That calling Vec::push N times performs log(N) allocations.
- That calling Vec::<T>::with_capacity(N) for various combinations of T and N allocates with the expected size and alignment (this Vec type stores its length and capacity inside the allocation, so that layout calculation is non-trivial).
- That collect()ing from an iterator that supports size_hint() performs the expected number of allocations.
- That IntoIter frees the Vec's memory once the last item has been returned from next() even if the IntoIter iterator itself has not been dropped.

Some of these tests can be done using mockalloc (when not running under miri) but none of them can be done with just miri.
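As an illustration of the size-and-alignment checks in the list above, here is a hedged sketch that records the layout of each allocation made by the current thread. All names (`RecordingAlloc`, `AllocEvent`, `record_layouts`, `MAX_EVENTS`) are hypothetical, and it uses the standard Vec as a stand-in for the custom type. The log is a fixed-size array so that recording never allocates.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

/// Hypothetical cap on recorded events; the log is a fixed-size array
/// so that recording an allocation never allocates itself.
const MAX_EVENTS: usize = 32;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AllocEvent {
    pub size: usize,
    pub align: usize,
}

#[derive(Clone, Copy)]
struct Log {
    len: usize,
    events: [AllocEvent; MAX_EVENTS],
}

thread_local! {
    static LOG: Cell<Log> = const {
        Cell::new(Log { len: 0, events: [AllocEvent { size: 0, align: 0 }; MAX_EVENTS] })
    };
}

pub struct RecordingAlloc;

unsafe impl GlobalAlloc for RecordingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let _ = LOG.try_with(|cell| {
            let mut log = cell.get();
            if log.len < MAX_EVENTS {
                log.events[log.len] = AllocEvent { size: layout.size(), align: layout.align() };
            }
            log.len += 1; // keep counting past the cap so overflow is detectable
            cell.set(log);
        });
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: RecordingAlloc = RecordingAlloc;

/// Runs `f` and returns the layouts this thread requested while it ran.
pub fn record_layouts(f: impl FnOnce()) -> Vec<AllocEvent> {
    LOG.with(|cell| {
        let mut log = cell.get();
        log.len = 0;
        cell.set(log);
    });
    f();
    let log = LOG.with(|cell| cell.get());
    assert!(log.len <= MAX_EVENTS, "event log overflowed");
    log.events[..log.len].to_vec()
}
```

A test for `Vec::<u64>::with_capacity(10)` would then expect a single event whose alignment is `align_of::<u64>()` and whose size is at least `10 * size_of::<u64>()`. Outside Miri, wrapping the value in `std::hint::black_box` makes it less likely that the optimizer removes the allocation.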

oli-obk commented 3 years ago

This is amusing timing, but I think the following documentation-PR is also relevant here: https://github.com/rust-lang/rust/pull/79045

Anyway, tracking heap allocations is a bit harder than tracking regular allocations, since we actually have to implement some logic that runs through the list of allocations and checks their allocation-kind. This could be quite expensive if there are a lot of live allocations in general, but for testing that's ok.

I personally would like to see some miri specific introspection functions for querying various aspects of the interpreter state for testing.

RalfJung commented 3 years ago

> It's an assertion about what allocations the code performs, but I think you get the idea 😛

Well, I get the idea of what you want at the top-level, but I am not actually sure which APIs you think Miri should expose for this. It would really help if you could draft a concrete example for the kind of code you'd like to write, or (even better) a concrete API that Miri should expose, with a precise spec. Right now I'd have to guess what it is you need/want. Since you probably already thought about this, it seems silly that I should re-do that design work.^^

Asking about the current state of the allocator, and asking about the allocations performed by some piece of code, are related but clearly distinct operations, after all.

Diggsey commented 3 years ago

> This is amusing timing, but I think the following documentation-PR is also relevant here: rust-lang/rust#79045

All the more reason to run these kinds of tests under miri, where we can be sure these optimizations are not happening.

> Since you probably already thought about this, it seems silly that I should re-do that design work.^^

I linked you to the API for mockalloc: the record_allocs function is the kind of high level API I'd want to work with.

However, I would not expect that high level API to necessarily be directly exposed by miri. It might make more sense for miri to provide lower level building blocks such that people can build these higher level APIs on top. As for what those lower level APIs look like: I don't have a design ready to give you, and I would expect that design to be more heavily influenced by the way miri works internally, than my specific requirements.

I have a few ideas:

- The basic building block could be a "snapshot". A single intrinsic would be provided to generate such a snapshot. It would then be up to library code to take two snapshots and compare them to determine what a particular section of code does. Under this model it would be necessary to tag allocations with the ID of the thread that allocated them.
- The basic building block could be an "event stream". You might have start/stop recording intrinsics, or it could be always recording and the intrinsic would ask for all events within a time interval. Events would need to be tagged with a thread ID.
- The intrinsics could allow directly querying the miri data-structures (i.e. no snapshot capability). It would be up to the library code to avoid doing things (like allocating) that would disrupt its own data collection.
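To make the snapshot idea concrete, here is a rough model in plain Rust of how library code might compare two snapshots. Nothing here is a real Miri API: `Snapshot`, `AllocMeta`, `Kind`, the `u64` allocation IDs and the `u32` thread IDs are all assumptions made for illustration, and a real intrinsic would have to produce the snapshot data.

```rust
use std::collections::BTreeMap;

/// Hypothetical allocation kinds; stack variables must be filtered out,
/// since Miri treats them as allocations too.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kind {
    Heap,
    Stack,
}

/// Hypothetical per-allocation metadata that a snapshot would carry.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AllocMeta {
    pub size: usize,
    pub align: usize,
    pub kind: Kind,
    pub thread: u32, // ID of the thread that created the allocation
}

/// A snapshot maps a (hypothetical) allocation ID to its metadata.
pub type Snapshot = BTreeMap<u64, AllocMeta>;

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Diff {
    pub created: Vec<(u64, AllocMeta)>,
    pub freed: Vec<(u64, AllocMeta)>,
}

fn is_relevant(meta: &AllocMeta, thread: u32) -> bool {
    meta.kind == Kind::Heap && meta.thread == thread
}

/// Compares two snapshots, keeping only heap allocations made by `thread`.
pub fn diff(before: &Snapshot, after: &Snapshot, thread: u32) -> Diff {
    let created = after
        .iter()
        .filter(|(id, meta)| is_relevant(*meta, thread) && !before.contains_key(*id))
        .map(|(id, meta)| (*id, *meta))
        .collect();
    let freed = before
        .iter()
        .filter(|(id, meta)| is_relevant(*meta, thread) && !after.contains_key(*id))
        .map(|(id, meta)| (*id, *meta))
        .collect();
    Diff { created, freed }
}
```

One limitation of this model is that an allocation created and freed between the two snapshots appears in neither list, so an assertion like "this code performs no allocations" would need the event-stream variant instead.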

RalfJung commented 3 years ago

> I linked you to the API for mockalloc: the record_allocs function is the kind of high level API I'd want to work with.

Oh, I thought you wanted more than what mockalloc provides.

I think it'd make more sense (and be easier!) to fix https://github.com/rust-lang/miri/issues/1207 than to re-implement mockalloc.

rust-lang / miri

Feature: intrinsics to access allocation info for self-test #1625