rust-lang / miri

An interpreter for Rust's mid-level intermediate representation
Apache License 2.0
4.33k stars 329 forks source link

intptrcast (with many different seeds) does not explore all possible executions #866

Open RalfJung opened 5 years ago

RalfJung commented 5 years ago

The following assertion is currently guaranteed to pass in Miri, no matter the seed:

fn main() {
  let ptr1 = Box::into_raw(Box::new(0)) as usize;
  let ptr2 = Box::into_raw(Box::new(0)) as usize;
  assert!(ptr2 > ptr1);
}

This is because we basically have a "bump allocator" as our implementation of int-to-ptr casts. Ideally, when exploring many seeds, all possible allocation patterns should arise; but at least we could try to find a way to be a bit less predictable than we currently are, so that failing the assertion above becomes a possibility.

oli-obk commented 5 years ago

do you want to just enable reuse, so a long running program will slowly start (deterministically) shuffling its allocations?

Or do you really want randomization? I mean your code snippet will pass when running natively ;)

RalfJung commented 5 years ago

Reuse would be another dimension I did not even consider yet. ;)

I mean your code snippet will pass when running natively ;)

It might pass. Or you might use an allocator I could quickly hack together that makes this not pass. ;)

One way to make this particular test behave properly (as in, pass with some seeds and fail with others) is to start somewhere in the middle of the address space, and then for each new allocation randomly decide if we allocate at the "upper" or the "lower" end. But that will still not explore all possible executions, even if we assume addresses never get reused.

oli-obk commented 1 year ago

I guess we could use https://github.com/rust-osdev/linked-list-allocator on the [u8] that is our virtual memory so we'd at least get reuse without having to do much work. We can then figure out randomization later.

RalfJung commented 1 year ago

We could get reuse pretty easily by adjusting our integer address assignment logic, I think. Though this will incur fragmentation so it could be too expensive...

saethlin commented 1 year ago

What would fragment? The actual memory in the allocations is stored wherever the allocator linked into Miri decides to put it, right?

And in any case, I suspect a minority of our cycles are spent actually accessing the bytes in allocations, so even if we fragmented them I think it wouldn't be that severe an issue.

RalfJung commented 1 year ago

I was thinking of Miri's virtually memory. To find an address for a new allocation, what do we do? Iterate over the list of existing allocations to see if there is a gap somewhere to fit this in?

Of course we could also go all-in: pick a random address and see if that works or overlaps. Give up after 1000 attempts or so.

saethlin commented 1 year ago

Yeah, I think randomization is the way to go to. We don't need to pack allocations densely like a real allocator. In addition to that, we can keep a fixed-size queue of recent deallocations and consult that first to increase our chance of reusing a recently-deallocated address.

saethlin commented 1 year ago

I think there's a much easier way to do this: Carve up the whole address space ahead of time into N different bump allocators, and serve allocations by choosing randomly between them. This would produce address space exhaustion much faster on 32-bit targets, but it should be relatively easy to implement by reusing our existing code.

RalfJung commented 1 year ago

That would make things more random, so it would help, yes. Fundamentally there would still be correlations that are not guaranteed, though.