rust-lang / const-eval

home for proposals in and around compile-time function evaluation
Apache License 2.0
105 stars 17 forks source link

Heap allocations in constants #20

Open oli-obk opened 5 years ago

oli-obk commented 5 years ago

Current proposal/summary: https://github.com/rust-lang/const-eval/issues/20#issuecomment-468657992

Motivation

In order to totally outdo any other constant evaluators out there, it is desirable to allow things like using serde to deserialize e.g. json or toml files into constants. In order to not duplicate code between const eval and runtime, this will require types like Vec and String. Otherwise every type with a String field would either need to be generic and support &str and String in that field, or just outright have a mirror struct for const eval. Both ways seem too restrictive and not in the spirit of "const eval that just works".

Design

Allocating and Deallocating

Allow allocating and deallocating heap inside const eval. This means Vec, String, Box

Final values of constants and statics

If a constant's final value were of type String, and the string is not empty, it would be very problematic to use such a constant:

const FOO: String = String::from("foo");
let x = FOO;
drop(x);
// how do we ensure that we don't run `deallocate`
// on the pointer to the unnamed static containing the bye sequence "foo"?

While there are a few options that could be considered, all of them are very hard to reason about and easy to get wrong. I'm listing them for completeness:

We cannot ban types that contain heap allocations, because

struct Foo;

impl Drop for Foo {
    fn drop(&mut self) {
        println!("foo");
    }
}

const FOO: Foo = Foo;

is perfectly legal stable Rust today. While we could try to come up with a scheme that forbids types that can contain allocations inside, this is ~impossible~ very hard to do.

There's a dynamic way to check whether dropping the value is problematic:

run Drop::drop on a copy of the final value (in const eval), if it tries to deallocate anything during that run, emit an error

Now this seems very dynamic in a way that means changing the code inside a const impl Drop is a breaking change if it causes any deallocations where it did not before. This also means that it's a breaking change to add any allocations to code modifying or creating such values. So if SmallVec (a type not heap allocating for N elements, but allocating for anything beyond that) changes the N, that's a breaking change.

But the rule would give us the best of all worlds:

const A: String = String::new(); // Ok
const B: String = String::from("foo"); // Not OK
const C: &String = &String::from("foo"); // Ok
const D: &str = &String::from("foo"); // Ok

More alternatives? Ideas? Code snippets to talk about?

Current proposal/summary: https://github.com/rust-lang/const-eval/issues/20#issuecomment-468657992

davidhewitt commented 4 years ago

That, or as a first step we just disable memoization.

True! Because the memoization is done using the compiler's query system, the implementation path for that could be something like: 1) Query the const fn evaluation to get the memoized result. 2) If the memoized result is detected to contain a const heap pointer then evaluate the const fn body again to produce a new value (and new const heap allocation)

swarnimarun commented 4 years ago

Has there been any progress on the RFC front?

oli-obk commented 4 years ago

yes. I am working on https://github.com/rust-lang/const-eval/pull/43 that helps us organize which features depend on which other features. Once we have that we can discuss the roadmap for const features and when everyone is on the same page, start working on RFCing these features.

mahkoh commented 3 years ago

Additionally values that contain no pointers to heap allocations are allowed as the final value of a constant.

So mem::transmute::<*const (), usize>() is undefined behavior in const fn?

oli-obk commented 3 years ago

That's a different topic (different as in, already applies right now and is orthogonal to heap allocations), and it only applies to the final value in a constant, not to intermediate state during const eval, so const fn are allowed to do that. Basically const FOO: usize = &42 as *const i32 as usize; is not allowed, because that would mean that using that FOO in a promoted (let x: &usize = &(FOO / 3);) would fail to compile (because taking references to math on constants is automatically promoted), even if it is fine if performed at runtime.

mahkoh commented 3 years ago

One more question: Let's say I have a function fn new() -> T and T stores pointers to heap allocations as usize (maybe because it wants to store additional information in the lower bits.) Let's say that this function has perfectly defined behavior and so do all other operations on T.

If I decide to add const(heap) to the function definition, by what method will you detect that the behavior is now undefined and abort the compilation?

oli-obk commented 3 years ago

As per the current design, you'd have a

const fn new<A: AllocRef>(a: A) -> YourType

and this is perfectly fine, we allow you to do that, even if you store your value in a usize. What is problematic is if you store that in

const FOO: YourType = new(ConstAlloc);

because then, even if that field is private, we have no way of figuring out whether you are going to expose that field to the public in a const way. If you did, there would be a constant of type usize that could be used in promotion, and thus cause an error there. So validation of FOO will fail. Please use a *const () for this case.

mahkoh commented 3 years ago

If you did, there would be a constant of type usize that could be used in promotion, and thus cause an error there.

So once a constant is created, you know for each individual bit of that constant if it was computed using the address of a pointer. And if such a bit exists outside of a pointer, an error occurs?

Please use a *const () for this case.

That would be impossible because the pointer would contain an invalid address and validation would thus also fail.

oli-obk commented 3 years ago

That would be impossible because the pointer would contain an invalid address and validation would thus also fail.

A good point. See more below

So once a constant is created, you know for each individual bit of that constant if it was computed using the address of a pointer. And if such a bit exists outside of a pointer, an error occurs?

No, you can't do anything with a pointer beyond offset it (miri the tool can do it, but the const evaluator can't, and we don't even have anything at the RFC stage for doing that), and if you offset it out of bounds and put it in the final value of a constant, you get a validation error, too.

All of this is getting very off topic, if you want to discuss this further, please open a post in the internals forum and ping me there. It is entirely unrelated to heap allocations, as you can already do all of this with pointers on the stable compiler today.

mahkoh commented 3 years ago

No, you can't do anything with a pointer beyond offset it

Really? If that is so then I have no concerns. But it was my understanding that transmuting from pointer to int would become possible in the future.

All of this is getting very off topic

Whether, under this scheme, adding const(heap) to a function can cause undefined behavior seems like a perfectly on-topic question.

mahkoh commented 3 years ago

Actually, doesn't this scheme allow you to implement transmute yourself? E.g.

fn transmute(t: T) -> U {
    let a: *mut T = allocate();
    ptr::write(a, t);
    let u = ptr::read(a as *mut U);
    free(a);
    u
}

Pointer casting, ptr::write, and ptr::read are things that occur in regular allocating code. I'm not sure how you're going to prevent pointers from escaping as usize unless you track every bit.

oli-obk commented 3 years ago

There is no const(heap) effect anymore under the currently planned scheme. Everything is happening in the type system via an impl const AllocRef for ConstAlloc. I should have probably updated some things here, but the relevant discussion is in the const-eval zulip and the current plan is summarized in https://hackmd.io/h2O2vkj3RimrBTfm9hvZWA#AllocRef-allocGlobal-and-allocConstGlobal

oli-obk commented 3 years ago

I'm not sure how you're going to prevent pointers from escaping as usize unless you track every bit.

well... we do that. you can't actually modify or read bits of pointers. Pointers are completely abstract and tracked on a different level. There is no way to cheat that system, and we do need that system, because we somehow need to tell LLVM about pointers created during const eval. Again, this is nothing that is special to heap pointers. All that is changed by heap pointers is that you can have pointers to mutable and owned allocations. Right now you can only have pointers to either immutable allocations or borrowed allocations.

mahkoh commented 3 years ago

I'm sorry but I still don't get it. Where is the error in the following code:

const ADDR: usize = {
    let b: Box<i32, ConstGlobal> = Box::new_in(42, ConstGlobal);
    let v: Box<Box<i32, ConstGlobal>, ConstGlobal> = Box::new_in(b, ConstGlobal);
    let p: *const Box<i32, ConstGlobal> = &*v;
    let addr: usize = unsafe { *(p as *const usize) };
    addr
};

Again, this is nothing that is special to heap pointers.

You keep saying this but I've never said that there is anything special about heap pointers. The critical change required by this proposal is that pointer casting and pointer dereferencing must become const.

oli-obk commented 3 years ago

The critical change required by this proposal is that pointer casting and pointer dereferencing must become const.

They already are with a feature gate, you can explore this with any other pointer that you can create during const eval.

Where is the error in the following code:

The error is that you are violating the validation invariant for constants (which is done dynamically once the value of ADDR is computed), which must follow some specific rules in order to make it safe to use constants at all use sites where they can already be used.

const ADDR: usize = &1 as *const i32 as usize;
let x: &'static usize = &(ADDR / 3);

(full working example: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=79215db454a590be2cbc43220c10d4a4)

The above creates an invalid constant and cause an error. This is required to make promotion sound. There were many insights into promotion after the RFC was finished and implemented. So I recommend not reading the RFC, since it is very outdated, instead the document from this repo (https://github.com/rust-lang/const-eval/blob/master/promotion.md) has the right details

mahkoh commented 3 years ago

So the answer to

So once a constant is created, you know for each individual bit of that constant if it was computed using the address of a pointer. And if such a bit exists outside of a pointer, an error occurs?

No

was actually Yes. https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=ecd6a47307eaa36bea0ffa0f974b5883

oli-obk commented 3 years ago

Yea, sorry. My mental model of the internals of this probably do not reflect how it appears to work from the outside at all. In essense you can indeed say that we track for each bit whether it came from a pointer, but it is more strict than that.

mahkoh commented 3 years ago

Ok then I see no way to have UB using this method.

But some containers that use the address of a pointer as a number will not work. E.g. the Bytes crate uses this to distinguish between pointers to Rc<[u8]> and pointers to Vec<u8>.

fee1-dead commented 2 years ago

The heap effect proposed by @RalfJung seems like a good solution, but I don't think we should use the const(heap) syntax.

We could add a #[const_heap] attribute with the same semantics of the proposed const(heap) effect.

RalfJung commented 2 years ago

The thing is, we'd also need const(heap) trait bounds, const(heap) function pointers, and const(heap) dyn Trait. Once you have an effect you need to be able to annotate it anywhere that you abstract over code.

fee1-dead commented 2 years ago

We could make types with the const(heap) unnameable (like closures) first. We could enable a lot of use cases even if we cannot name them, I think.

(Also, I think this could be a lang initiative, which could accelerate the development)

Jules-Bertholet commented 2 years ago

There may be a simpler way. If I remember correctly, at some point all heap code in the standard library will be generic over the heap, although default that heap parameter to the currently used system heap. Maybe we can figure out a system with this generic parameter and const trait impls.

Might the Storage trait proposal be relevant here? You could imagine a Storage that acts like a Cow, storing heap allocations made at compile time in static memory but copying and reallocating when that allocation is mutably accessed at runtime. The Storage could be generic over the choice of runtime allocator, allowing custom allocators to be combined with const allocation.

szbergeron commented 2 years ago

Very layman's perspective here, but is there a reason the allocator can't just ignore any static segment that these allocations would exist in? Such that the free(...) impl would simply be a noop for references in that segment. Then drop could run to completion, and the normal "dealloc" code could run on any heap allocations within that type.

bjorn3 commented 2 years ago

AFAIK there is no existing allocator used in the real world which does that. In addition you may choose any custom allocator which doesn't need to support it.