Tracking Issue for strict_provenance

Gankra commented 2 years ago

Feature gate: #![feature(strict_provenance)]

This is a tracking issue for the strict_provenance feature. This is a standard library feature that governs the following APIs:

IMPORTANT: This is purely a set of library APIs to make your code more clear/reliable, so that we can better understand what Rust code is actually trying to do and what it actually needs help with. It is overwhelmingly framed as a memory model because we are doing a bit of Roleplay here. We are roleplaying that this is a real memory model and seeing what code doesn't conform to it already. Then we are seeing how trivial it is to make that code "conform".

This cannot and will not "break your code" because the lang and compiler teams are wholly uninvolved with this. Your code cannot be "run under strict provenance" because there isn't a compiler flag for "enabling" it. It would be nice, though, to have a lint to make it easier to quickly migrate code that wants to play along.

This is an unofficial experiment to see How Bad it would be if Rust had extremely strict pointer provenance rules that require you to always dynamically preserve provenance information. Which is to say if you ever want to treat something as a Real Pointer that can be Offset and Dereferenced, there must be an unbroken chain of custody from that pointer to the original allocation you are trying to access using only pointer->pointer operations. If at any point you turn a pointer into an integer, that integer cannot be turned back into a pointer. This includes usize as ptr, transmute, type punning with raw pointer reads/writes, whatever. Just assume the memory "knows" it contains a pointer and that writing to it as a non-pointer makes it forget (because this is quite literally true on CHERI and miri, which are immediate beneficiaries of doing this).

A secondary goal of this project is to try to disambiguate the many meanings of ptr as usize, in the hopes that it might make it plausible/tolerable to allow usize to be redefined to be an address-sized integer instead of a pointer-sized integer. This would allow for Rust to more natively support platforms where sizeof(size_t) < sizeof(intptr_t), and effectively redefine usize from intptr_t to size_t/ptrdiff_t/ptraddr_t (it would still generally conflate those concepts, absent a motivation to do otherwise). To the best of my knowledge this would not have a practical effect on any currently supported platforms, and just allow for more platforms to be supported (certainly true for our tier 1 platforms).

A tertiary goal of this project is to more clearly answer the question "hey what's the deal with Rust on architectures that are pretty harvard-y like AVR and WASM (platforms which treat function pointers and data pointers non-uniformly)". There is... weirdness in the language because it's difficult to talk about "some" function pointer generically/opaquely and that encourages you to turn them into data pointers and then maybe that does Wrong Things.

The mission statement of this experiment is: assume it will and must work, try to make code conform to it, smash face-first into really nasty problems that need special consideration, and try to actually figure out how to handle those situations. We want the evil shit you do with pointers to work but the current situation leads to incredibly broken results, so something has to give.

Public API

This design is roughly based on the article Rust's Unsafe Pointer Types Need An Overhaul, which is itself based on the APIs that CHERI exposes for dynamically maintaining provenance information even under Fun Bit Tricks.

The core piece that makes this at all plausible is pointer::with_addr(self, usize) -> Self which dynamically re-establishes the provenance chain of custody. Everything else introduced is sugar or alternatives to as casts that better express intent.

More APIs may be introduced as we explore the feature space.

// core::ptr
pub fn invalid<T>(addr: usize) -> *const T;
pub fn invalid_mut<T>(addr: usize) -> *mut T;

// core::pointer
pub fn addr(self) -> usize;
pub fn with_addr(self, addr: usize) -> Self;
pub fn map_addr(self, f: impl FnOnce(usize) -> usize) -> Self;
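
To make the "unbroken chain of custody" idea a bit more concrete, here is a minimal sketch of low-bit tagging that never turns an integer back into a pointer. It is written for stable Rust, so it uses `wrapping_add`/`wrapping_sub` on a byte pointer where the gated API would use `map_addr`, and `as usize` where it would use `addr()`. The names `TAG_MASK`, `tag`, `untag`, and `roundtrip` are made up for this illustration and are not part of the proposed API.

```rust
// Hypothetical helpers for illustration only.
// `as usize` stands in for `.addr()`; no integer is ever cast back to a pointer.
const TAG_MASK: usize = 0b11; // low bits that are free for a 4-byte-aligned u32

/// Stores `bits` in the low bits of `ptr`, roughly `ptr.map_addr(|a| a | bits)`.
fn tag(ptr: *mut u32, bits: usize) -> *mut u32 {
    assert!(bits <= TAG_MASK);
    assert_eq!((ptr as usize) & TAG_MASK, 0, "pointer must be aligned and untagged");
    // Byte-wise pointer arithmetic keeps the provenance of `ptr`.
    ptr.cast::<u8>().wrapping_add(bits).cast::<u32>()
}

/// Splits a tagged pointer back into the original pointer and its tag bits.
fn untag(ptr: *mut u32) -> (*mut u32, usize) {
    let bits = (ptr as usize) & TAG_MASK;
    (ptr.cast::<u8>().wrapping_sub(bits).cast::<u32>(), bits)
}

/// Tags a pointer to a local, untags it, and reads through the result.
fn roundtrip(value: u32, bits: usize) -> (u32, usize) {
    let mut slot = value;
    let tagged = tag(&mut slot, bits);
    let (clean, got) = untag(tagged);
    // SAFETY: `clean` has the same address as `slot` and was derived from
    // `&mut slot` using pointer-to-pointer operations only.
    (unsafe { *clean }, got)
}
```

Calling `roundtrip(7, 0b10)` tags a pointer, strips the tag again, and reads the original value back through the untagged pointer.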

Steps / History

[x] Implementation: #95241
[ ] Final comment period (FCP)
[ ] Stabilization PR

Unresolved Questions

How Bad Is This?
How Good Is This?
What's Problematic (And Should Work)?
- [ ] Hardcoded MMIO address stuff
  - We should define a platform-specific way to do this, possibly requiring that you only use volatile access
- [ ] Opaque Function Pointers - architectures like AVR and WASM treat function pointers specially; they're not normal pointers.
  - We should really define a #[repr(transparent)] OpaqueFnPtr(fn() -> ()) type in std, need a way to talk about e.g. dlopen.
- [ ] libc interop for bad APIs that pun integers and pointers
  - Use a union to make the pun explicit?
- [ ] passing shared pointers over IPC?
  - At worst you can rederive from your SHMEM?
- [ ] downcasting to subclasses?
  - Would be nice if you could create a reference without shrinking its provenance to allow for ergonomic references to a baseclass that can be (unsafely) cast to a reference to a subclass.
- [ ] memcpy operations conceptually say "all this memory is just u8's" which would trash provenance
  - it's pretty standard to carve out exceptions for memcpy, but it would be good to know if this can be done more rigorously with something like llvm's proposed byte type
- [ ] AtomicPtr - AtomicPtr has a very limited API, so lots of people use AtomicUsize to do the equivalent of wrapping_add
  - Morally this is fine, unclear if the right compiler intrinsics exist to express this without "dropping" provenance.
What's Problematic (And Might Be Impossible)?
- [ ] High-bit Tagging - rustc::ty does this because it makes common addressing modes Free Untagging Realestate
  - Technically this is "fine" but CHERI might get upset about it, needs investigation.
- [ ] Pointer Compression - V8 and JVM like compressing pointers, involving massive truncations.
  - Can a Sufficiently Smart Union handle this?
- [ ] Unrestricted XOR-list - XORing pointers to make an even more jacked up linked list
  - You must allocate all your nodes in a Vec/Arena to be able to reconstitute ptrs. At that point, use indices.
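
As a rough sketch of the "at that point, use indices" remark, an XOR-linked list can keep its nodes in a `Vec` and XOR arena indices instead of addresses, so no pointer is ever reconstructed from an integer. The types `XorList` and `Node` and the choice of index 0 as the "null" link are assumptions made for this example.

```rust
/// One node of the arena; `link` holds `prev_index ^ next_index`.
struct Node {
    value: i32,
    link: usize,
}

/// Arena-backed XOR list. Index 0 is reserved to mean "no neighbour",
/// so real nodes start at index 1.
struct XorList {
    nodes: Vec<Node>,
    head: usize,
    tail: usize,
}

impl XorList {
    fn new() -> Self {
        // Slot 0 is a dummy entry that is never visited.
        XorList { nodes: vec![Node { value: 0, link: 0 }], head: 0, tail: 0 }
    }

    fn push_back(&mut self, value: i32) {
        let idx = self.nodes.len();
        // New tail: prev = old tail, next = 0, so link = old_tail ^ 0.
        self.nodes.push(Node { value, link: self.tail });
        if self.tail == 0 {
            self.head = idx;
        } else {
            // The old tail had next = 0; XOR in `idx` to make next = idx.
            self.nodes[self.tail].link ^= idx;
        }
        self.tail = idx;
    }

    /// Walks the list starting from `start` (either end) and collects values.
    fn walk(&self, start: usize) -> Vec<i32> {
        let mut out = Vec::new();
        let (mut prev, mut cur) = (0, start);
        while cur != 0 {
            out.push(self.nodes[cur].value);
            // XOR with where we came from yields where to go next.
            let next = self.nodes[cur].link ^ prev;
            prev = cur;
            cur = next;
        }
        out
    }

    fn forward(&self) -> Vec<i32> {
        self.walk(self.head)
    }

    fn backward(&self) -> Vec<i32> {
        self.walk(self.tail)
    }
}
```

The same `walk` routine serves both directions, which is the property that makes XOR lists attractive in the first place.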
APIs We Want To Add/Change?
- A lot of uses of .addr() are for alignment checks, .is_aligned(), .is_aligned_to(usize)?
- An API to make ZST alloc forging explicit, exists_zst(usize)?
- .addr() should arguably work on a DST, if you use .addr() you are ostensibly saying "I know this doesn't roundtrip"
- Explicit conveniences for low-bit tagging? .with_tag(TAG)?
- expose_addr/from_exposed_addr are slightly unfortunate names since it's not the address that gets exposed, it's the provenance. What would be better names? Please discuss on Zulip.
- It is somewhat unfortunate that addr is the short and easy name for the operation that programmers likely expect less. (Many will expect expose_addr semantics.) Maybe it should have a different name. But which name?
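
For the alignment-check item above, a small sketch of what such a helper might look like. The free function `is_aligned_to` below is a hypothetical stand-in written for this example, not the eventual API. The point is that only the address is inspected and the integer is never turned back into a pointer.

```rust
/// Hypothetical stand-in for an `.is_aligned_to(align)` method.
/// `align` must be a power of two.
fn is_aligned_to<T>(ptr: *const T, align: usize) -> bool {
    assert!(align.is_power_of_two());
    // `as usize` stands in for `.addr()`: the address is read and then discarded.
    ((ptr as usize) & (align - 1)) == 0
}
```

Because the result is a plain `bool`, the "does this roundtrip?" question never comes up for this use of the address.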

jrtc27 commented 2 years ago

Can we please not be stupid like this? There are 26 participants in this issue, I assume at least most are still subscribed, and specific aspects of this tracking issue have already been split out into distinct issues/PRs to keep the length of this issue manageable. I don't want an email for every silly name people come up with (some of which, I will add, are totally backwards), but I do want an email for actual productive/technical comments.

That is: if you want to bikeshed silly names that will never be picked, take it to a different thread.

khionu commented 2 years ago

@jrtc27 While you are in the right for wanting the conversation to be redirected to a more suitable venue, it'd be better if you didn't bring intelligence into the matter. I'd encourage you to focus on constructive, and succinct phrasing; simply asking "can we take this to a more appropriate venue?" would communicate everything necessary.

mvtec-bergdolll commented 2 years ago

Obviously names are the easiest thing to bikeshed, but names are also really important. Because often if you can't come up with a good name for something, it means you don't properly understand what you are trying to do.

expose_addr/from_exposed_addr are slightly unfortunate names since it's not the address that gets exposed, it's the provenance. What would be better names?

I tried to give some ideas for a specifically asked question, that no one else had answered here before.

ChayimFriedman2 commented 2 years ago

expose_addr/from_exposed_addr are slightly unfortunate names since it's not the address that gets exposed, it's the provenance.

As a counterargument, addr()/expose_addr() is a nice symmetry. Maybe get_addr_and_expose_provenance() :P

If we could, I think it would be better to have two functions, fn expose_provenance(self) -> () and fn addr(self) -> usize. It makes it obvious that exposing a provenance is an operation with side-effects, and makes all casts pure. The problem is that as already does both of these operations (although even that won't solve the name question for from_exposed_addr() - from_addr_with_exposed_provenance() is too long).

Permik commented 2 years ago

Please forgive me for chipping in my 2¢ on the name bikeshedding.

There have been many mentions of what to name the action when you "magically" create a suitable provenance. I think we already have a perfectly suitable name for it, inference/infer. I think that infer_provenance could succinctly imply that something automagical/intelligent is taking place. The infer word in itself is not absolute, so as a concept it leaves a chance for the provenance inference process to fail on platforms where it's not supported.

Alas, there might be a counterargument that the infer verb is already overused, bc of type and lifetime inference, but IMO provenance fits right beside the already used terms.

expose_provenance()/infer_provenance(): they seem the most logical names, as they are pretty self-descriptive, reuse prior art, and have the most important search term right in the method name.

ChayimFriedman2 commented 2 years ago

@Permik But to "infer" something is to understand it based on implicit data/reasoning. Provenance is explicit. Each pointer has provenance, it's just invisible. It's somewhat orthogonal to how we get the length out of a slice pointer: we don't infer it, we just extract it, as it's already sitting there. In the same way, you can look at every pointer as a pair of (address, provenance).
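
The "(address, provenance) pair" view can be written down as a toy model. This is only a mental aid: real provenance is not observable at run time, and `AllocId` and `ModelPtr` are names invented for this sketch.

```rust
/// Toy stand-in for "which allocation this pointer may access".
#[derive(Clone, Copy, Debug, PartialEq)]
struct AllocId(u32);

/// A pointer modelled as an explicit (address, provenance) pair.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ModelPtr {
    addr: usize,
    prov: Option<AllocId>,
}

impl ModelPtr {
    /// Models `ptr.addr()`: extracts the address; the provenance half is left behind.
    fn addr(self) -> usize {
        self.addr
    }

    /// Models `ptr.with_addr(a)`: new address, provenance copied from `self`.
    fn with_addr(self, addr: usize) -> ModelPtr {
        ModelPtr { addr, ..self }
    }

    /// Models `ptr::invalid(a)`: an address with no provenance at all.
    fn invalid(addr: usize) -> ModelPtr {
        ModelPtr { addr, prov: None }
    }
}
```

In this model nothing is inferred: `with_addr` copies a field that was already there, much like reading the length out of a slice pointer.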

iago-lito commented 2 years ago

If it's exactly here but invisible, and we are just making it something tangible instead of something hidden, what about reify_provenance()?

(I also like how it "tells the story" of how things happened with provenance: provenance is something real but we've been missing it for years, right? From what I understand of @RalfJung's work and posts, the whole point is to actually reify provenance into something we manipulate and care about.)

AFAIK "reify" is not widely used in APIs today, so this word is a good candidate for carrying some kind of new meaning. Today I would understand it along the lines of "make/transform \ into \".

ChayimFriedman2 commented 2 years ago

But we don't make it more visible (or "real") than what it was, we just copy the provenance of a pointer to a new pointer. So with_provenance() would be good if the method was on usize and taking a pointer, but because it's the opposite with_addr() is an appropriate name.

khionu commented 2 years ago

Echoing the earlier request, let's try to keep bikeshedding to a venue more suitable for the faster flow of conversation, thank you.

RalfJung commented 2 years ago

All right, here's the Zulip topic for that.

lukas-code commented 2 years ago

How would strict provenance interact with paging/virtual addressing, where multiple virtual addresses can potentially point to the same object? Consider the following example:

#![feature(strict_provenance)]

fn get_physical_address<T>(ptr: *const T) -> usize {
    unimplemented!()
}

fn virtual_address_map(virtual_address: usize, physical_address: usize) {
    unimplemented!()
}

#[repr(align(4096))]
struct PageAligned {
    some_data: u32,
}

fn main() {
    let my_data = PageAligned { some_data: 42 };

    let data_ptr = &my_data as *const PageAligned;
    // data_ptr has a provenance of "my_data"

    let virtual_address = 0x8000_0000;
    let physical_address = get_physical_address(data_ptr);

    virtual_address_map(virtual_address, physical_address); // create an alias mapping
    // data_ptr has a provenance of "my_data" + 0x8000_0000..0x8000_0400 (?)

    let data_ptr2 = data_ptr.with_addr(virtual_address); // keep provenance and set new address

    let my_value = unsafe { (*data_ptr2).some_data }; // is this UB?
}

RalfJung commented 2 years ago

I think we first need to have a story for how such memory shenanigans interact with the Abstract Machine at all, before any specifics can be answered.^^

If this only happens for "external" memory (think: mmap and friends), I am not very concerned. But doing this for stack-allocated variables (which the Abstract Machine makes strong assumptions about) seems much more potentially problematic.

A1-Triard commented 2 years ago

How would strict provenance work with hardware-fixed addresses? Like in a bare-metal environment, where I know the logical-to-physical address mapping and want to store something at some specific address?

RalfJung commented 2 years ago

Good question; this was discussed somewhere but I forgot where. I made a new issue for it: https://github.com/rust-lang/rust/issues/98593.

coolreader18 commented 1 year ago

I see there's been some discussion of expose_addr for FFI, but I'm interested in whether people think functionality like expose_addr/from_exposed_addr but for ffi purposes would be useful. Like, ptr.expose_for_ffi() /* or something */ -> usize affects the provenance the same way it's affected when passing the pointer to an extern "C" function. And vice-versa, where ptr::from_ffi_addr(usize) (or something) constructs a pointer with the same provenance as calling an extern "C" fn() -> *mut () would return.

My specific case is that I'm looking to ensure wit-bindgen-generated code complies with strict provenance, and given that wasm abis treat pointers as literally just i32 offsets into memory, that requires retrofitting a separate Ptr type to everything; the easiest thing to do would be to just say "all this stuff just has ffi provenance" since provenance at the ffi boundary is loose anyway. That's sorta niche, but I'm sure there's other ffi/bindgen-related use cases that could benefit from a similar thing.

RalfJung commented 1 year ago

Provenance is not affected in the slightest when being passed to an extern "C" function.

Lokathor commented 1 year ago

does the compiler have to assume that an unknown function exposes the pointer?

RalfJung commented 1 year ago

Yeah of course. An unknown function could perform any non-UB sequence of operations, in particular calling expose_addr. This is not specific to expose though, it is a general principle. There is no specific interaction of unknown code with provenance/exposure.

I assume this confusion arises because people confuse the concept of explicit provenance in the op.sem with the concept of a compiler analysis that defines an "emergent" notion of whether a pointer has been captured. But those are categorically different concepts. Capture analysis is justified by referring to the op.sem, but does not change the op.sem so does not need to be considered by unsafe code authors. Provenance the way Rust has it exists "by fiat", it is part of the definition of the op.sem.

lasiotus commented 1 year ago

Hello!

How do I deal now with "Hardcoded MMIO address stuff"?

My use case: a custom OS implements Stdio by "hardcoding MMIO address stuff". So a process does stdin/stdout initialization and read/writes by accessing these hardcoded addresses. As stdin/stdout are const-initialized (see const fn stdin_raw() -> StdinRaw in https://doc.rust-lang.org/stable/src/std/io/stdio.rs.html#67), there are int2ptr casts in const context, so compiling rust stdlib fails:

error[E0080]: could not evaluate static initializer
   --> /data/frost-dev/rust/library/core/src/ptr/non_null.rs:385:18
    |
385 |         unsafe { &*self.as_ptr() }
    |                  ^^^^^^^^^^^^^^^ dereferencing pointer failed: 0x210000000000[noalloc] is a dangling pointer (it has no provenance)
    |
note: inside `NonNull::<ProcessData>::as_ref::<'_>`
   --> /data/frost-dev/rust/library/core/src/ptr/non_null.rs:385:18
    |
385 |         unsafe { &*self.as_ptr() }
    |                  ^^^^^^^^^^^^^^^

I tried stuff like ptr::invalid(), but it does not help (same error). Is there a way to mark an address as "I promise this is a good address"? Or to add "known hardcoded MMIO addresses" somewhere so that the compiler does not complain on known addresses?

Edit: I can do what I want in a non-const context:

    let addr = HARDCODED_ADDR as usize;
    let ptr: *const ProcessData = core::ptr::null::<ProcessData>().with_addr(addr);
    // `as_ref` on a raw pointer is an unsafe operation.
    unsafe { ptr.as_ref() }.unwrap()

but not during const-initialization, as with_addr() is non-const.

Lokathor commented 1 year ago

https://doc.rust-lang.org/std/ptr/fn.from_exposed_addr.html would do what you want except it's not const fn.

bjorn3 commented 1 year ago

What about not storing a pointer inside StdinRaw, but instead creating it on the fly in the Read implementation? Depending on the architecture that may even compile to more efficient code due to the address being known at compile time.
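
A minimal sketch of that suggestion, under stated assumptions: the address constant, the register layout of `ProcessData`, and the `regs` method are all hypothetical, and the plain `as` cast is used only because it works on stable Rust. Which int-to-pointer API is appropriate for MMIO is exactly the open question tracked elsewhere.

```rust
/// Hypothetical fixed address; a real platform would define its own.
const STDIN_MMIO_ADDR: usize = 0x2100_0000;

/// Hypothetical register layout, for illustration only.
#[repr(C)]
struct ProcessData {
    status: u32,
    data: u32,
}

/// Holds no pointer at all, so building it in a `static` involves no
/// int2ptr cast during const evaluation.
struct StdinRaw;

impl StdinRaw {
    const fn new() -> Self {
        StdinRaw
    }

    /// Builds the raw pointer at the point of use; nothing is dereferenced here.
    fn regs(&self) -> *const ProcessData {
        STDIN_MMIO_ADDR as *const ProcessData
    }
}

static STDIN: StdinRaw = StdinRaw::new();
```

The `Read` implementation would then call `regs()` on each access and go through raw-pointer reads, rather than caching a reference inside the const-initialized value.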

RalfJung commented 1 year ago

@lasiotus that sounds less like a strict provenance issue and more like a const eval question. It looks like you are trying to dereference memory that simply does not exist at compile-time, which cannot work. Or you are creating a reference to such dangling memory -- you should only be holding raw pointers. Please open a new issue, and add example code that demonstrates the problem.

I think you are running into basically the same problem as https://github.com/rust-lang/rust/issues/63197.

lasiotus commented 1 year ago

@lasiotus that sounds less like a strict provenance issue and more like a const eval question. It looks like you are trying to dereference memory that simply does not exist at compile-time, which cannot work. Or you are creating a reference to such dangling memory -- you should only be holding raw pointers. Please open a new issue, and add example code that demonstrates the problem.

I think you are running into basically the same problem as #63197.

Well, the compiler flags converting a raw pointer to a reference, not dereferencing the pointer or the reference.

lasiotus commented 1 year ago

What about not storing a pointer inside StdinRaw, but instead creating it on the fly in the Read implementation? Depending on the architecture that may even compile to more efficient code due to the address being known at compile time.

Yes, I'll have to either construct/drop stuff on each access, or have an Option<> and construct stuff on first access. Not sure why stdin/out/err are const-initialized in std.

RalfJung commented 1 year ago

* means the pointer is dereferenced, creating a place. &* means the pointer is dereferenced and then the place this creates is converted to a reference. No actual memory access happens, but the pointer dereference happens nevertheless. Every dereferenced pointer must point to allocated memory, and every pointer dereferenced at compile-time must currently point to compile-time allocated memory. https://github.com/rust-lang/rust/issues/63197 is about lifting the latter restriction for cases exactly like yours, where memory is runtime-dereferenceable but not compile-time dereferenceable. This is a non-trivial change though, and it has nothing to do with strict provenance, so please don't continue this discussion in this issue.

(Don't worry about having posted to the wrong issue, that happens with subtle issues like this one. Just please don't continue doing it. :)

safinaskar commented 1 year ago

@brooksdavis

In CHERI C we added

So you are making CHERI? Great. Do CHERI CPUs have the same performance characteristics (speed, power consumption) as their non-CHERI counterparts? Say, if we compare Morello CHERI Armv8.2-A ( https://www.arm.com/company/news/2022/01/morello-research-program-hits-major-milestone-with-hardware-now-available-for-testing ) with normal Armv8.2-A? Or are CHERI-enabled CPUs slightly less performant?

brooksdavis commented 1 year ago

@brooksdavis

In CHERI C we added

So you are making CHERI? Great. Do CHERI CPUs have the same performance characteristics (speed, power consumption) as their non-CHERI counterparts? Say, if we compare Morello CHERI Armv8.2-A ( https://www.arm.com/company/news/2022/01/morello-research-program-hits-major-milestone-with-hardware-now-available-for-testing ) with normal Armv8.2-A? Or are CHERI-enabled CPUs slightly less performant?

We'll be publishing a detailed benchmarking guide for the Morello prototype implementation in the near future. There are inherent overheads from things like making pointers 128-bit and overheads due to the rapid implementation of Morello.

This issue is not generally the place to discuss CHERI details, I'd suggest joining the CHERI Slack if you have questions. https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/cheri-slack.html

Noratrieb commented 1 year ago

I'd like to talk about the naming of these functions. Currently, we have addr and expose_addr, where expose_addr exposes and addr doesn't. With this naming, addr is the "default" and expose_addr is the "special case". I don't think this is good.

Ideally, everyone writing unsafe code has a good understanding of provenance and its implications and is always able to choose the right function. But I don't think this is a realistic goal. Instead, we should always attempt to make unsafe APIs as misuse-resistant as possible to ensure that people don't hold them wrong (them being unsafe of course means that we cannot fully ensure it and can only do so on a best-effort basis).

Therefore, I think it's good for unsafe to have safe defaults (or at least don't have unsafe defaults). expose_addr is a safer default than addr. If someone who is not intrinsically familiar with provenance is writing pointer crimes (as one does) then it's possible that they will use addr while intending to expose, introducing UB to the code (if they're writing pointer crimes without good unsafe knowledge then they are probably writing UB in other places as well, but that shouldn't stop us from trying to help them here).

The current addr/expose_addr naming therefore imposes an unsafe default, which I believe is a bad idea. It must be noted that it's at least a portable default (CHERI) which makes me inclined to believe that we should have no default here (arguably as is a default but that will hopefully stay expose).

I suggest that we rename addr to something else. Given the already written literature on strict provenance it's probably a bad idea to rename expose_addr to addr. I think the best solution is for the addr name to not exist. I'm not really sure on alternatives. Maybe something like weak_addr (which indicates extra unsafety)? This will make sure that people using these functions are at least roughly aware of what they're doing and I'm confident that we can give them an often applying quick answer in the documentation.

Firstyear commented 1 year ago

I think you make a good point here that expose_addr is a better default here so I think it would be good for it to become addr. The current function of addr could be untracked_addr instead as a name if that would help indicate it's losing its provenance information.

coolreader18 commented 1 year ago

The idea is that you should be using expose_addr (and from_exposed_addr) as little as possible, because it's not actually necessary. Unsafe rust isn't something you can just jump into without understanding how things work, and if you're looking to use expose_addr to smuggle pointers through usizes everywhere, you're probably not actually reading the documentation in the first place. expose_addr's name is longer and more specific because it's actually a more "unsafe" operation than addr (in terms of strict provenance, at least), and the unsoundness would be in casting back your usize to a pointer (specifically, calling from_exposed_addr without reading the documentation that says only call this with something returned by expose_addr). Yes, the standard library should seek to minimize possible footguns like the from_exposed_addr(ptr.addr()) you're describing, but I'd call an optimization barrier simply from inspecting a pointer's address a footgun too, in a different way.

Lokathor commented 1 year ago

Unsafe rust isn't something you can just jump into without understanding how things work, and if you're looking to use expose_addr to smuggle pointers through usizes everywhere, you're probably not actually reading the documentation in the first place.

Yes, you're right, which is exactly why the short name method being the one that requires you read more carefully is not a good design.

An accidental optimization barrier is no major footgun compared to accidental UB.

Noratrieb commented 1 year ago

If we want to make people read the docs, then we should, as I proposed, have no default at all. This will force more people to go to the docs, where they will learn more about the operation.

Noratrieb commented 1 year ago

I just found a good argument against it. The only way for addr to cause UB is in combination with an int2ptr cast. The int2ptr functions are lovingly named. invalid is obviously wrong for a user, so from_exposed_addr will be used. The name already mentions that "something is up" and going to the docs will reveal that you need expose_addr. This way, the safety of defaults for ptr2int doesn't really matter, because int2ptr will make sure you've understood it. I think this makes the current names fine.

niluxv commented 1 year ago

As I understand it, the whole goal of the strict provenance proposal is to move away from the current expose_addr + from_exposed_addr (which is what is happening when you use as casts) to the more intuitive and portable addr + with_addr. expose_addr + from_exposed_addr are mostly there for backwards compatibility, but new code should always use the strict provenance APIs. IIUC expose_addr + from_exposed_addr is also plagued by miscompiles due to incorrect LLVM optimisations (which are not easily fixed).

Also, currently the naming has a nice symmetry indicating that expose_addr + from_exposed_addr should be used together and addr + with_addr should be used together. If addr were to be given a longer name, to keep this symmetry with_addr would also need a more cumbersome name, which is I think an entirely wrong direction, as with_addr is the safer and more portable way to create pointers.

EDIT: what I called from_addr is actually named with_addr, thanks @adamreichold

adamreichold commented 1 year ago

Note that the counterpart to from_exposed_addr is with_addr or even map_addr. Meaning the difference is using from_exposed_addr to produce a pointer "out of thin air" versus using existing_pointer.with_addr(new_address) to inherit the provenance of existing_pointer.

Personally, I actually like the current naming scheme, especially since

The only way for addr to cause UB is in combination with an int2ptr cast.

which should be helpful as int2ptr casts could become hard errors eventually if the strict provenance model is adopted.

digama0 commented 1 year ago

int2ptr casts could become hard errors eventually if the strict provenance model is adopted.

Regarding this point specifically: this is never going to happen in the foreseeable future, as strict provenance is not actually a model under consideration for the rust execution semantics. It is rather an over-conservative "teaching model" which has side benefits for Miri modeling which make it an attractive model for programmers to use. See https://gankra.github.io/blah/tower-of-weakenings/ : strict provenance is (a candidate for) the "'Clean' memory model" while stacked borrows is (a candidate for) the "'Real' memory model" mentioned in that post.

digama0 commented 1 year ago

I just found a good argument against it. The only way for addr to cause UB is in combination with an int2ptr cast.

Technically, you need three things to cause UB: addr, from_exposed_addr, and then *p on the result. From a strict reading of "the function is safe if it is impossible to cause UB in the operation or using the results in further safe code", the only one of these three steps that needs to be unsafe is *p, but from_exposed_addr is definitely the "suspicious" link in this chain. I don't think we can make it unsafe though because it's just an alternative spelling for usize as *const T and this is already stably safe.
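
The three-step chain can be played through in a toy model. This is a deliberately simplified sketch, not the actual operational semantics: the `Machine`, `Alloc`, `AllocId`, and `ModelPtr` types are invented here, and `from_exposed_addr` is modelled as "pick an exposed allocation that contains the address", which is only an approximation of the real rule.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct AllocId(u32);

/// Model pointer: an address plus optional provenance.
#[derive(Clone, Copy, Debug)]
struct ModelPtr {
    addr: usize,
    prov: Option<AllocId>,
}

/// A live allocation covering `start..start + len`.
struct Alloc {
    id: AllocId,
    start: usize,
    len: usize,
}

impl Alloc {
    fn contains(&self, addr: usize) -> bool {
        addr >= self.start && addr - self.start < self.len
    }
}

struct Machine {
    allocs: Vec<Alloc>,
    exposed: HashSet<AllocId>,
}

impl Machine {
    /// Models `addr()`: returns the address and has no side effect.
    fn addr(&self, p: ModelPtr) -> usize {
        p.addr
    }

    /// Models `expose_addr()`: same address, but also records the provenance as exposed.
    fn expose_addr(&mut self, p: ModelPtr) -> usize {
        if let Some(id) = p.prov {
            self.exposed.insert(id);
        }
        p.addr
    }

    /// Models `from_exposed_addr()`: may only pick up provenance that was exposed earlier.
    fn from_exposed_addr(&self, addr: usize) -> ModelPtr {
        let prov = self
            .allocs
            .iter()
            .find(|a| a.contains(addr) && self.exposed.contains(&a.id))
            .map(|a| a.id);
        ModelPtr { addr, prov }
    }

    /// Models the final `*p`: allowed only with provenance for a live allocation at that address.
    fn can_deref(&self, p: ModelPtr) -> bool {
        match p.prov {
            Some(id) => self.allocs.iter().any(|a| a.id == id && a.contains(p.addr)),
            None => false,
        }
    }
}
```

In this model, feeding the result of `addr` into `from_exposed_addr` yields a pointer that `can_deref` rejects, while going through `expose_addr` first yields one it accepts. Neither of the first two steps "fails" on its own; the problem only shows up at the dereference.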

jrtc27 commented 1 year ago

int2ptr casts could become hard errors eventually if the strict provenance model is adopted.

Regarding this point specifically: this is never going to happen in the foreseeable future, as strict provenance is not actually a model under consideration for the rust execution semantics. It is rather an over-conservative "teaching model" which has side benefits for Miri modeling which make it an attractive model for programmers to use. See https://gankra.github.io/blah/tower-of-weakenings/ : strict provenance is (a candidate for) the "'Clean' memory model" while stacked borrows is (a candidate for) the "'Real' memory model" mentioned in that post.

It's a requirement to support CHERI. Whether or not you impose it on other architectures is up to you, but Rust+CHERI means strict provenance, no choice about it. (And no, I'm not going to entertain debates about magic int2ptr thin air provenance, that is explicitly not something we want to support, it goes against the whole principle of memory safety)

digama0 commented 1 year ago

CHERI support in Rust is still a big open question, so I can't say anything for sure about how it will work. Perhaps from_exposed_addr is cfg'd out, or the spec on it says that it is equivalent to invalid on CHERI, I don't know. It's quite reasonable to say that CHERI will not support from_exposed_addr, and for programmers it is mostly a matter of "just don't use it", but for the spec we can't really ignore it or make all arches CHERI-like without breaking a lot of things and violating our backward compatibility promises.

The important role that strict_provenance API plays is by making most code CHERI-compatible, which may open the doors for being able to reduce support for the CHERI-incompatible parts without breaking the world in the future. But we are clearly nowhere near that point today.

adamreichold commented 1 year ago

Personally, I was only concerned with the surface language, i.e. replacing integer-to-pointer casts using as with either with_addr or from_exposed_addr to enforce thinking about provenance. I think this should eventually be possible with an edition change without effects on the semantics of the underlying abstract machine? Just that they will need to be implemented using intrinsics instead of the current strict_provenance_magic?

digama0 commented 1 year ago

I believe it is already possible to express the semantics of with_addr and from_exposed_addr using the stable language (see the strict provenance polyfill crate). We could make usize as *const T a hard error on a future edition and force users to use one of those functions, but you would still be able to use as via old-edition inlined code and macros in perpetuity, and it would be interpreted as from_exposed_addr as it is today. AFAIK implementing the strict provenance functions using intrinsics is exclusively an optimization question, it can be done without any language semantic changes.
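
For readers who have not seen the polyfill approach, here is a small sketch in that spirit. The free functions `with_addr`, `map_addr`, and `second_via_addr` are written for this example and are not the crate's actual source. The idea is to compute the distance to the requested address and apply it with wrapping pointer arithmetic, so the result is derived from the original pointer.

```rust
/// Sketch of `with_addr` on stable Rust: same provenance as `ptr`, new address.
fn with_addr<T>(ptr: *const T, addr: usize) -> *const T {
    // Distance from the current address to the requested one.
    let offset = (addr as isize).wrapping_sub(ptr as usize as isize);
    // Pointer-to-pointer arithmetic: the result is derived from `ptr`.
    ptr.cast::<u8>().wrapping_offset(offset).cast::<T>()
}

/// `map_addr` is then a thin wrapper around `with_addr`.
fn map_addr<T>(ptr: *const T, f: impl FnOnce(usize) -> usize) -> *const T {
    with_addr(ptr, f(ptr as usize))
}

/// Moves to the second element of an array by address and reads through it.
fn second_via_addr(arr: &[u16; 4]) -> u16 {
    let base = arr.as_ptr();
    let second = map_addr(base, |a| a + std::mem::size_of::<u16>());
    // SAFETY: `second` is in bounds of `arr` and was derived from `base`.
    unsafe { *second }
}
```

For example, `second_via_addr(&[10, 20, 30, 40])` reads the element at index 1 without ever casting an integer to a pointer.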

adamreichold commented 1 year ago

AFAIK implementing the strict provenance functions using intrinsics is exclusively an optimization question, it can be done without any language semantic changes.

I think there is the additional issue that when the current edition outlaws integer to pointer casts using as and the standard library itself is implemented using this edition, it needs something else to implement e.g. from_exposed_addr. It could continue building exactly that code using an older edition, but that seems a bit complicated especially if there are other benefits to using intrinsics.

RalfJung commented 1 year ago

strict provenance is (a candidate for) the "'Clean' memory model" while stacked borrows is (a candidate for) the "'Real' memory model" mentioned in that post.

FWIW aliasing (Stacked Borrows) and int2ptr casts are somewhat orthogonal issues. So it's not correct to say that Stacked Borrows is the "real thing" behind strict provenance. Stacked Borrows can be combined with strict provenance, or it can be combined with a more complete understanding of ptr2int2ptr casts.

digama0 commented 1 year ago

Fair enough. Unfortunately we don't really have a name for our candidate operational semantics (MiniRust + TB?) so I'm not really sure what to substitute into that sentence to make it correct without being a tautology.

programmerjake commented 1 year ago

I suggest that we rename addr to something else.

naming suggestion: addr_lossy -- it's nice and short and makes the reader go "woah, something unusual is happening here" so they don't think it's just C's (uintptr_t)ptr with a different name and actually read the docs to see what's being lost -- the provenance needed for doing int -> ptr again later.

ketsuban commented 10 months ago

I think it would be a good idea to preemptively mark core::ptr::from_exposed_addr{,_mut} (and maybe core::ptr::invalid{,_mut}) as const, so they can be drop-in replacements for addr as *const T and addr as *mut T.

WaffleLapkin commented 10 months ago

@ketsuban invalid is already const and from_exposed_addr can't be const (const eval doesn't allow exposing pointers).

digama0 commented 10 months ago

I'm not sure that's true, since const eval doesn't allow dereferencing pointers, so whether it is exposed or not shouldn't matter. It should just be able to treat from_exposed_addr as a no-op.

RalfJung commented 10 months ago

I don't think from_exposed_addr makes any sense during const, which set of exposed pointers should it pick from?

digama0 commented 10 months ago

It would pick from the set of exposed pointers...? I think the better question is what makes a pointer exposed, and we already basically have an answer to that (expose_addr and casts). In other words we really don't need to do anything different here in order to have exactly the same behavior as at runtime. Everything is trivial because you can't dereference raw pointers, no matter what provenance they happen to have.
