rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/
Apache License 2.0

RFC: Add `freeze` intrinsic and related library functions #3605

Open chorman0773 opened 3 months ago

chorman0773 commented 3 months ago

Rendered

chorman0773 commented 3 months ago

@rustbot label +T-lang +T-opsem +T-libs-api

RustyYato commented 3 months ago

~~MaybeUninit::freeze should be unsafe. Otherwise you could get MaybeUninit::<AnyType>::uninit().freeze() which is trivially unsound.~~

Thinking about this so late was a bad idea, sorry for the noise. I misread Self as T

kennytm commented 3 months ago

MaybeUninit<T>::freeze() returns a MaybeUninit<T>, not a T, so why would that be unsound?

chorman0773 commented 3 months ago

I also explicitly address an unsafe, T-returning MaybeUninit::freeze in the rationale and alternatives section, and explain why I went for the safe version instead.

clarfonthey commented 3 months ago

Small thing to point out: although it's not exposed publicly, the compiler uses the term "freeze" to refer to things without interior mutability, and that makes the naming of this potentially confusing for those familiar with the term. That said, the existing "freeze" term could always be renamed to something else since it's not stable, but it's worth mentioning anyway.

One thing worth asking here is how this specifically differs from volatile reads, since I can see a lot of similarities between them, and while I understand the differences, a passing viewer might not, and that's worth elaborating a bit more.

RalfJung commented 2 months ago

Cc @rust-lang/opsem

JarredAllen commented 2 months ago

An extra function related to this that I've often found myself wishing existed (name unimportant and can be changed):

impl<const N: usize> [u8; N] {
    pub fn new_freeze() -> Self {
        // uninit() rather than new(): start from uninitialized bytes; every
        // byte pattern is valid for [u8; N], so assume_init is sound after
        // freezing.
        unsafe { MaybeUninit::<Self>::uninit().freeze().assume_init() }
    }
}

I mostly find this useful for functions on std::io::Read which require an initialized &mut [u8], for which there's no reason to write a value just for it to be overwritten.

It's not that important, since it's easy to implement on top of the functions provided here, but I think a safe function in the standard library for this case would be nice.

ijackson commented 2 months ago

To be clear, I think one property of this freeze primitive is that the values of the frozen bytes might be influenced by, or be copies of, any other value anywhere in the whole program's address space (past or present or future).

This behaviour is extremely hazardous in a program which has anything in its address space that oughtn't to be leaked out in its behaviour. Worst case seems to be that the frozen bytes might be crypto keys or something.

At the very least this needs to be discussed in the RFC. But I fear that the semantics are sufficiently dangerous that these functions ought not to be used except in very narrow cases, and with very careful thought.

Also, I worry about the interaction with Spectre.

zachs18 commented 2 months ago

@ijackson

At the very least this needs to be discussed in the RFC. But I fear that the semantics are sufficiently dangerous that these functions ought not to be used except in very narrow cases, and with very careful thought.

This is discussed in the "Drawbacks" section.

The main drawbacks that have been identified so far:

* It is potentially [considered desirable](https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Arguments.20for.20freeze/near/377333420) to maintain the property that sound (e.g. correct) code cannot meaningfully read uninitialized memory
* It is generally noted that safe/unsafe is not a security or privilege boundary, and it's fully possible to write unsound code (either deliberately or inadvertently) that performs the read. If the use of uninitialized memory is within the threat model of a library that, for example, handles cryptographic secrets, that library should take additional steps to sanitize memory that contains those secrets.
* Undefined behaviour does not prevent malicious code from accessing any memory it physically can.
* That said, making such code UB could still be useful as it makes it unambiguously a bug to expose the contents of uninitialized memory to safe code, which can avoid accidental information leaks.

If this RFC gets accepted, we should find ways to make it clear that even if doing so would be technically *sound*, this is still not something that Rust libraries are "supposed" to do and it must always be explicitly called out in the documentation.

RalfJung commented 2 months ago

To be clear, I think one property of this freeze primitive is that the values of the frozen bytes might be influenced by, or be copies of, any other value anywhere in the whole program's address space (past or present or future).

Yes indeed. This is non-deterministic choice. And as the saying goes -- when considering the security of programs that do non-deterministic operations, you should assume the result to be either random or your secret key, depending on what is worse.

ijackson commented 2 months ago

@RalfJung

This is non-deterministic choice. And as the saying goes -- when considering the security of programs that do non-deterministic operations, you should assume the result to be either random or your secret key, depending on what is worse.

If we add something like this to Rust it should come with gigantic warnings. Just making it unsafe isn't enough, because the implications are far-reaching - and will be very surprising to most people who haven't had time to think about it properly.

@zachs18

This is discussed in the "Drawbacks" section.

I think this is nowhere near explicit enough. Indeed, it's over-optimistic. The text there says

If the use of uninitialized memory is within the threat model of a library that, for example, handles cryptographic secrets, that library should take additional steps to sanitize memory that contains those secrets.

But there is almost nothing that that other library can do that will prevent this threat. In principle, some other code using freeze might arbitrarily happen to read those cryptographic keys while they are in use (eg by another thread), or are still live and ready for use.

I.e. this RFC text is somehow placing the blame for this situation on the author of the other library. But it's not their fault. It's the fault of freeze, which inherently has this defect.

And the drawback section doesn't mention Spectre. It should probably say something like this:

Speculation hazards

On modern hardware, freeze might have very surprising and harmful effects in combination with speculative execution on out-of-order CPUs.

The nature and extent of this risk is almost completely unknown, but it might well include giving an attacker the ability to extract security-critical data from a program they're merely interacting with.

Use of this feature should be avoided in programs or systems any part of which might encounter any hostile input, or deal with untrusted entities.

With an imprecation like that (and presumably RUSTSEC advisories any time anyone uses this), is it still useful?

chorman0773 commented 2 months ago

Yes indeed. This is non-deterministic choice. And as the saying goes -- when considering the security of programs that do non-deterministic operations, you should assume the result to be either random or your secret key, depending on what is worse.

Or some other fixed constant if that is somehow worse than either.

zachs18 commented 2 months ago

@ijackson

In principle, some other code using freeze might arbitrarily happen to read those cryptographic keys while they are in use (eg by another thread), or are still live and ready for use.

I think there may be a misunderstanding as to what is being proposed by this RFC. The freeze intrinsic (as proposed) does not affect aliasing guarantees in any way. If some initialized (secret) data is accessible to another thread, then either that thread can already soundly access that data today due to appropriate synchronization, or such accesses would be UB regardless of whether such accesses use read or read_freeze.

Also, it was mentioned in the "Drawbacks" section but it bears repeating that "Undefined behaviour does not prevent malicious code from accessing any memory it physically can". This RFC does not add any new physical operations to actual computers; code can already be written that does what is proposed "on the metal", such code is just not possible to write in (correct) Rust (at least without platform-specific operations).

And the drawback section doesn't mention Spectre.

I don't see how Spectre is related to the proposed feature. Do you have a somewhat-concrete example of a situation in which you think speculative execution would interact with the proposed feature?

chorman0773 commented 2 months ago

I think Spectre can interact with it insofar as any untrusted input value can - in the same way the frozen value could be a stale crypto key, it could also be stale user input. I'm not sure that's worth mentioning specially in the RFC, though.

comex commented 2 months ago

It's not like Spectre affects the contents of uninitialized memory or depends on reading uninitialized memory. In fact, I can't think of much connection between Spectre and uninitialized memory.

They both involve 'weird machines', and state that's not meant to be revealed but potentially can be anyway. But it's different state. For uninitialized memory it's the architectural-level contents of memory that software is supposed to guard from disclosure. For Spectre, it's microarchitectural state that hardware is supposed to guard from disclosure.

The hardware never lets that microarchitectural state leak into the actual results of loads from memory, even if that memory is 'uninitialized' from a software perspective. The leaks are only via timing.

Or even if loading uninitialized memory is dangerous somehow, Rust code already does that all the time in the form of padding bytes.

That said, Spectre aside, it's absolutely true that it's dangerous to ever disclose the contents of frozen uninitialized memory. Which means that there are not many valid use cases for freeze. But 'not many' is not 'none'.

chorman0773 commented 2 months ago

I can agree with a documentation-level warning along the lines of "bytes obtained by freezing uninitialized memory can be previous contents from old allocations that happen to overlap, which may include cryptographic secrets. It is generally insecure to transmit frozen bytes to an untrusted 3rd party, such as over a network, unless steps are taken to prevent accidental or malicious exfiltration of secure data"

ijackson commented 2 months ago

bytes obtained by freezing uninitialized memory can be previous contents from old allocations that happen to overlap,

I think we have a real disagreement about the semantics here. Based on my understanding of the (non)-guarantees offered by freeze, I think the above text radically understates the problem.

My understanding of the semantics is, as @RalfJung puts it:

And as the saying goes -- when considering the security of programs that do non-deterministic operations, you should assume the result to be either random or your secret key, depending on what is worse.

So specifically, the bytes you get out of freeze are not limited to previous contents of now-released allocations (whether on the stack or the heap). They might be from anywhere, including arbitrary other parts of memory. So, indeed, they might be magically transported from memory being concurrently accessed by other threads. They might come from live mutable borrows that you couldn't deliberately access without UB. etc.

Spectre makes all this more likely and/or worse (in a way that's super confusing to reason about).

If my understanding of the semantics is wrong, then there are some guarantees. But what are they? The documentation needs to clearly state those guarantees, as positive statements, in a way that means that the compiler authors (including LLVM experts) can verify that what we're promising is true.

If my understanding is right, then these functions are extremely dangerous and that needs to be stated much more alarmingly in the docs. Giving mild examples of what might happen is very misleading, when the scope of possible behaviours is so wide, and the worst cases so much worse than the examples.

chorman0773 commented 2 months ago

So specifically, the bytes you get out of freeze are not limited to previous contents of now-released allocations (whether on the stack or the heap). They might be from anywhere, including arbitrary other parts of memory. So, indeed, they might be magically transported from memory being concurrently accessed by other threads. They might come from live mutable borrows that you couldn't deliberately access without UB. etc.

They can come from any of these places, but I'd expect one of the following from any non-malicious compiler:

* Stale bytes from a previous allocation that happened to occupy the same memory
* Stale bytes from whatever register or stack slot the compiler materialized the value in
* A fixed constant, such as zero
* Trap, because you read as a type that doesn't allow all initialized byte sequences (like bool) and the compiler helpfully chose a byte sequence the type doesn't allow

I'd generally not expect a well-behaved compiler to yank memory from elsewhere in the program. While it theoretically could, I'm not sure there's a security reason to mention more than those cases that isn't caused by "Your compiler sucks and you should file a QoI issue to their repo".

It's up there with the stories we tell people about compilers formatting your hard drive on UB - they're within their rights to do so, and you shouldn't be writing UB-causing code, but if the compiler actually invented a call to Command::new("mkfs.ext4").arg("/dev/sda1").spawn(), I'd be within my rights to file an issue and never use that compiler again.

programmerjake commented 2 months ago

* Trap, because you read as a type that doesn't allow all initialized byte sequences (like bool) and the compiler helpfully chose a byte sequence the type doesn't allow.

that one is just plain UB, so the compiler could reasonably delete all code that must reach that point in the program, just like unreachable_unchecked.

e.g.: https://llvm.godbolt.org/z/qMqcMbzr4

chorman0773 commented 2 months ago

Yeah, that is indeed just a misspelling of core::hint::unreachable_unchecked(), and has the same implications.

oskgo commented 2 months ago

It is possible to use freeze to build a sound and safe API that can expose secrets. I think this may be an issue because programmers writing safe Rust today can assume this doesn't happen (barring bugs in unsafe code).

Currently the easiest way to avoid leaking any sensitive data is to not have access to it. With the addition of freeze they also have to make sure they're not using an API that outputs raw frozen data.

One could imagine an API for zero copy serialization that uses freeze to serialize padded data. This would be fine to use for local storage or transfer between nodes of equal trust, but could be disastrous if used to send data between computers with different privileges.

Diggsey commented 2 months ago

It is possibly to use freeze to build a sound and safe API that can expose secrets.

You would need to use unsafe internally to do that, but yes you could expose a safe API that exposes secrets. As already discussed though, you don't need freeze to be able to do that, so it's not really relevant: there are any number of ways to do that already, the simplest being to just do anything that's UB - since when UB occurs the compiler is allowed to do anything, including leaking secrets.

Rust is a systems programming language, that means (and rightly so...) that you can use unsafe code to do just about anything that is possible.

There is a related question which is worth answering though: should it be acceptable for a safe API to expose "frozen" data to consumers of that API, and the answer to that is probably "no", since reading uninitialized memory could be considered not memory safe. This is purely a documentation question though, it doesn't affect the API proposed here, since this API offers no way to safely get at the uninitialized data.

RalfJung commented 2 months ago

You would need to use unsafe internally to do that, but yes you could expose a safe API that exposes secrets. As already discussed though, you don't need freeze to be able to do that, so it's not really relevant: there are any number of ways to do that already, the simplest being to just do anything that's UB - since when UB occurs the compiler is allowed to do anything, including leaking secrets.

There is no existing way to do that without causing UB. So I don't think "there are any number of ways" is accurate. We have a very well-established norm that UB is not okay and we have tools to help us detect UB. So the risk that a library will leak contents of unrelated memory right now is fairly contained.

This RFC makes that risk a lot bigger by making it possible to write sound code that passes all sanitizers and yet still leaks memory contents. The social norm of "thou shalt not cause UB" is no longer sufficient to defend against these kinds of leaks.

So I agree that every function that leaks the result of a freeze should have a very clear warning that it does so, probably linking to some central docs in the standard library that explain the background. The key point here is to be actionable, so e.g. make it clear how that data should be treated. Even with the point added in drawbacks, this point does not get enough attention in the RFC. For instance, the guide-level explanation should give guidance on how to manage and contain the risk of data leakage.

However, I fail to see any connection to Spectre. With Spectre one can already have leaks via timing in existing entirely sound code; I don't see how things would become any worse with freeze, nor which actionable advice to give to reduce the risk. Spectre is a hardware bug and there's little that programming languages can do here; if there is some sort of Spectre mitigation that can be applied in Rust then that discussion seems largely unrelated to this RFC.

digama0 commented 2 months ago

You would need to use unsafe internally to do that, but yes you could expose a safe API that exposes secrets. As already discussed though, you don't need freeze to be able to do that, so it's not really relevant: there are any number of ways to do that already, the simplest being to just do anything that's UB - since when UB occurs the compiler is allowed to do anything, including leaking secrets.

There is no existing way to do that without causing UB. So I don't think "there are any number of ways" is accurate. We have a very well-established norm that UB is not okay and we have tools to help us detect UB. So the risk that a library will leak contents of unrelated memory right now is fairly contained.

There are certainly many ways to have a safe API that exposes secrets. You can just fail to sanitize your arguments appropriately, have a backdoor or malware, or loads of other things. unsafe is not a security boundary. Rust protects against memory unsafety with the unsafe keyword, but there are many ways safe code can be insecure without having UB or memory unsafety in them. Passing nondeterministic information (or just regular secrets) to untrusted parties is an example of insecure behavior, not unsafe behavior.

oskgo commented 2 months ago

@digama0 Not just any secrets, but secrets you don't know about and can't currently access with safe code.

Rust users should know that UB, or more generally unsoundness, is a security hazard no matter where it occurs. We have strong norms about this. Even though the safe/unsafe boundary is not a security boundary, we expect more care when using unsafe and treat bugs in unsafe code as much more critical. There's even a language feature to help make the distinction. We could instead have gone the Swift route and made unsafety only be mentioned in documentation and naming conventions.

Obviously there's a reason for this. UB is (at least in theory) at least as bad as calling into malware. In practice we care about this because UB can cause data corruption, allow remote code execution and give access to any secrets on the machine not protected by a higher privilege than the program.

With freeze we poke a hole in this. Now safe code has to worry about this as well, not just their own secrets, which if you structure your code well will often be none.

Consider the zero copy serialization library I mentioned earlier. In my view that's a perfectly legitimate use of freeze. We can hope that we'll have just as strong norms that frozen values should never be exposed, but I'm doubtful. If such a library is accepted and gets to expose a safe API then misuse of the API is very dangerous with nonlocal effects similar to UB, even if they document that well. Imagine using it on config stored locally at first and later adding a feature to share that config with others, for example.

digama0 commented 2 months ago

This is not the first use of nondeterminism in Rust by a long shot. The allocator is nondeterministic, so for all you know it is encoding all your secrets into the pointer values, and if you ever use {:p} printing you might be exfiltrating something without realizing. I don't think there is any reason to be any more alarmist about this nondeterminism than any other kind of nondeterminism in the opsem.

RalfJung commented 2 months ago

It is much more likely that secrets will be leaked by dumping the contents of uninit memory (via freeze) than via the addresses of the pointers returned by the allocator. Your claim that data could be exfiltrated by the allocator is rather far-fetched -- can you even construct an example where that actually could realistically happen?

We're not being "alarmist" but we are raising valid concerns that the RFC should address, and that are quantitatively if not qualitatively different from what exists in Rust today.

RalfJung commented 2 months ago

This is not at all a new concern either, see e.g. https://github.com/rust-lang/rfcs/pull/837. The new RFC should probably reference that old proposal and engage with the arguments raised there.

oskgo commented 2 months ago

It's already been mentioned that a non-malicious compiler should do one of four things when freezing uninit data, and two of those include getting stale data.

I don't expect a non-malicious allocator and compiler pair to choose addresses based on secrets.

I would be much less worried if a secure version of this, such as initialize, existed already, and we could put in bold on top of the documentation for freeze something like: "99% of use cases should use initialize instead, especially if you're possibly exposing frozen values in safe APIs".

VitWW commented 2 months ago

1. The freeze function is an effective tool to prevent double initialization (for example, for custom allocators).
2. A frozen representation does not cause UB for types with a "total" representation, where any value of the memory is a valid value of the given type. As far as I know, not all types are safe to have a frozen representation, but some types definitely are "freezeable", like numbers and arrays of "freezeable" types.

ryanavella commented 2 months ago

You would need to use unsafe internally to do that, but yes you could expose a safe API that exposes secrets. As already discussed though, you don't need freeze to be able to do that, so it's not really relevant: there are any number of ways to do that already, the simplest being to just do anything that's UB - since when UB occurs the compiler is allowed to do anything, including leaking secrets.

There is no existing way to do that without causing UB.

I have an application which does this intentionally by separately compiling and linking asm. No LTO, of course.

I assumed this "hack" is sound because the reads happen entirely outside of the Rust abstract machine, i.e. in the assembly I'm abiding by an entirely different abstract machine's rules. Are you suggesting otherwise?

chorman0773 commented 2 months ago

FFI calls cannot perform operations on the Rust AM's state that don't correspond to valid AM operations (for example, it can't write through a Frozen or Disabled tag). That being said, there is indeed nothing that prevents the FFI function from pulling the value it returns from Thin Air.

RalfJung commented 2 months ago

I assumed this "hack" is sound because the reads happen entirely outside of the Rust abstract machine, i.e. in the assembly I'm abiding by an entirely different abstract machine's rules. Are you suggesting otherwise?

You can't just freely compose two AMs. Insofar as the AMs access or change each other's state, they can only do that in ways that the other AM would already permit anyway. That is required to ensure soundness of whatever optimizations the compiler performs based on one AM (not knowing anything about the other AM).

So as long as Rust doesn't have freeze, doing it via inline assembly is very questionable at best. Freeze needs to inspect AM state in ways that the AM does not permit ("is this byte initialized"), which IMO makes implementing freeze via inline asm unsound. But I admit I can't construct a miscompilation so maybe this is some sort of barely sound gray area. Given the concerns around leaking secrets, I think it should definitely be treated as something Rust does not intend to be possible, until there was a decision to permit it (i.e., until this RFC gets accepted).

ryanavella commented 2 months ago

(you mentioned inline assembly a few times, and I just wanted to clarify I am talking about externally linked assembly)

From the assembly AM's perspective there is no such thing as uninitialized memory, so LTO passes can't just freely miscompile unless the system linker is somehow aware of the specifics of Rust's AM.

I'd be curious if your position would change if it was primarily assembly calling into Rust?

RalfJung commented 2 months ago

There's not much of a difference between inline asm, linked asm, and linked C (without cross lang inlining).

And there's no such thing as a primary/secondary AM. When you mix code in different languages, the restrictions of all AMs must be honored on the state that the respective AMs can observe.

Diggsey commented 2 months ago

So as long as Rust doesn't have freeze, doing it via inline assembly is very questionable at best. Freeze needs to inspect AM state in ways that the AM does not permit ("is this byte initialized"), which IMO makes implementing freeze via inline asm unsound.

I don't agree - ultimately both AMs must somehow map their states onto the real hardware state. If an operation is sound in one AM and doesn't modify any state that is visible to the other AM, then it doesn't matter what rules that second AM has, the operation cannot be unsound. It's simply outside the jurisdiction of the other AM.

That's the case here - ASM doesn't have a concept of uninitialized memory, so the read operation is sound as far as ASM is concerned. The read operation also doesn't modify any real hardware state that is visible to Rust. Therefore the operation is sound.

RalfJung commented 2 months ago

If an operation is sound in one AM and doesn't modify any state that is visible to the other AM, then it doesn't matter what rules that second AM has, the operation cannot be unsound.

That's not necessarily true. An AM can have properties of indistinguishability, where two different AM states are known to be equivalent for all observers that run inside the AM. If now another AM performs a transition that can only be explained by telling apart these supposedly indistinguishable states, that could lead to unsoundness if the compiler assumes that telling them apart is impossible.

After all, freeze is crucially not a NOP in the Rust AM. It doesn't modify hardware state (leaving aside MADV_FREE for a second), but it does inspect and modify AM state. It is this transition on AM state that must be argued to be sound. This is obviously the case if the transition is possible by regular means inside the AM. But for freeze, that obvious argument does not work. A very non-obvious argument may be possible, but it is a lot more complicated than "the hardware state doesn't change" -- in particular, it necessarily has to involve reasoning about the fact that the AM is "monotone" wrt initialization.

Diggsey commented 2 months ago

that could lead to unsoundness if the compiler assumes that telling them apart is impossible.

Firstly, that's only the case if the ASM returns that information to the part of the program running within the AM. The read alone is still sound regardless of what the AM says.

Secondly, in the case of the Rust AM, it would be non-sensical for a Rust compiler to be allowed to make that assumption, since in practice you can always distinguish hardware states (such as via timing side-channels, OS calls, etc.)

RalfJung commented 2 months ago

Firstly, that's only the case if the ASM returns that information to the part of the program running within the AM. The read alone is still sound regardless of what the AM says.

No. If the compiler considers the states equivalent it will freely change the program from one to the other, making it unsound for any part of the program (in any AM) to observe it.

Secondly, in the case of the Rust AM, it would be non-sensical for a Rust compiler to be allowed to make that assumption, since in practice you can always distinguish hardware states (such as via timing side-channels, OS calls, etc.)

Timing is considered not observable. And you cannot use the hardware state to distinguish AM states that map to the same hardware state.

programmerjake commented 2 months ago

what about the argument that, with an asm read, if it could read uninit then we just know it returned some non-uninit value, we don't know what -- this allows the AM to have whatever undef-or-not shenanigans it likes since those are just different ways of cooking up a unknown value. if it can't read uninit then the asm must return the value actually in memory, so is perfectly defined. afaict this is sound as long as we don't tell the compiler it can freely duplicate the asm and expect the values to be the same.

RalfJung commented 2 months ago

I think whether or not doing this in inline asm can somehow be justified is largely off-topic for this RFC. It is clearly outside the guidance we give for sound asm/FFI (the interaction with AM state, both how it is read and how it is changed, is not expressible with Rust code).

One part of this RFC is deciding whether and how sound code may leak the contents of uninit memory, how to document this, and what the norms and conventions are for such cases. That discussion cannot be bypassed with inline asm arguments nor with any number of "technically".

Fishrock123 commented 4 weeks ago

The link to the rendered document in the OP is a 404 btw

CAD97 commented 4 weeks ago

fixed to link to the renamed file after 5b2ff7c