rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
99.08k stars 12.79k forks source link

Safe methods in std::io allow reading uninit memory #20314

Closed sivadeilra closed 9 years ago

sivadeilra commented 9 years ago

\src\libstd\io\mod.rs contains two methods that allow attacking code to read memory that it should otherwise not have access to. The methods are Reader.push() and Reader.push_at_least(). An attacker could write (or exploit) an implementation of Reader, by implementing a read() method that reads from the given buffer, rather than writing to it. Or by not writing at all, returning a non-zero byte count, and then calling push() and seeing what memory was returned.

The push() and push_at_least() methods should probably just be deleted entirely. The support function slice_vec_capacity() could also be deleted.

kmcallister commented 9 years ago

Nominating. Regardless of whether this technically violates Rust's memory safety guarantees, it's extremely surprising and has major security impact.

huonw commented 9 years ago

It seems this was already noticed and the functions are suggested for deletion when std::io gets overhauled for 1.0: https://github.com/aturon/rfcs/blob/io-os-reform/text/0000-io-os-reform.md#removed-methods

pythonesque commented 9 years ago

As a side note this feels like what unsafe traits would be useful for.

aturon commented 9 years ago

This was discussed as part of the IO reform RFC process, and it is believed not to lead to undefined behavior, at least if we follow a certain pattern of intrinsics use. Reading an uninitialized but alloced block of u8 data is type and memory safe.

There doesn't seem to be widespread agreement elsewhere (on IRC, etc) as to how serious a security risk this poses. However, note that we can close this potential hole easily by pre-zeroing the memory in these convenience functions, without causing any API breakage.

It would be helpful to see a more detailed example of an exploit based on the current behavior.

sivadeilra commented 9 years ago

I'm frankly astonished. There is no way my team will accept Rust with this kind of information leakage within its "safe" subset.

How can you make any claim about type safety and security, with such a gaping hole?

On Thu, Feb 12, 2015 at 2:07 PM, Aaron Turon notifications@github.com wrote:

This was discussed as part of the IO reform RFC process, and it is believed not to lead to undefined behavior, at least if we follow a certain pattern of intrinsics use. Reading an uninitialized but alloced block of u8 data is type and memory safe.

There doesn't seem to be widespread agreement elsewhere (on IRC, etc) as to how serious a security risk this poses. However, note that we can close this potential hole easily by pre-zeroing the memory in these convenience functions, without causing any API breakage.

It would be helpful to see a more detailed example of an exploit based on the current behavior.

— Reply to this email directly or view it on GitHub https://github.com/rust-lang/rust/issues/20314#issuecomment-74163554.

aturon commented 9 years ago

@sivadeilra

First, I'm sorry if I wasn't clear, but no decision has been made here. I was asking for clarification.

I was trying to clarify first of all that there is not a type or memory safety hole here, see this comment thread. If you believe this can lead to a type or memory safety violation, can you please spell out the details? (cc @mahkoh)

Second, I was hoping you could spell out in more concrete detail an exploit based on this behavior alone. It would require a Reader implementation to read from the buffer it's supposed to write into, afaict.

As I said in my comment, we always have the option of zeroing the memory in these convenience methods, which doesn't require any API change.

kmcallister commented 9 years ago

I'm basically with @sivadeilra here. This needs to be fixed for 1.0.

The exploit seems clear. The memory you read in read is uninitialized, fresh from the allocator. It will contain whatever was there last — passwords, encryption keys (yeah, you should zero these on drop, but...), addresses of internal structures useful for defeating ASLR, etc.

You may not be worried about malicious read impls but even an inept one may leak information. Exploits for systems like kernels or sandboxed browers are very complex. They often combine information leaks from a number of "minor" bugs.

... not lead to undefined behavior, at least if we follow a certain pattern of intrinsics use. Reading an uninitialized but alloced block of u8 data is type and memory safe.

Sure, it's not UB. I'll even accept that it's not a "type and memory safety" issue. But this is a major lurking security risk in a language that otherwise goes out of its way to prevent you from reading uninitialized memory.

aturon commented 9 years ago

@kmcallister

The exploit seems clear. The memory you read in read is uninitialized, fresh from the allocator. It will contain whatever was there last — passwords, encryption keys (yeah, you should zero these on drop, but...), addresses of internal structures useful for defeating ASLR, etc.

But a read implementation doesn't read from the buffer, it writes into it. It would have to be very inept indeed to read from the buffer.

kmcallister commented 9 years ago

But a read implementation doesn't read from the buffer, it writes into it. It would have to be very inept indeed to read from the buffer.

No, it would just have to be a large software system with many components that interact in unanticipated ways. A read implementation could transitively invoke thousands of lines of code...

kmcallister commented 9 years ago

There's no way I should have to write a "realistic" program and exploit to get people to take this seriously. But I'll do it if that's what it takes.

Any read of uninitialized memory in the safe subset of Rust seems 100% not okay. It doesn't matter whether it's technically a "memory safety" issue.

kmcallister commented 9 years ago

It would have to be very inept indeed to read from the buffer.

Argument by lack of imagination :)

"Only two things are infinite, the universe and human stupidity, and I'm not sure about the former."

aturon commented 9 years ago

@kmcallister

There's no way I should have to write a "realistic" program and exploit to get people to take this seriously. But I'll do it if that's what it takes.

I think you and @sivadeilra are reading way too much into my earlier comment: I'm trying to gain clarity here about the issues in concrete terms, because the original claim had to do with memory unsafety/UB, which turned out to be incorrect. I do take this seriously! Which is why I'm trying to understand the details.

Any read of uninitialized memory in the safe subset of Rust seems 100% not okay. It doesn't matter whether it's technically a "memory safety" issue.

This seems like the crux of the issue. There does not seem to be widespread agreement about this point, and I think we need to make a global decision about it.

cc @nikomatsakis @pcwalton

kmcallister commented 9 years ago

Or by not writing at all, returning a non-zero byte count, and then calling push() and seeing what memory was returned.

Yeah, screwing up the byte count seems more likely to happen than deliberately reading in the read impl, and just as bad or worse.

aturon commented 9 years ago

@kmcallister Thanks, that's what I was looking for!

kmcallister commented 9 years ago

Sorry to pile on you @aturon, I know you're just fact-finding. I do have a pretty strong reaction to the "surely nobody would screw up that bad" logic, which is at odds with Rust's philosophy.

aturon commented 9 years ago

OK, after discussion here and elsewhere, I propose we write a short RFC that clearly establishes the following policy, which goes beyond ensuring lack of UB:

Safe APIs in Rust must never expose uninitialized memory, even in cases that do not lead to UB.

@kmcallister Would you be interested in working together on such an RFC?

nikomatsakis commented 9 years ago

I'll be honest, I was not following the full details of this discussion before. In particular, I mistakenly thought the focus was on the read method, which takes in a buffer and writes into it, rather than the read_to_end method. I talked this over a bit with @aturon (and followed @kmcallister's comments on IRC) and my feeling right now is that I agree with @kmcallister, we should uphold the principle that safe APIs do not expose uninitialized bytes. It seems to be a less important guarantee than undefined behavior, but important enough not to sacrifice lightly. Moreover, this problem only even comes up for a narrow range of APIs involving byte buffers; as a consequence those APIs don't feel especially rusty to me. And finally in this particular case, read_to_end is merely a convenience function, so if this becomes a perf bottleneck somewhere, it is easily overcome.

pcwalton commented 9 years ago

I am somewhat conflicted here, but I am happy to go along with this consensus.

aturon commented 9 years ago

I've written an RFC to update our policy about safe code to cover this case.

sivadeilra commented 9 years ago

To me, this hole does constitute a type safety hole. "Type safety" is not just the ability to prevent data structures from being modified outside of your language. It is the ability to preserve the invariants of the language. Put as precisely as I can, "type safety" is the property of a language, in which the language defines its type system / operations / invariants, and that the language controls the mapping from those language semantics to the underlying machine representation. That is why, to me, being able to read the machine state (outside of the language) is just as bad as being able to modify the machine state.

I believe many people have the perspective that conserving "type safety" means only preventing modification of machine state outside of language. I think the last 15 years or so of experience with security, in operating systems, browsers, and everything else, shows that being able to access information (that you're not supposed to access) can be just as dangerous as being able to modify information.

Keegan's point about the composition of large, complex systems is very important. Violations of invariants that may seem minor may become major, when combined with other seemingly-minor exploits, or when an attacker manipulates a "minor" exploit (such as reading dirty memory) in order to read a very juicy piece of information.

It is not important that this involves I/O. What's important is trusted system code (code within an "unsafe" block) is relying on untrusted user code (any implementation of the Reader trait) in order to maintain the invariant of type safety.

It may be possible to have some cake and eat some. It may be possible to have Rust provide efficient unsafe implementations of Reader, which cannot be abused by user code (i.e. have no extension points), while at the same time providing less-efficient extensible implementations which guarantee that the buffer is zeroed.

I would be extremely cautious about trading away a profound safety invariant for a small, quantitative gain in performance.

On Fri, Feb 13, 2015 at 10:18 AM, Keegan McAllister < notifications@github.com> wrote:

But a read implementation doesn't read from the buffer, it writes into it. It would have to be very inept indeed to read from the buffer.

No, it would just have to be a large software system with many components that interact in unanticipated ways. A read implementation could transitively invoke thousands of lines of code...

— Reply to this email directly or view it on GitHub https://github.com/rust-lang/rust/issues/20314#issuecomment-74299941.

huonw commented 9 years ago

I think it's worth noting that this behaviour can be replicated in "user-space" by just reusing a fixed length buffer several times without zeroing in between (e.g. there was a recent blog post that described itself as demonstrating heartbleed in Rust by doing that sort of thing). Of course, it's definitely not as insidious as using uninit memory, since its relatively obvious from the source, while implicitly reusing an old allocation is very subtle and non-local.