rustls / rustls

A modern TLS library in Rust

Fallible allocation #886

Open jsha opened 2 years ago

jsha commented 2 years ago

As part of rustls-ffi development, both curl and Apache have reiterated the point that libraries should not panic (or abort) on OOM. Rust 1.57 stabilized fallible allocation for some types:

Rust 1.57 stabilizes try_reserve for Vec, String, HashMap, HashSet, and VecDeque. This API enables callers to fallibly allocate the backing storage for these types.
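For illustration, a minimal sketch of what `try_reserve`-based fallible allocation looks like (the function name is illustrative, not rustls API):

```rust
use std::collections::TryReserveError;

// Reserve the needed capacity up front and surface failure as an error
// instead of aborting the process.
fn copy_payload(input: &[u8]) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve(input.len())?; // fails with TryReserveError rather than aborting
    buf.extend_from_slice(input);  // cannot allocate: capacity is already reserved
    Ok(buf)
}
```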

#158 is related, though we don't need full support for custom allocators in order to support fallible allocation.

Is fallible allocation something rustls is interested in supporting? What API changes do we need? A quick look:

CommonState::new, of course, allocates a bunch of things: ChunkVecBuffers, a MessageFragmenter, and a RecordLayer. We'd need a CommonState::try_new that could pass through errors from constructing those objects, and we'd need fallible constructors for those objects.

ChunkVecBuffer needs to allocate a VecDeque, which now has a stabilized fallible allocation API.

RecordLayer contains two Boxes. Box::try_new is still unstable.

MessageFragmenter doesn't allocate in its constructor. But CommonState::send_msg_encrypt, send_appdata_encrypt, and send_msg allocate temporary VecDeques for MessageFragmenter to use. And MessageFragmenter calls push_back on those VecDeques, which allocates.

ConnectionCommon includes a Box that is built during start_handshake.

HandshakeJoiner contains a VecDeque and a Vec.

ConfigBuilders creates Vecs of cipher suites, KX groups, and TLS versions. It copies these from input slices. Also the final construction of a ClientConfig or a ServerConfig allocates a Vec for alpn_protocols and a session cache.

msgs/handshake.rs has a number of HashMap instantiations to deduplicate things:

    pub fn has_duplicate_extension(&self) -> bool {
        let mut seen = collections::HashSet::new();

This is far from complete, but I think it gives the overall gist.
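To make the `try_new` idea above concrete, here is a hedged sketch; the field layout and the capacity parameter are stand-ins, not ChunkVecBuffer's real shape:

```rust
use std::collections::{TryReserveError, VecDeque};

// Stand-in for rustls's ChunkVecBuffer; the real type differs.
struct ChunkVecBuffer {
    chunks: VecDeque<Vec<u8>>,
}

impl ChunkVecBuffer {
    // A fallible constructor: allocation failure is propagated to the
    // caller instead of aborting.
    fn try_new(capacity: usize) -> Result<Self, TryReserveError> {
        let mut chunks = VecDeque::new();
        chunks.try_reserve(capacity)?;
        Ok(Self { chunks })
    }
}
```

A `CommonState::try_new` would then thread these `Result`s through each member it constructs.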

briansmith commented 2 years ago

As part of rustls-ffi development, both curl and Apache have reiterated the point that libraries should not panic (or abort) on OOM.

For panics, rustls-ffi should catch all panics before they get past its FFI interface using catch_unwind. If/when that is done, then oom=panic (https://github.com/rust-lang/rust/issues/43596) would allow rustls-ffi to catch OOM failures too. Maybe that is good enough for the needs of rustls-ffi and would also have the benefit that no changes would need to be done in Rustls to support it.
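As a sketch of that `catch_unwind` approach (the function name, parameter, and error code are illustrative, not rustls-ffi's actual API):

```rust
use std::panic;

// At the FFI boundary, catch any panic (including, under oom=panic, an OOM
// panic) and convert it to an error code before returning to C.
pub extern "C" fn rustls_example_do_work(fail: bool) -> i32 {
    match panic::catch_unwind(|| {
        if fail {
            panic!("simulated library panic"); // stands in for an internal panic
        }
        0
    }) {
        Ok(code) => code,
        Err(_) => -1, // a caught panic becomes an error code; it never crosses the FFI
    }
}
```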

Conversely, even if we were to do what you suggest and use fallible allocation APIs pervasively in Rustls, rustls-ffi would probably still need to use catch_unwind pervasively to be sure that no panics get past the FFI interface. With the tools we currently have available, it seems impractical to fully ensure we never panic in Rustls and that no dependency of Rustls ever panics.

Note that Rustls has several plug-in interfaces, e.g. for plugging in custom certificate validators, and the impact on those APIs needs to be considered too. Would any of these APIs need to change? How would we document how implementers of the plug-ins are supposed to deal with this requirement?

I also think Rustls exists in large part to help people build Rust replacements for things like the Apache web server and curl. I think we should have a stronger use case than such legacy applications for changing Rustls to use fallible allocation. Which is to say, I prefer oom=panic as the approach to supporting them, so that we don't get distracted from improvements that benefit Rust applications.

briansmith commented 2 years ago

Also, what would we do, as we maintain the codebase, to ensure that contributors don't add new uses of the "infallible" allocation APIs, so that we don't have to spend time during code review finding and correcting them?

One possible solution:

pub struct Vec<T>(alloc::vec::Vec<T>);
impl<T> Vec<T> {
    // subset of the standard `Vec` API we need; no "infallible allocation" allowed
    #[inline]
    fn x(&mut self, ...) -> ... { self.0.x(...) }
}
pub struct VecDeque<T>(alloc::collections::VecDeque<T>);
impl<T> VecDeque<T> { ... }

I would expect this would significantly reduce the maintenance burden.
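A runnable variant of that sketch, with one concrete fallible method; the names are illustrative, not a proposed rustls API:

```rust
use std::collections::TryReserveError;

// A newtype that exposes only fallible mutation, so calls to "infallible
// allocation" methods fail to compile rather than slipping past review.
pub struct FallibleVec<T>(Vec<T>);

impl<T> FallibleVec<T> {
    pub fn new() -> Self {
        FallibleVec(Vec::new()) // an empty Vec performs no allocation
    }

    // The only way to grow: reserve first, then push into reserved space.
    pub fn try_push(&mut self, value: T) -> Result<(), TryReserveError> {
        self.0.try_reserve(1)?;
        self.0.push(value);
        Ok(())
    }

    pub fn as_slice(&self) -> &[T] {
        &self.0
    }
}
```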

jsha commented 2 years ago

rustls-ffi should catch all panics before they get past its FFI interface using catch_unwind

Yes, this is already the case.

With the tools we currently have available, it seems impractical to fully ensure we never panic in Rustls and that no dependency of Rustls ever panics.

Ah, interesting. From https://github.com/rustls/rustls/issues/447 I assumed your opinion was that statically panic-free Rustls (plus dependencies) was an eventually reachable goal.

One of the reasons to avoid panicking is to avoid the possibility of exposing inconsistent internal state. OOM=panic introduces a bunch of new places that have to be reviewed for panic safety - any allocation.

Note that "libraries shouldn't panic or abort" isn't exclusive to C ecosystem. It's also good design for a robust Rust library that deals with untrusted input. By consolidating allocations in a smaller number of places, and making those allocations fallible, we would make rustls a better library for Rust users as well. It would also make memory usage more predictable, which would be useful in particular for embedded systems.

djc commented 2 years ago

By consolidating allocations in a smaller number of places, and making those allocations fallible, we would make rustls a better library for Rust users as well.

I definitely agree with that part. On the other hand, there are definitely trade-offs here; for example, consolidating allocations might mean making the API harder to use and/or allocating extra memory that is not usually needed.

A probably fairly uncontroversial way to get started would be to add an AllocationFailed Error variant and start making allocations fallible on paths that already return a Result.
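A sketch of what that might look like, assuming a `From` conversion from `TryReserveError`; the enum shape and function are illustrative, not rustls's actual `Error` type:

```rust
use std::collections::TryReserveError;

// Illustrative error enum with the suggested variant.
#[derive(Debug)]
enum Error {
    AllocationFailed,
    // ... other variants ...
}

impl From<TryReserveError> for Error {
    fn from(_: TryReserveError) -> Self {
        Error::AllocationFailed
    }
}

// On a path that already returns Result, `?` now propagates allocation
// failure with no signature change beyond the new variant.
fn buffer_plaintext(data: &[u8]) -> Result<Vec<u8>, Error> {
    let mut buf = Vec::new();
    buf.try_reserve(data.len())?; // TryReserveError auto-converts via From
    buf.extend_from_slice(data);
    Ok(buf)
}
```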

It's still unclear to me that fallible allocation is the best use of resources if the goal is to minimize panics. There are still plenty of other ways the code base can panic, both explicitly (notably on the early data path) and implicitly (for example, slice indexing).

briansmith commented 2 years ago

With the tools we currently have available, it seems impractical to fully ensure we never panic in Rustls and that no dependency of Rustls ever panics.

Ah, interesting. From #447 I assumed your opinion was that statically panic-free Rustls (plus dependencies) was an eventually reachable goal.

I am happy with the progress we made there. Let's remember that before we did that work, we already thought Rustls was panic-free, at least when the API is used correctly, but we were wrong, as we found at least one case where an attacker could trigger the panic. We still lack the tooling we need to be confident there are no more panics.

One of the reasons to avoid panicking is to avoid the possibility of exposing inconsistent internal state. OOM=panic introduces a bunch of new places that have to be reviewed for panic safety - any allocation.

First, you can get inconsistent internal state even without panicking:

self.x = foo();
self.y = bar()?;
self.z = baz();

If bar() fails, x has been updated but z has not, breaking any invariant that relates them. We do use coding patterns that naturally reduce the chances of this happening.
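One such pattern is to compute every field fallibly first and assign all at once, so an early return via `?` leaves `self` untouched. A sketch, with stand-in types and helpers:

```rust
// Stand-ins, not rustls code.
struct State {
    x: u32,
    y: u32,
    z: u32,
}

fn fallible_bar() -> Result<u32, ()> {
    Ok(2) // stand-in for bar()
}

impl State {
    fn update(&mut self) -> Result<(), ()> {
        // Compute every field before touching `self`; if `?` returns early,
        // no partial update has happened and invariants survive.
        let x = 1; // stand-in for foo()
        let y = fallible_bar()?;
        let z = 3; // stand-in for baz()
        *self = State { x, y, z };
        Ok(())
    }
}
```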

Note that "libraries shouldn't panic or abort" isn't exclusive to C ecosystem. It's also good design for a robust Rust library that deals with untrusted input.

Aborting the process on memory allocation failure is a commonly-accepted practice and is often recommended over trying to recover from allocation failure. That's why Rust does it the way it does by default. I also frequently recommend that people build their applications with panic=abort, because I've found multiple instances (not in Rustls, though I didn't look) where libraries weren't panic-safe.

By consolidating allocations in a smaller number of places, and making those allocations fallible, we would make rustls a better library for Rust users as well. It would also make memory usage more predictable

I do think there is room for improving how we do memory allocation, but also I don't think there's justification for trying to micro-optimize memory allocation when we can just delegate those optimizations to the allocator.

I think it would be useful to see the benchmarks that show the performance or some other measurement being better with the fallible allocation than with what we do now.

Note that the fallible allocations add branches where there were no branches previously. We don't have any good way to test what happens when those branches are taken. This is compounded by the fact that the code coverage tools we have don't deal well with ?. Every time we add a ? or equivalent, we add an opportunity for inconsistent state that we didn't have before, and we're unlikely to have test coverage for those cases.

which would be useful in particular for embedded systems.

I'm all for making Rustls better for embedded systems and I think it would be good to see a prioritized list of useful improvements from users trying to use Rustls in an embedded environment. I think it requires hearing from more people targeting embedded use cases, as "embedded" can mean many things. I expect the quantity of RAM we hold onto in Rustls and the API for handling input/output buffers is more problematic.

Just for clarity, my current focus w.r.t. Rustls is to make it more clearly correct and safe to use. This is why I'm particularly unexcited about adding more branches into the codebase.

briansmith commented 2 years ago

In #850 I outlined how Rustls should delegate all of the allocations it currently does for certificate-related objects to the certificate verifier plug-in. That will naturally eliminate many of Rustls's allocations.

We also have #670 about reducing allocations in the I/O interface. That again would eliminate a lot of allocations and would improve performance.

I believe there is another issue about eliminating allocations during handshake message parsing, as there are currently places where we clone data into Vec<u8> when processing handshake messages. I suggest we "just" eliminate all of these allocations completely.

ConfigBuilders creates Vecs of cipher suites, KX groups, and TLS versions. It copies these from input slices. Also the final construction of a ClientConfig or a ServerConfig allocates a Vec for alpn_protocols and a session cache.

I think we should consider changing the ConfigBuilder interface so that these copies are no longer required, e.g. by having the application move Cow<'static, [u8]> instead of passing slices.
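A sketch of that Cow suggestion; the builder type here is a stand-in, not rustls's ConfigBuilder:

```rust
use std::borrow::Cow;

// Stand-in builder type, not rustls's ConfigBuilder.
struct AlpnConfig {
    protocols: Vec<Cow<'static, [u8]>>,
}

impl AlpnConfig {
    // The caller moves the Cow values in: static data is borrowed and never
    // copied, and owned Vecs are moved rather than re-allocated.
    fn new(protocols: Vec<Cow<'static, [u8]>>) -> Self {
        AlpnConfig { protocols }
    }
}
```

With this shape, `AlpnConfig::new(vec![Cow::Borrowed(&b"h2"[..])])` copies nothing; the only allocation is the outer Vec the caller already built.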

msgs/handshake.rs has a number of HashMap instantiations to deduplicate things:

We can eliminate all those uses of HashSet easily using the parsing technique used by webpki for de-duplicating X.509 extensions, and that would also make Rustls more efficient.
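In that spirit, a HashSet-free duplicate check over a small slice of extension IDs can be a plain pairwise scan; this is a sketch of the idea, not webpki's exact parsing technique:

```rust
// O(n^2) pairwise scan: needs no allocation at all, and TLS extension
// lists are small enough that the quadratic cost is irrelevant.
fn has_duplicate(ids: &[u16]) -> bool {
    for (i, id) in ids.iter().enumerate() {
        // any later occurrence of `id` means a duplicate
        if ids[i + 1..].contains(id) {
            return true;
        }
    }
    false
}
```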

I think if we spend the effort to reduce these allocations along these lines, we'll have many fewer points in the code to consider regarding fallible allocation, which could make doing it more practical.

briansmith commented 2 years ago

I believe there is another issue about eliminating allocations during handshake message parsing, as there are currently places where we clone data into Vec<u8> when processing handshake messages. I suggest we "just" eliminate all of these allocations completely.

https://github.com/rustls/rustls/issues/603 was the issue I was thinking of.

djc commented 2 years ago

There's also my old branch to avoid allocating Vecs in message parsing code. My recent PRs are steps towards rebasing that on top of main, though there are likely a few issues to address still.

briansmith commented 2 years ago

Besides what's mentioned above, there also seem to be small allocations that happen during serialization, not just during parsing. Consider this fragment:

    let mut buf = Vec::new();
    let ecpoint = PayloadU8::new(Vec::from(kxd.pubkey.as_ref()));
    ecpoint.encode(&mut buf);
    let pubkey = Payload::new(buf);

If PayloadU8 worked on Cow<'a, [u8]> or similar, some of these might be avoided.
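A hedged sketch of that idea; these types are stand-ins for rustls's internals, not its real PayloadU8:

```rust
use std::borrow::Cow;

// Stand-in for a Cow-based PayloadU8: borrows the public-key bytes during
// encoding instead of copying them into a fresh Vec first.
struct PayloadU8<'a>(Cow<'a, [u8]>);

impl<'a> PayloadU8<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        PayloadU8(Cow::Borrowed(bytes)) // no allocation here
    }

    fn encode(&self, out: &mut Vec<u8>) {
        out.push(self.0.len() as u8); // one-byte length prefix
        out.extend_from_slice(&self.0);
    }
}
```

The intermediate `Vec::from(kxd.pubkey.as_ref())` copy in the fragment above then disappears; only the final output buffer is allocated.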

See also #889 and #890. I suspect there are many "paper cut" allocations that just need to go away.

briansmith commented 2 years ago

In addition to #894, I suggest eliminating more allocations in:

I believe implementing those suggestions will eliminate most of the infallible allocations that Rustls does. I also believe implementing them will give us experience with the coding patterns that would eliminate most of what would remain. This would make the scope of the change suggested here much smaller.

Besides those, I believe we already generally understand that we need to refactor how the buffer management is done for records and handshake messages. I think we need to make a plan for the new API that addresses the various concerns about us allocating and holding on to too much memory in those buffers. I anticipate that the solution to those problems is to put the application more in control of the buffers, and as part of doing that, I think we should be able to make the allocation of those buffers fallible for the applications that want it to be so.

In our state machine we could probably replace the Boxed state with an enum of all the possible states, as @djc had suggested before. A risk I see there is that some variants of that enum, that would only be temporary during the handshake, might be very large and then the resulting object would be bloated all the time.
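A small demonstration of that size concern: an enum is as large as its largest variant, so a big transient handshake state inflates the steady-state object for its whole lifetime. Variants and sizes here are illustrative:

```rust
#[allow(dead_code)]
enum ConnState {
    Handshaking([u8; 1024]), // large state, needed only during the handshake
    Active(u8),              // small steady state
}

// std::mem::size_of::<ConnState>() is at least 1024: the enum pays for its
// largest variant even when only Active is ever stored.
```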

If we were to make all of these changes then the number of extra branches added to Rustls to support fallible allocation should go down to a level that might be reasonable to wrap our heads around and test/fuzz.

On one hand, I think oom=panic isn't a great solution, after looking into it in more detail. Basically, oom=panic can make currently-safe code become unsafe; see https://github.com/rust-lang/rfcs/pull/2116#issuecomment-1006741034 for an example. OTOH, if/when it becomes stable, I suspect a lot of people will be interested in Rustls supporting it, so it might be unreasonable to avoid supporting it.

I think rustls-ffi could implement non-fallible allocation right now by doing the following:

briansmith commented 2 years ago

@djc @jsha Thoughts?

djc commented 2 years ago

I think for now we should not prioritize fallible allocation directly and work instead on avoiding allocations inside the library where this does not cause too much complexity. I think Joe wants to keep something shaped roughly like the current API for ease of use, so we should probably design something modular such that there are different "layers" that can be used by more or less sophisticated users.

briansmith commented 2 years ago

BTW, regarding my oom=panic idea, see https://github.com/rust-lang/rfcs/pull/2116#issuecomment-1006741034. oom=panic has downsides that I didn't anticipate when I made that suggestion, and it isn't clear if/when libstd will become compatible with oom=panic. So it doesn't seem like as viable of a solution as I expected.

jsha commented 2 years ago

I think rustls-ffi could implement non-fallible allocation right now by doing the following:

It's an interesting idea, but it still doesn't get us to "rustls never aborts the program," it only gets us to "rustls never aborts the program, so long as rustls stays within its memory budget on this call." And the memory budget would be best-effort and not enforced with the type system. It also adds significant complexity - and we'd prefer rustls-ffi to be a thin layer around rustls rather than a thick one.

It sounds like the current tooling and near-term RFCs for Rust don't really support a library that promises not to abort. That's fine; I'll document it as a property of rustls-ffi. I appreciate the detail in which you looked at the problem.

DemiMarie commented 1 year ago

I think for now we should not prioritize fallible allocation directly and work instead on avoiding allocations inside the library where this does not cause too much complexity. I think Joe wants to keep something shaped roughly like the current API for ease of use, so we should probably design something modular such that there are different "layers" that can be used by more or less sophisticated users.

What is the reason for this decision?

djc commented 1 year ago

@DemiMarie what decision do you mean? Any particular reason you're getting back to this after 2 years?

DemiMarie commented 1 year ago

@djc For at least libcurl, “won’t crash on allocation failure” is part of the API contract. That said, avoiding rustls-internal allocations is even better, because it allows rustls to be used in e.g. embedded systems without alloc.

djc commented 1 year ago

See also #1420.

ctz commented 1 year ago

it allows rustls to be used in e.g. embedded systems without alloc.

At the moment, operation without alloc isn't in scope for this crate. That might change in the future, though certainly after we support no-std.

In general I think microcontroller-level devices capable of meaningfully using TLS (eg, they have enough RAM to dedicate to TCP send and receive buffers) have enough space to maintain a heap, and are already accepting enough software complexity to warrant it.

My experience of "we have no heap so it cannot run out"-type embedded engineering approaches is that they invariably end up with many varied, informal pools and arenas that then need the same "what happens if we run out", "why is this leaking", and "what is the maximum usage" questions addressed.