rust-random / getrandom

A small cross-platform library for retrieving random data from (operating) system source
Apache License 2.0
264 stars 166 forks source link

use_file: Just use `libstd` on targets that we know have libstd, instead of pthreads + libc. #453

Open briansmith opened 4 weeks ago

briansmith commented 4 weeks ago

I propose that on all targets where we know libstd is available, we use libstd instead of pthreads and libc for the use_file implementation, effectively reverting to the original implementation.

The current proposal for x86_64-unknown-linux-none in #424 is to limit support for that target to Linux kernel versions that have the getrandom syscall, so it won't use use_file.

Historically, we've tried to make it so getrandom avoids using libstd in part so that libstd itself could use getrandom as a dependency. However, I think it is clear that libstd is not going to use this crate as its needs are almost completely different; indeed its code has diverged considerably.

Another reason we've avoided libstd is to ensure that we are #[no_std] compatible. However, once we add x86_64-unknown-linux-none support, we'll have an actual no-std target that we can build (at least) for to ensure the core parts of this crate remain no_std compatible.

use_file is a significant chunk of the unsafe code, with the most problematic dependencies (libpthreads). It duplicates functionality that's in libstd, and the functionality it duplicates is exactly the kind of functionality (threading primitives) that we should avoid duplicating, as we're not getting any benefit from the QA and improvements in libstd.

Of all the targets that use the use_file implementation, only one doesn't have libstd support: 32-bit x86 QNX Neutrino. I believe @samkearney is working on it, based on their comment in https://github.com/rust-lang/rust/pull/109173#issuecomment-1476495419.

samkearney commented 4 weeks ago

Thanks for the mention @briansmith, just FYI, I have abandoned the work to bring std support to x86 QNX neutrino 7.0. QNX dropped support for 32-bit architectures in 7.1 (released 2020) and the cost-benefit for adding support to a 7-year-old release doesn't make sense anymore.

That being said, I also don't care if no_std x86 QNX 7.0 breaks, and I would wager pretty strongly that virtually nobody else is using it either (it is much more likely that they are using the more recent QNX 7.1 or the in-progress support for QNX 8).

briansmith commented 4 weeks ago

I think we should switch 32-bit x86 QNX to be a target for which we support custom implementations, and drop the built-in implementation for it. That way we don't have to carry around the current implementation forever just for it. Anybody who needs the current implementation could adopt the current use_file code in their custom implementation.

samkearney commented 4 weeks ago

Works for me. 👍

josephlr commented 4 weeks ago

Historically, we've tried to make it so getrandom avoids using libstd in part so that libstd itself could use getrandom as a dependency. However, I think it is clear that libstd is not going to use this crate as its needs are almost completely different; indeed its code has diverged considerably.

My long-term goal would still be to have Rust's standard library use getrandom, but that requires #365. Upstream seemed fairly open to it last time we tried, and it would be nice to have this crate be officially part of the Rust project. However, even if this crate were used by libstd:

So I think this issue is still worth considering, even if we do try to get this into libstd one day.

I propose that on all targets where we know libstd is available, we use libstd instead of pthreads and libc for the use_file implementation, effectively reverting to the original implementation.

Do you know if there would be a way to poll the /dev/random file descriptor using std, or would we need to still use libc for that?

briansmith commented 4 weeks ago

I agree with your "even if this crate were used by libstd...".

Do you know if there would be a way to poll the /dev/random file descriptor using std, or would we need to still use libc for that?

I think we'd still need to use AsFd::as_fd (1.63+) or as_raw_fd() to get the file descriptor and poll it with libc. But, only on Linux/Android.

newpavlov commented 3 weeks ago

I am fine with using std for Mutex and File. There were proposals to make getrandom similar to the alloc crate, doing so would make use of std not possible. But it's probably too far in the future (if it ever happens in the first place), so we can ignore it.

notgull commented 3 weeks ago

Note that this would be a breaking change, since we'd need to remove the std feature or make it useless.

briansmith commented 3 weeks ago

The std feature would exist to control whether we use libstd on targets that aren't known to already support libstd, and to control whether we expose libstd in the public API, just like today.

I don't think this would be a breaking change for any target mentioned in https://doc.rust-lang.org/rustc/platform-support.html. It could perhaps be a breaking change for other targets, but I'm not sure getrandom supports any such targets (maybe accidentally).

newpavlov commented 2 weeks ago

Maybe we could remove use of mutexes entirely and use sleep like we do for VxWorks? Contention of the file opening section is quite unlikely in practice, so even if the sleeping branch will be triggered, it's unlikely that impact will be noticeable by users.

briansmith commented 2 weeks ago

@newpavlov mentioned that we basically implemented a spinlock for VxWorks. It seems like the Rust VxWorks targets support libstd, so I am hoping that we can replace that spinlock with whatever primitive we use in use_file, since VxWorks has the same structure as use_file on Linux, namely blocking on the operating system to report that the system PRNG has been seeded.

One of my goals is to remove as much of the custom synchronization logic in getrandom as we can, so I am hoping we can find a way to use libstd-provided primitives when possible. Unfortunately, the MSRV situation makes it tricky since OnceLock seems like the best choice, but it was only stablized in Rust 1.70.

newpavlov commented 2 weeks ago

On second thought, I agree about libpthread mutex being problematic, but what are advantages of using std::fs::File instead of libc::open/read directly?

briansmith commented 2 weeks ago

On second thought, I agree about libpthread mutex being problematic, but what are advantages of using std::fs::File instead of libc::open/read directly?

Except for some efficient tweaks, pretty much all of the recent changes I've submitted are part of a bigger project that consists of reducing the use of unsafe and some other dubious constructs (like casts and DIY synchronization primitives) in getrandom and some other crates.

The advantage will be bigger once the the Rust read_buf feature is stable and our MSRV is high enough to use it, because we'll be able to replace the use of libc::read with Read::.read_buf. Once that happens, the entire use_file module would not be free of unsafe code except for the polling of /dev/random. Similarly, once OnceLock::get_or_try_init is stable the advantage will be bigger again.

I've submitted PR #481 to make progress towards this.

briansmith commented 2 weeks ago

OK, so to summarize this:

There are two main improvements that are available in libstd over the current code:

  1. libstd uses futexes and we should really stop forcing a pthreads dependency. Unfortunately, std::sync::OnceLock doesn't really have the capabilities we need because OnceLock::get_or_try_init is still unstable. I think instead of trying to switch to libstd for synchroniziation primitives, we should instead switch to once_cell, as done in PR #481 and PR #483. Those two PRs, along with #484, completely eliminate use_file::Mutex and src/lazy.rs. I do think overall that is a good direction, because we'd be better off helping to maintain once_cell than we'd be if we kept maintaining our own synchronization primitives.

PR #485 shows how things would look when libstd's unstable features are stablized. Note that we'd still need OnceBool/LazyBool even still.

newpavlov commented 2 weeks ago

As I wrote in the other PR, I think we can use futex directly on Linux targets. We already have the 32 bit FD atomic, which is perfectly suitable for it. I will try to play with it a bit later.

I am also not sure about once_cell. Yes, it's a widely used dependency, but it's still a 1k+ LoC dependency used to replace 10-20 easily reviewable LoCs.

briansmith commented 2 weeks ago

I am also not sure about once_cell. Yes, it's a widely used dependency, but it's still a 1k+ LoC dependency

I agree that that is unfortunate but also I think many of us need to review it anyway for other projects so duplicating it in getrandom is extra work for a lot of people.

used to replace 10-20 easily reviewable LoCs.

LazyBool is only like 10-20 LoC, but that's because it's based on LazyUsize which isn't so trivial to review and which is over-featured compared to what we need. For example, it takes effort to verify that LazyUsize::UNINIT doesn't conflict with any real value, whereas once_cell::race::OnceNonZeroUsize has a clearer design.

It's not the end of the world, but IMO this isn't a great place to be making new synchronization primitives.

As I wrote in the other PR, I think we can use futex directly on Linux targets. We already have the 32 bit FD atomic, which is perfectly suitable for it. I will try to play with it a bit later.

I do think that this isn't a good place to put a futex-based mutex implementation either....

josephlr commented 1 week ago

After looking through some of the proposed ideas:

I don't think we really need an external dependency our a custom futex implementation for this crate; we can simplify the code (and remove the libpthread dependency) without either of these things.

I think we should take one of two approaches:

Looking at this, I would prefer the first approach of using std without an external dependancy.

briansmith commented 1 week ago

I'm not sure why the "use once_cell" approach isn't considered. If once_cell is a problematic dependency for trying to define tricky synchronization primitives then getrandom would be a problematic dependency for the same reasons, except worse because nobody expects getrandom to be defining its own synchronization primitives. getrandom defining its own synchronization primitives creates a lot of work for people trying to use it.

I do think the promise that we at open at most N file descriptors where N is very small (1 or 2) should not be discounted. The LazyFd idea is clever but it's still unbounded in the number of file descriptors it opens concurrently as N threads could each open one.

I do think the std::sync::Mutex-based approach is the safest and simplest approach, so that's what I prefer.

newpavlov commented 1 week ago

Personally, I don't think that our synchronization needs are tricky enough to warrant introduction of a new 2.2 kLoC dependency for tier-1 Linux targets. Even worse, this dependency will be present even with disabled file fallback.

I do think the std::sync::Mutex-based approach is the safest and simplest approach, so that's what I prefer.

What do you think about the sleep loop approach on non-Linux targets? AFAIK contention on the FD atomic should be extremely rare in practice in the first place and by surrounding time slice by sleeping we "solve" the potential priority inversion problem even in pathological scenarios.

josephlr commented 1 week ago

I'm not sure why the "use once_cell" approach isn't considered. If once_cell is a problematic dependency for trying to define tricky synchronization primitives then getrandom would be a problematic dependency for the same reasons, except worse because nobody expects getrandom to be defining its own synchronization primitives. getrandom defining its own synchronization primitives creates a lot of work for people trying to use it.

Personally, I don't think that our synchronization needs are tricky enough to warrant introduction of a new 2.2 kLoC dependency for tier-1 Linux targets. Even worse, this dependency will be present even with disabled file fallback.

I would prefer to use once_cell if (and only if) we can replace all our synchronization needs with it. I wouldn't want someone doing a security review of this crate to have to read synchronization code in two different places. I initially thought https://github.com/rust-lang/cargo/issues/1197 would prevent us from using the race feature on some targets and the std feature on others. However, in my experimentation, it seems like we could actually do this.

I'll try to prototype what it would look like to actually use once_cell for all the syncronization stuff (NetBSD Weak symbol, shared /dev/{urandom, random} fds, LazyBool, VxWorks init).

I do think the promise that we at open at most N file descriptors where N is very small (1 or 2) should not be discounted. The LazyFd idea is clever but it's still unbounded in the number of file descriptors it opens concurrently as N threads could each open one.

What do you think about the sleep loop approach on non-Linux targets? AFAIK contention on the FD atomic should be extremely rare in practice in the first place and by surrounding time slice by sleeping we "solve" the potential priority inversion problem even in pathological scenarios.

If we end up going with a "no locking" approach, I would be fine implementing something like LazyFd::get_or_open by either:

Given that the race is very unlikely (I haven't been able to even make it happen once in my testing), any reasonable approach seems fine.

briansmith commented 1 week ago

I'll try to prototype what it would look like to actually use once_cell for all the syncronization stuff (NetBSD Weak symbol, shared /dev/{urandom, random} fds, LazyBool, VxWorks init).

NetBSD weak symbol support needs at least a new race::OnceNonNull or otherwise a full race::OnceWeak weak function binding API. race::OnceRef won't work for function pointers. I'm not sure OnceNonNull would be accepted by them because its an inherently less-safe API (dealing with pointers).