rust-random / rand

A Rust library for random number generation.
https://crates.io/crates/rand
Other
1.66k stars 432 forks source link

Solaris getrandom() support breaks illumos #637

Closed jperkin closed 5 years ago

jperkin commented 5 years ago

Building rust 1.30.0 on illumos platforms (specifically SmartOS) results in build failure, which is a regression compared to 1.29.x.

I've tracked this down to the introduction of getrandom() support for Solaris, implemented in 43b4c1b162dbc810c673c49971e5be8454a55ea6.

The rust build failure itself isn't particularly helpful (abridged):

   Compiling core v0.0.0 (file:///home/pbulk/build/lang/rust/work/rustc-1.30.0-src/src/libcore)
error: Could not compile `core`.
expected success, got: exit code: 101
thread 'main' panicked at 'cargo must succeed', bootstrap/compile.rs:1112:9

but digging in further, I was able to deduce that this was caused by an ENOSYS failure, and truss(1)ing the build showed which syscall was being attempted:

30136/1:        sys#143()                                       Err#89 ENOSYS
30136/1:            Received signal #12, SIGSYS [default]
30136/1:              siginfo: SIG#0

Source code analysis (as I wasn't able to get a useful backtrace) showed that the attempt to call syscall 143 is coming from the Solaris getrandom test introduced in that commit.

If I simply bail early from that function with the following diff:

--- src/vendor/rand/src/rngs/os.rs.orig 2018-10-24 21:38:28.000000000 +0000
+++ src/vendor/rand/src/rngs/os.rs
@@ -675,6 +675,7 @@ mod imp {
     }

     fn getrandom(buf: &mut [u8], blocking: bool) -> libc::c_long {
+        return -1;
         extern "C" {
             fn syscall(number: libc::c_long, ...) -> libc::c_long;
         }

then the build succeeds as normal.

I don't know why the ENOSYS handler here isn't working correctly, so I don't have a proper patch to fix this issue unfortunately. I should also point out that while 143 might be the syscall id for getrandom() on Solaris 11.3, the equivalent syscall on illumos is 126, so it's probably worth adding support for that too. The illumos platform that I build on is an older release which doesn't have getrandom().

Thanks.

dhardy commented 5 years ago

Naive question: if target_os = "solaris" matches illumos, how do we detect the latter?

Follow-up question: will is_getrandom_available work correctly on both old and new versions of illumos once the syscall number is fixed?

jperkin commented 5 years ago

I don't know that rust has any way to differentiate illumos at this time, unfortunately. As the two code bases continue to diverge it's likely we will need some way to do this.

As for adding the illumos syscall number, I don't think solely doing that will resolve this issue, as the logic in my case would still be the same (the syscall not being present, regardless of whether it's 126 or 143, causing a panic rather than being caught and turning off the feature).

jasonbking commented 5 years ago

This is probably better off looking at libc symbols and basing it's behavior off that. The syscall interface on Solaris and it's derivatives has never been a stable interface -- in the past syscalls have changed with merely applying an OS patch.

dhardy commented 5 years ago

If one of you (or someone) can put together a proper patch, if possible including some kind of CI testing, I'll be happy to review, but I can't promise more than that.

papertigers commented 5 years ago

We could do something like:

extern crate libc;

unsafe fn findsymbol(name: &str) -> usize {
    libc::dlsym(libc::RTLD_DEFAULT, name.as_ptr() as *const _) as usize
}

fn main() {
    // It's a C style string
    let syscall = "getrandom\0";

    unsafe {
        match findsymbol(&syscall) {
            0 => println!("addr of {}: not found", syscall),
            n => {
                println!("addr of {}: {:x}", syscall, n);

                let getrandom = std::mem::transmute::<
                    usize,
                    fn(*mut u8, libc::size_t, libc::c_uint) -> libc::ssize_t,
                >(n);

                // use getrandom
                let mut buf: [u8; 6] = [0; 6];
                const GRND_NONBLOCK: libc::c_uint = 0x0001;
                const GRND_RANDOM: libc::c_uint = 0x0002;

                getrandom(buf.as_mut_ptr(), buf.len(), GRND_NONBLOCK | GRND_RANDOM);
                println!("random value: {:?}", buf);
            }
        }
    }
}

Which returns:

root - rustdev ~/src/getrandom (git:HEAD) # cargo build
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
root - rustdev ~/src/getrandom (git:HEAD) # ./target/debug/getrandom
addr of getrandom: fffffc7fef25a790
random value: [16, 126, 219, 6, 195, 117]
root - rustdev ~/src/getrandom (git:HEAD) # ./target/debug/getrandom
addr of getrandom: fffffc7fef25a790
random value: [140, 19, 49, 157, 197, 240]
root - rustdev ~/src/getrandom (git:HEAD) # ./target/debug/getrandom
addr of getrandom: fffffc7fef25a790
random value: [174, 242, 210, 118, 25, 93]

Thoughts?

dhardy commented 5 years ago

Is this standards compliant? Will it work on any ~solaris system?

Due to the non-zero pointer optimisation, I think we could just do the following. Either way, we can roll the unsafe stuff into a single function:

type GetRandom = fn(*mut u8, libc::size_t, libc::c_uint) -> libc::ssize_t;
fn findsymbol(name: &str) -> Option<GetRandom> {
    unsafe {
        let ptr = libc::dlsym(libc::RTLD_DEFAULT, name.as_ptr() as *const _);
        std::mem::transmute(ptr)
    }
}
papertigers commented 5 years ago

As a comparison:

extern crate libc;

type GetRandom = fn(*mut u8, libc::size_t, libc::c_uint) -> libc::ssize_t;
fn findsymbol(name: &str) -> Option<GetRandom> {
    unsafe {
        let ptr = libc::dlsym(libc::RTLD_DEFAULT, name.as_ptr() as *const _);
        std::mem::transmute(ptr)
    }
}

fn main() {
    // It's a C style string
    let syscall = "getrandom\0";
    let unknownsyscall = "foobar\0";

    match findsymbol(&syscall) {
        None => println!("{} not found", syscall),
        Some(getrandom) => {
            let mut buf: [u8; 6] = [0; 6];
            const GRND_NONBLOCK: libc::c_uint = 0x0001;
            const GRND_RANDOM: libc::c_uint = 0x0002;

            getrandom(buf.as_mut_ptr(), buf.len(), GRND_NONBLOCK | GRND_RANDOM);
            println!("random value: {:?}", buf);
        }
    };

    match findsymbol(&unknownsyscall) {
        None => println!("{} not found", unknownsyscall),
        Some(_) => {
            // Should not be reached
            println!("found {}", unknownsyscall);
        }
    };
}

Output from a smartos/illumos system:

root - rustdev ~/src/getrandom (git:HEAD) # cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/getrandom`
random value: [250, 6, 117, 211, 182, 77]
foobar not found

dlsym should work on any ~solaris based system to the best of my knowledge, however I don't have a solaris box to run the example on. I am not sure how many Oracle solaris based users Rust has in general. All of the work to get Rust running has been done on illumos as far as I know. So not even sure if there is a rust build for a system like that.

dhardy commented 5 years ago

Great! Since the return type is very specific, it shouldn't be called findsymbol now of course.

Want to make a PR? Then we can ask users to test.

jasonbking commented 5 years ago

It appears Oracle (at least in Solaris 11.4, I didn't check other versions) have marked getrandom(2) as a public, committed interface (despite it appearing in the syscall section, the assumption is still that you use the libc wrapper to call it). illumos still appears to consider it a private interface. We usually discourage things outside of the illumos source tree from using private interfaces. That being said, there's also no reason getrandom(2) couldn't be updated to a public interface if it makes sense to (assuming the illumos community agrees) -- given that both Linux and FreeBSD (at least as of FBSD 12) appear to support the same interface, it probably wouldn't be a hard sell. The whole public/private thing is just a tool to hopefully communicate things may or may not change in a backwards-incompatible manner -- illumos is after all open source, so there's nothing stopping anyone from using whatever interfaces they want.

nagisa commented 5 years ago

Protip: you should not transmute that. Just implement the code the naive way and let the optimizer to deal with it instead.

nagisa commented 5 years ago

I’m also strongly considering proposing making a separate target for illumos. The two things are so incompatible by now that it makes no sense to consider them similar in any way.

jasonbking commented 5 years ago

After some unrelated issues, I have a change that should work on both illumos and Solaris (it's modeled a bit off @papertigers example as well as the weak and syscall macro in libstd::sys::unix (it would be awesome to have that macro available in the libc crate for feature detection on illumos and Solaris). Do you want/care if it's run through rustfmt (if so 2015 or 2018) before creating the PR?

papertigers commented 5 years ago

Protip: you should not transmute that. Just implement the code the naive way and let the optimizer to deal with it instead.

Could you clarify what you mean? Is there a better way to deal with telling the compiler the address we got back is actually the getrandom function?