`rand_core:0.5.0::Error` has no raison d'etre

nagisa commented 5 years ago

The rand_core:0.5.0::Error is reduced down to an integer code, which has exactly as much value as no error at all. The stored integer carries no information whatsoever (esp. in generic contexts) and the error, if Displayed conveys no useful information either.

Consider for example:

fn foo<R: RandCore>(rng: R) {
    match rng.try_fill_bytes(...) {
        Err(e) => {
            // In this context "error code 42" is about as useful as 
            // "an error has occurred" or "bad things happened"
            //
            // Furthermore, each implementation of `RngCore` can map 
            // error numbers to causes in arbitrary ways, which makes 
            // whatever interpretation that could be derived in this generic
            // context misleadingly wrong, even.
        }
    }
}

With that in mind, the next version should reconsider this API change.

dhardy commented 5 years ago

Thanks @nagisa; great to see your input to our long discussion of error types: #800, #771. Your comments appear to be exclusively about the no_std variant of Error. But I'll let you read through those threads before repeating anything.

nagisa commented 5 years ago

Your comments appear to be exclusively about the no_std variant of Error.

Indeed. Usable no_std Error is what I care about the most. std Error seems fine… not that I’ve had a chance to deal with it yet, but many other crates do a similar thing and that seems to work fine.

long discussion of error types: #800, #771.

I have seen these issues, but haven’t read through them too thoroughly. I notice that this same issue was brought up at least once by yourself, and never really addressed:

We should pay some attention to reporting meaningful errors, not merely forwarding a code from somewhere. We should at least do this when using std; for no_std the best trade-off might be to only forward a code, or maybe we should piggy-back the log feature (or another) to add details.

I haven’t seen any exploration into usability of this trade-off, except for this small blurb (also by yourself):

On no_std this is a pure numeric code, though later with distributed slices we could potentially include error messages (though no_std often means limited memory, so also potentially not).

“Distributed slices” came up multiple times as a "solution” for no_std, but I’ve no idea what that even means. It is the first time I hear about them. log was another considered option which in my eyes is an equivalent of print-debugging, now baked into a foundational library. rand_core does not provide a list of conventional codes either (if it did, the error code should be an enum instead, bringing us back to ErrorCause).

I have a feeling that this change was made with an understanding that actual cause of the error is not that useful – transient failures are better retried internally and non-transient ones cannot be recovered anyway; this probably makes sense given that various next_* APIs do not return an error either way. I do not necessarily object to that view, but I am still unable to see the point of the error code inside Error. It will only carry useful information in situations where the RandCore as an abstraction is irrelevant.

I also saw a mention somewhere that users won’t have to care about errors to the point of knowing that an RNG failed in some way. I disagree with that, and even if I agreed, I do not see any additional value that an error code would bring here. To see why compare these hypothetical, but fairly typical error traces, as one would produce when using failure crate in their project:

With error code

error: Could not download a file
     caused by: Nonce could not be generated for an encrypted session
     caused by: RNG error code 42

Without error code

error: Could not download a file
     caused by: Nonce could not be generated for an encrypted session
     caused by: RNG error

dhardy commented 5 years ago

No, I hadn't heard of distributed slices before, and I remain a little sceptical considering we don't have a portable version yet, but we do have a limited implementation: https://github.com/dtolnay/linkme . @newpavlov's idea is that crates can each add their error messages to a [(NonZeroU32, &'static str)] slice shared by the whole application.

I have a feeling that this change was made with an understanding that actual cause of the error is not that useful – transient failures are better retried internally and non-transient ones cannot be recovered anyway;

This was the motivation for the previous design with ErrorKind — but we did not find much use for this in practice, and expecting users to write their own code for retry on interruption (ErrorKind::Transient) was not particularly helpful. Therefore this does not exist any more — we handle retries ourselves where sensible (in getrandom and ReadRng) instead of returning a "transient" error.

What type of RNG are you imagining where an error occurs more often than once-in-a-blue-moon? RDRAND already retries in a loop per design spec. The one case of RNG failure I know of was OsRng failing on Windows (7?) due to a DLL issue — this is a std platform however.

Ultimately, I agree that we should have a solution, and I'm not confident that distributed slices will be available "soon" (@newpavlov?).

hdevalence commented 5 years ago

What problem is this actually solving? Your proposed solution (distributed slices) requires either language changes or "linker shenanigans"; what benefit does this actually give to an RNG user when they hit an error?

newpavlov commented 5 years ago

The rand_core:0.5.0::Error is reduced down to an integer code, which has exactly as much value as no error at all.

I strongly disagree. For end-user having some information about error is better than having none, plus usually you know which RNG you use, so you will be able to check this error code in RNG documentation. This design was heavily influenced by development of getrandom crate, but I think it also really well fits constraints of no_std applications: has a small size, trivially extensible, error descriptions can be removed from binary by dead code removal if they are not used anywhere.

What problem is this actually solving? Your proposed solution (distributed slices) requires either language changes or "linker shenanigans"; what benefit does this actually give to an RNG user when they hit an error?

Distributed slices are only needed for nicer error messages, so instead of RNG error code 42 you would get RNG error: quantum field instability.

burdges commented 5 years ago

Any errors an Rng might generate are either os errors, including special hardware errors, or stream cipher length errors, yes?

There is an interesting research problem of doing a non-locking reentrant PRNG, but ideally such a beast should not produce any error either. In essence, the Rng would be a smart pointer with their own unique secret seed that prevented accesses from different threads from inverting one another.

nagisa commented 5 years ago

plus usually you know which RNG you use

Is usually good enough for a foundational generic interface? If "usually" is good enough, then the whole concept of the RngCore trait is starting to lose its meaning – you could just use the API of the concrete RNG type after all.

EDIT: I do agree that if you do know the concrete RNG type, then it is definitely possible to communicate extra information back to the user. However, if those scenarios were all that we needed to handle, they would be handled equally well by an associated Error type on RngCore. With that alternative on the table, a numeric code is just a bad API design.

Any errors an Rng might generate are either os errors, including special hardware errors

Not sure if I would consider hardware errors the same as OS error – hardware isn’t always uniformly abstracted by an OS.

newpavlov commented 5 years ago

Your claim was:

integer (error) code <..> has exactly as much value as no error at all

I find it's really strange that you find some information exactly equivalent to no information at all. Even in generic context we are able to match on error codes (e.g. Error::UNKNOWN or Error::UNAVAILABLE), in future we also could add additional codes or even error prefixes for common error classes.

As for using associated error type, in my experience they are less convenient to handle in generic contexts. Yes, we could define rand_core::Error trait and use associated error type in RngCore bound by this trait, but it looks like an over-engineered solution to me (plus, it will not be object-safe).

What exactly do you want to be able to do with rand_core::Error in generic contexts? The only potentially useful thing I can remember is an ability to distinguish retryable vs non-retryable errors. Previously I proposed adding is_retryable method, but no one has expressed interest in it.

nagisa commented 5 years ago

Your claim ... I find it's really strange that you find some information exactly equivalent to no information at all

With what’s available in 0.5.0 I stand by that claim. Some information will exist in an integer, sure, but without means to interpret the number this information provides no knowledge and therefore no value can be derived.

And just to reiterate, "By knowing that I’m using crate rand_lava, and looking at the source code, I now know that error code 42 occurs when somebody unplugged the lava lamp” is not a satisfactory approach to error code interpretation.

Even in generic context we are able to match on error codes (e.g. Error::UNKNOWN or Error::UNAVAILABLE)

So far I’m not seeing either Error::UNKNOWN nor Error::UNAVAILABLE or any other codes to compare against anywhere in the public API of rand_core. Am I just blind? Why does Display not format those into sensible textual descriptions of the error?

What exactly do you want to be able to do with rand_core::Error in generic contexts? The only potentially useful thing I can remember is an ability to distinguish retryable vs non-retryable errors. Previously I proposed adding is_retryable method, but no one has expressed interest in it.

I’ll be happy if I’m able to:

Format the error into a user-readable and interpretable description of the cause behind rand_core::Error at least for the most common error situations (i.e. “random number generator is unavailable” and not “error code 711”) that has been returned by arbitrary implementor of the RngCore trait; while
Having the ability to quickly and easily specify these common situations as an implementor of the RngCore trait.

At the most basic level I’d be content with the aforementioned

impl Error {
    const UNAVAILABLE: _ = ...;
    const TRANSIENT: _ = ...;
    const etc...
}

but I would also find that to be fairly clunky to work with as far as APIs in Rust go.

dhardy commented 5 years ago

So far our convention has been

// —— in getrandom::error ——
// A randomly-chosen 24-bit prefix for our codes
pub(crate) const CODE_PREFIX: u32 = 0x57f4c500;
const CODE_UNKNOWN: u32 = CODE_PREFIX | 0x00;
const CODE_UNAVAILABLE: u32 = CODE_PREFIX | 0x01;

// —— in rand_jitter::error ——
/// All variants have a value of 0x6e530400 = 1850934272 plus a small
/// increment (1 through 5).
pub enum TimerError {
    /// No timer available.
    NoTimer = 0x6e530401,
    ...

... and all our defined error codes could be considered "unknown" or "unavailable".

We could tweak this and assign meaning to different parts of the code (e.g. 16 bits for a non-zero crate identifier, 8 bits for a "status" code and 8 bits for lib-specific details), then provide functions like fn is_unavailable(&self) -> bool which operate on the status bits for no_std.

But: it's a breaking API change for both no_std and std and still doesn't provide user-readable error messages.

To get messages, I think the only options for no_std are (1) &'static str, (2) distributed slices or (3) some ugly hack using static memory (set_err_msg(&'static str)).

newpavlov commented 5 years ago

I think @nagisa means that those constants should be duplicated in the rand_core with an appropriate Debug and Display impl modifications. I don't have objections against it.

Format the error into a user-readable and interpretable description of the cause behind rand_core::Error

Note that with an enabled std feature you will get a meaningful error description instead of bare error code for some OS errors (see this part and usage of error_msg_inner in getrandom). The same approach can be used by RNG crates, with an enabled std feature they can use Error::new and without it they can fall back to bare error codes.

newpavlov commented 5 years ago

@josephlr

I has just assumed that the RngCore trait would just have a type Error parameter, and each generator would chose its own type. is there a reason things are not done that way?

With this approach RngCore (and IIUC as a consequence Rng trait as well) will not be object safe anymore. If we want to query additional information about error, then we will have to introduce an additional rand_core::Error trait. To me it looks a bit too complicated compared to the current concrete type which in future can be extended in a backwards compatible way, this is why I've dropped this idea during discussions of Error type redesign and it was not brought by anyone else.

One advantage of associated error type would be an ability to use ! (or uninhabited type for now) to indicate impossibility of RNG failure. But I am not sure if it's worth losing object safety and simplicity of concrete type. Also we will have to release rand_core v0.6 and bump minor versions for other rand crates.

dhardy commented 5 years ago

IIRC there was some difficulty providing a blanket implementation of the following type due to the lack of specialisation and Rust enforcing no overlapping impls.

impl<E: Into<Error>, R: RngCore<E>> RngCore<Error> for R { ... }

Thus not only does using an associated type make RngCore not object-safe, it means no object-safe version of RngCore can be implemented automatically for all RNGs. (I guess though we could have used a separate ObjSafeRngCore trait.)

burdges commented 5 years ago

If you need a separate trait, then maybe make the rarely used trait the not object safe one, ala

pub trait TryRngCore {
    type Error;
    fn try_fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), Error>;
}

I've no idea if anyone wants try_fill_bytes on an object safe trait, but if so then one could always imagine the completely crazy "not my problem" version. lol

pub trait RngCore {
    fn next_u32(&mut self) -> u32;
    fn next_u64(&mut self) -> u64;
    fn fill_bytes(&mut self, dest: &mut [u8]);
    fn try_fill_bytes<E>(&mut self, dest: &mut [u8]) -> Result<(), E>
    where E: for<'a> Into<&'a dyn ErrorTrait>;
}

dhardy commented 5 years ago

Well, that's the neatest proposal I've heard yet (other than ditching all information in the Error type), but still somewhat disruptive, and doesn't meet @newpavlov's goal of small error values due to the required fat pointer in RngCore::try_fill_bytes (though whether that matters in this case is another question).

burdges commented 5 years ago

I doubt my second joke proposal actually works since E: for<'a> Into<&'a dyn ErrorTrait+Clone> does not quite suffice, but maybe E: Into<dyn ErrorTrait> once DST support improves dramatically.

I suppose associated types break trait objects right from the beginning, so you cannot simply hide the methods from the trait object?

pub trait RngCore {
    fn next_u32(&mut self) -> u32;
    fn next_u64(&mut self) -> u64;
    fn fill_bytes(&mut self, dest: &mut [u8]);
    type Error;
    fn try_fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), Error> where Self: Sized;
}

josephlr commented 5 years ago

I suppose associated types break trait objects right from the beginning, so you cannot simply hide the methods from the trait object?

So I started experimenting with this, and apparently this RngCore is Object Safe. Specifically:

pub trait RngCore {
    fn next_u32(&mut self) -> u32;
    fn next_u64(&mut self) -> u64;
    fn fill_bytes(&mut self, dest: &mut [u8]);
    type Error: Debug;
    fn try_fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), Self::Error>;
}

is Object Safe because to use a &dyn RngCore you have to specify an Error type.

fn annoying(rng: &dyn RngCore<Error = SomeErrorType>) { ... }

but this is a little annoying, we would like the ablity to use a common Error type. As @dhardy noted,

impl<R: RngCore> RngCore for R where CoreErr: From<R::Error> {
    type Error = CoreErr;
    ...
}

will not work. But we can use a wrapping struct to do the same thing:

pub struct Wrap<R>(pub R);
impl<R: RngCore> RngCore for Wrap<R> where CoreErr: From<R::Error> {
    type Error = CoreErr;
    ...
}

Then a user can implement non-generic methods like:

fn better(rng: &dyn RngCore<Error = CoreError>) { ... }

and call them like:

let mut rng = FailRng;
annoying(&mut rng);
let mut wrap_rng = Wrap(rng);
better(&mut wrap_rng);

Here's a playground with all of the code.

newpavlov commented 5 years ago

Yes, by not being object safe I was implying that you can't use it without fixing the error type, thus you essentially lose all advantages of the associated type.This is why for example in RustCrypto I am using a separate DynDigest trait to work around this issue.

Your solution will work, as will a separate object-safe trait, but to me it looks like a needless over-engineering for too little gain. Don't forget that RNG errors will be extremely rare. Do we really need such design complications? Especially considering that in future it's quite likely we will be able to improve user-facing no_std error messages without breaking changes.

josephlr commented 5 years ago

Do we really need such design complications?

I agree, this associated type stuff is probably not worth the hassle. I think that if we can get getrandom::Error to have good messages, and have those messages stay good when it's converted to a rng_core::Error, I think it would be sufficient.

in future it's quite likely we will be able to improve user-facing no_std error messages without breaking changes.

Is this just the fixes to getrandom? Or is there a upcoming language feature that would help?

newpavlov commented 5 years ago

Is this just the fixes to getrandom? Or is there a upcoming language feature that would help?

It's not quite "upcoming", but looks like there is enough interest in it (e.g. see this thread). It's mentioned earlier in this thread "distributed slices" feature, see for example here for a rough draft of how it can look.

But even without this feature we can improve situation by adding a private constant array to rand_core with descriptions for most common error codes. Maybe it's also worth to copy some error constants from getrandom.

burdges commented 5 years ago

Yes, I similarly meant that associated types must be specified. I'm unsure however if the object safety rules could relax this if associated types never appeared in object safe methods.

I do think separating out the TryRngCore trait above works. Or heck maybe

pub trait GetRng : Future<Output=Result<R,Error>> {
    type R: RngCore:
    type Error;
}

All this sounds like overkill though.

dhardy commented 5 years ago

Agreed, use of an associated type here is overkill. For now, we mostly care about getrandom errors and could potentially hard-code the error constants for a few other libraries.

We can approach this in two ways; either hard-code all error-codes and messages in rand_core or use @josephlr's re-assignment of getrandom error-codes to detect those errors assigned to getrandom and hook into its Debug/Display impls (maybe for this we want a getrandom::Error::try_from_code function).

Note that getrandom#54 leaves 2^31 - 2^16 - 1 error codes unassigned; we may wish to reserve some additional blocks (e.g. for rand_jitter).

Unfortunately this requires changing many error codes from the last release. Landing these changes will already require bumping the minor version on all crates.

josephlr commented 5 years ago

or use @josephlr's re-assignment of getrandom error-codes to detect those errors assigned to getrandom and hook into its Debug/Display impls (maybe for this we want a getrandom::Error::try_from_code function).

With https://github.com/rust-random/getrandom/pull/58, getrandom will actually be no_std, so we could just unconditionally depend on getrandom in rng_core and have rng_core::Error be a thin wrapper around getrandom::Error (when not using the std feature). This would make it easy to pass the Debug/Display implementations through.

newpavlov commented 5 years ago

Unfortunately this requires changing many error codes from the last release. Landing these changes will already require bumping the minor version on all crates.

I don't think so. This is why I've insisted on adding the following line to getrandom docs:

These codes are provided for reference only and should not be matched upon (but you can match on Error constants). The codes may change in future and such change will not be considered a breaking one.

So we can change error codes and use the highest bit in error code to distinguish OS errors from custom ones.

dhardy commented 5 years ago

This issue has stagnated, yet resolving it (one way or another) is an important step towards stabilisation, thus it deserves some priority.

In doing so, we should if possible minimise disruption.

Option: add RngCore::Error assoc. type

No: this is very disruptive since RngCore is no longer a concrete type, thus users must replace mentions with e.g. RngCore<Error = rand::Error>.

Option: remove RngCore::try_fill_bytes

Again, no; this is also disruptive (though not quite so bad). This also covers use of a companion trait like TryRngCore.

Option: revert `Error` to 0.5/0.6 impl

Also no: it's disruptive (except for users still on ≤0.6) and the new implementation has several improvements.

Option: replace no_std code field with ...

Without std, rand_core::Error has a single field, code: NonZeroU32. This is well integrated with getrandom::Error (including formatting), but less well with other error sources.

The question is with what that (a) is no_std compatible and (b) doesn't lose information from getrandom?

Option: augment no_std code field with description

Instead of replacing the code field, we could augment it, perhaps with msg: Option<&'static str>. This could be optional, under an opt-in crate feature.

Does this cover our needs, especially if this is optional (allowing crates to continue using the From<NonZeroU32> constructor)?

How do we make constructors std-compatible — perhaps by adding the same field to the ErrorCode struct?

If the field is feature-gated, do we keep the same constructors regardless, simply discarding the description when the field is not enabled? Anything else is going to cause annoying compile errors.

Option: few status constants

As mentioned above, we add some extra error constants like UNAVAILABLE. This is a minimal solution, but not a very good one.

Option: library of many possible constants

Expand the above, accepting many different error codes and descriptions from derived libs. Quite a horrible option (lots of consts but still not universal).

Option: run-time registration of error codes

Since we can't rely on distributed slices yet, we could emulated these at run-time, allowing crates to register possible error codes on start-up.

Besides being very hacky, this cannot be done very well without an allocator, and is not ideal in low-memory environments.

Option: set_last_error_msg function

Add a global Option<&'static str> value to store the last error message, and setter/getter functions, to allow implementations to set the last error message before returning an error code.

Again, hacky, and also has implementation issues since accessing mutable global memory requires synchronisation primitives which aren't necessarily available.

Option: no solution

Don't solve this: rand_lava's users are stuck with "error 42".

Any more suggestions? From this list I think the most acceptable solution would be to augment with a feature-gated description field.

burdges commented 5 years ago

Are we sure feature gated fields in structs really always work? I suppose features always being additive makes them work correctly, yes? It sounds okay then.

Are there any Rust WGs that address no_std issues? If so, could this become their problem? I think a WG with wider involvement could do things beyond individual projects, like statically sized manual trait objects, ala

pub struct ErasedOSError {
    f: fn<'a>(e: usize) -> &'a dyn Error + Debug + Display,
    e: usize,
}
impl ErasedOSError {
    pub fn as_dyn(&self) -> &dyn Error + Debug + Display { self.f(self.e) }
}

It's true rand could create some manual trait object like this, or some other mechanism, but we're likely better off leaving some no_std WG to do this.

dhardy commented 5 years ago

Like this? We've been using feature-gated fields for a long time already.

You mean some type of pre-allocated fixed-size Box able to store a dynamic object without an allocator? At this point anything dependent on language extensions is not a viable proposal in my opinion — language RFCs often take multiple years (the "placement new" RFC has even been unmerged), while we would like a solution in the near future.

newpavlov commented 5 years ago

I would prefer to not change anything in public API. Firstly, the current approach may be extended in a backwards compatible manner if we'll get distributed slices one day (well, at the cost of bumping MSRV). Secondly, I think no_std programmers will not care about text descriptions of RNG errors that much, so probably public API complications will not pull their weight here. Also RNG implementations can use std feature to switch between plain error codes and a richer error type*. And thirdly, I think an additional feature will make it harder and more confusing to understand API of rand_core, situation is not perfect right now, so I don't think we should make it worse.

*I mean RNG implementations can use code like this:

// Private error type
#[cfg(feature=""std)]
struct MyRngError { .. }

fn get_data(code: NonZeroU32) -> rand_core::Error {
    #[cfg(feature="std")]
    rand_core::Error::new(MyRngError::from_code(code))
    #[cfg(not(feature="std"))]
    rand_core::Error::from(code)
}

burdges commented 5 years ago

There are tricks like the one I wrote that do not require language extension pr se, but do require some wider consensus.

I'm not suggesting to wait for some WG, but doing something simple now, with a message or two to the relevant WG so that maybe they get interested.

dhardy commented 5 years ago

Lots of good reasoning @newpavlov, though I'd prefer we can find an outcome that everyone (including @nagisa) are happy with. Of course that may not be possible.

@burdges your tricks don't achieve much. Type-erasing a reference is easy; storing a type-erased object without a dynamically-sized Box is not. Of course an implementation could store its error in static memory and return a reference, but this is also potentially problematic (thread safety, persistence). Finally, the std::error::Error trait is (for whatever reason) not present in core.

burdges commented 4 years ago

I'm afraid cargo feature passing has proven bug prone, which makes code like https://github.com/AltSysrq/proptest/pull/170 annoying.

We might consider methods that either move the crate feature test into rand_core, like exposing ErrorCode, or maybe

impl Error {
    #[cfg(feature = "std")]
    #[inline(always)]
    pub fn make<R>(_code: NonZeroU32, inner: R) 
    where R: std::error::Error + Send + Sync + 'static
    {
        Error { inner }
    }

    #[cfg(not(feature = "std"))]
    #[inline(always)]
    pub fn make<R>(code: NonZeroU32, _inner: R) 
    where R: Send + Sync + 'static
    {
        Error { code }
    }
}

We should probably ignore this for now in the hopes that some relevant WG blesses some scheme, but I wanted to mention it.

dhardy commented 4 years ago

We got into problems doing that kind of thing before: someone adds the std flag, and now your error must support std::error::Error, except the appropriate flag didn't get activated in your crate, so you get a build error (#738).

burdges commented 4 years ago

Are you saying that you prefer the https://github.com/AltSysrq/proptest/pull/170 approach so that any desyncronization of the std flag causes build failures because the build should fail whenever that happens? I do not disagree, but..

If you wanted builds not to detect the std flag missmatch, then you'd want to minimize the std check appearing in other crates, no? I'd think this requires either exposing ErrorCode or some Error::make like function, neither of which change the rand API.

dhardy commented 4 years ago

rand_core::Error already supports From<NonZeroU32> and fn code(&self) -> Option<NonZeroU32> regardless of std mode. Is this what you want?

dhardy commented 4 years ago

This has stagnated for a long time now, so I am closing with no change (see @newpavlov's last comment, which I agreed to).

rust-random / rand