Closed nagisa closed 4 years ago
Thanks @nagisa; great to see your input to our long discussion of error types: #800, #771. Your comments appear to be exclusively about the no_std
variant of Error
. But I'll let you read through those threads before repeating anything.
Your comments appear to be exclusively about the no_std variant of Error.
Indeed. Usable no_std
Error
is what I care about the most. std
Error
seems fine… not that I’ve had a chance to deal with it yet, but many other crates do a similar thing and that seems to work fine.
long discussion of error types: #800, #771.
I have seen these issues, but haven’t read through them too thoroughly. I notice that this same issue was brought up at least once by yourself, and never really addressed:
We should pay some attention to reporting meaningful errors, not merely forwarding a code from somewhere. We should at least do this when using
std
; forno_std
the best trade-off might be to only forward a code, or maybe we should piggy-back the log feature (or another) to add details.
I haven’t seen any exploration into usability of this trade-off, except for this small blurb (also by yourself):
On
no_std
this is a pure numeric code, though later with distributed slices we could potentially include error messages (thoughno_std
often means limited memory, so also potentially not).
“Distributed slices” came up multiple times as a "solution” for no_std
, but I’ve no idea what that even means. It is the first time I hear about them. log
was another considered option which in my eyes is an equivalent of print
-debugging, now baked into a foundational library. rand_core
does not provide a list of conventional codes either (if it did, the error code should be an enum instead, bringing us back to ErrorCause
).
I have a feeling that this change was made with an understanding that actual cause of the error is not that useful – transient failures are better retried internally and non-transient ones cannot be recovered anyway; this probably makes sense given that various next_*
APIs do not return an error either way. I do not necessarily object to that view, but I am still unable to see the point of the error code inside Error
. It will only carry useful information in situations where the RandCore
as an abstraction is irrelevant.
I also saw a mention somewhere that users won’t have to care about errors to the point of knowing that an RNG failed in some way. I disagree with that, and even if I agreed, I do not see any additional value that an error code would bring here. To see why compare these hypothetical, but fairly typical error traces, as one would produce when using failure
crate in their project:
With error code
error: Could not download a file
caused by: Nonce could not be generated for an encrypted session
caused by: RNG error code 42
Without error code
error: Could not download a file
caused by: Nonce could not be generated for an encrypted session
caused by: RNG error
No, I hadn't heard of distributed slices before, and I remain a little sceptical considering we don't have a portable version yet, but we do have a limited implementation: https://github.com/dtolnay/linkme . @newpavlov's idea is that crates can each add their error messages to a [(NonZeroU32, &'static str)]
slice shared by the whole application.
I have a feeling that this change was made with an understanding that actual cause of the error is not that useful – transient failures are better retried internally and non-transient ones cannot be recovered anyway;
This was the motivation for the previous design with ErrorKind
— but we did not find much use for this in practice, and expecting users to write their own code for retry on interruption (ErrorKind::Transient
) was not particularly helpful. Therefore this does not exist any more — we handle retries ourselves where sensible (in getrandom
and ReadRng
) instead of returning a "transient" error.
What type of RNG are you imagining where an error occurs more often than once-in-a-blue-moon? RDRAND already retries in a loop per design spec. The one case of RNG failure I know of was OsRng
failing on Windows (7?) due to a DLL issue — this is a std
platform however.
Ultimately, I agree that we should have a solution, and I'm not confident that distributed slices will be available "soon" (@newpavlov?).
What problem is this actually solving? Your proposed solution (distributed slices) requires either language changes or "linker shenanigans"; what benefit does this actually give to an RNG user when they hit an error?
The rand_core:0.5.0::Error is reduced down to an integer code, which has exactly as much value as no error at all.
I strongly disagree. For end-user having some information about error is better than having none, plus usually you know which RNG you use, so you will be able to check this error code in RNG documentation. This design was heavily influenced by development of getrandom
crate, but I think it also really well fits constraints of no_std
applications: has a small size, trivially extensible, error descriptions can be removed from binary by dead code removal if they are not used anywhere.
What problem is this actually solving? Your proposed solution (distributed slices) requires either language changes or "linker shenanigans"; what benefit does this actually give to an RNG user when they hit an error?
Distributed slices are only needed for nicer error messages, so instead of RNG error code 42
you would get RNG error: quantum field instability
.
Any errors an Rng might generate are either os errors, including special hardware errors, or stream cipher length errors, yes?
There is an interesting research problem of doing a non-locking reentrant PRNG, but ideally such a beast should not produce any error either. In essence, the Rng
would be a smart pointer with their own unique secret seed that prevented accesses from different threads from inverting one another.
plus usually you know which RNG you use
Is usually good enough for a foundational generic interface? If "usually" is good enough, then the whole concept of the RngCore
trait is starting to lose its meaning – you could just use the API of the concrete RNG type after all.
EDIT: I do agree that if you do know the concrete RNG type, then it is definitely possible to communicate extra information back to the user. However, if those scenarios were all that we needed to handle, they would be handled equally well by an associated Error type on RngCore
. With that alternative on the table, a numeric code is just a bad API design.
Any errors an Rng might generate are either os errors, including special hardware errors
Not sure if I would consider hardware errors the same as OS error – hardware isn’t always uniformly abstracted by an OS.
Your claim was:
integer (error) code <..> has exactly as much value as no error at all
I find it's really strange that you find some information exactly equivalent to no information at all. Even in generic context we are able to match on error codes (e.g. Error::UNKNOWN
or Error::UNAVAILABLE
), in future we also could add additional codes or even error prefixes for common error classes.
As for using associated error type, in my experience they are less convenient to handle in generic contexts. Yes, we could define rand_core::Error
trait and use associated error type in RngCore
bound by this trait, but it looks like an over-engineered solution to me (plus, it will not be object-safe).
What exactly do you want to be able to do with rand_core::Error
in generic contexts? The only potentially useful thing I can remember is an ability to distinguish retryable vs non-retryable errors. Previously I proposed adding is_retryable
method, but no one has expressed interest in it.
Your claim ... I find it's really strange that you find some information exactly equivalent to no information at all
With what’s available in 0.5.0 I stand by that claim. Some information will exist in an integer, sure, but without means to interpret the number this information provides no knowledge and therefore no value can be derived.
And just to reiterate, "By knowing that I’m using crate rand_lava
, and looking at the source code, I now know that error code 42 occurs when somebody unplugged the lava lamp” is not a satisfactory approach to error code interpretation.
Even in generic context we are able to match on error codes (e.g. Error::UNKNOWN or Error::UNAVAILABLE)
So far I’m not seeing either Error::UNKNOWN
nor Error::UNAVAILABLE
or any other codes to compare against anywhere in the public API of rand_core
. Am I just blind? Why does Display
not format those into sensible textual descriptions of the error?
What exactly do you want to be able to do with rand_core::Error in generic contexts? The only potentially useful thing I can remember is an ability to distinguish retryable vs non-retryable errors. Previously I proposed adding is_retryable method, but no one has expressed interest in it.
I’ll be happy if I’m able to:
rand_core::Error
at least for the most common error situations (i.e. “random number generator is unavailable” and not “error code 711”) that has been returned by arbitrary implementor of the RngCore
trait; whileRngCore
trait.At the most basic level I’d be content with the aforementioned
impl Error {
const UNAVAILABLE: _ = ...;
const TRANSIENT: _ = ...;
const etc...
}
but I would also find that to be fairly clunky to work with as far as APIs in Rust go.
So far our convention has been
// —— in getrandom::error ——
// A randomly-chosen 24-bit prefix for our codes
pub(crate) const CODE_PREFIX: u32 = 0x57f4c500;
const CODE_UNKNOWN: u32 = CODE_PREFIX | 0x00;
const CODE_UNAVAILABLE: u32 = CODE_PREFIX | 0x01;
// —— in rand_jitter::error ——
/// All variants have a value of 0x6e530400 = 1850934272 plus a small
/// increment (1 through 5).
pub enum TimerError {
/// No timer available.
NoTimer = 0x6e530401,
...
... and all our defined error codes could be considered "unknown" or "unavailable".
We could tweak this and assign meaning to different parts of the code (e.g. 16 bits for a non-zero crate identifier, 8 bits for a "status" code and 8 bits for lib-specific details), then provide functions like fn is_unavailable(&self) -> bool
which operate on the status bits for no_std
.
But: it's a breaking API change for both no_std
and std
and still doesn't provide user-readable error messages.
To get messages, I think the only options for no_std
are (1) &'static str
, (2) distributed slices or (3) some ugly hack using static memory (set_err_msg(&'static str)
).
I think @nagisa means that those constants should be duplicated in the rand_core
with an appropriate Debug
and Display
impl modifications. I don't have objections against it.
Format the error into a user-readable and interpretable description of the cause behind
rand_core::Error
Note that with an enabled std
feature you will get a meaningful error description instead of bare error code for some OS errors (see this part and usage of error_msg_inner
in getrandom
). The same approach can be used by RNG crates, with an enabled std
feature they can use Error::new
and without it they can fall back to bare error codes.
@josephlr
I has just assumed that the RngCore trait would just have a type Error parameter, and each generator would chose its own type. is there a reason things are not done that way?
With this approach RngCore
(and IIUC as a consequence Rng
trait as well) will not be object safe anymore. If we want to query additional information about error, then we will have to introduce an additional rand_core::Error
trait. To me it looks a bit too complicated compared to the current concrete type which in future can be extended in a backwards compatible way, this is why I've dropped this idea during discussions of Error
type redesign and it was not brought by anyone else.
One advantage of associated error type would be an ability to use !
(or uninhabited type for now) to indicate impossibility of RNG failure. But I am not sure if it's worth losing object safety and simplicity of concrete type. Also we will have to release rand_core v0.6
and bump minor versions for other rand
crates.
IIRC there was some difficulty providing a blanket implementation of the following type due to the lack of specialisation and Rust enforcing no overlapping impls.
impl<E: Into<Error>, R: RngCore<E>> RngCore<Error> for R { ... }
Thus not only does using an associated type make RngCore
not object-safe, it means no object-safe version of RngCore
can be implemented automatically for all RNGs. (I guess though we could have used a separate ObjSafeRngCore
trait.)
If you need a separate trait, then maybe make the rarely used trait the not object safe one, ala
pub trait TryRngCore {
type Error;
fn try_fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), Error>;
}
I've no idea if anyone wants try_fill_bytes
on an object safe trait, but if so then one could always imagine the completely crazy "not my problem" version. lol
pub trait RngCore {
fn next_u32(&mut self) -> u32;
fn next_u64(&mut self) -> u64;
fn fill_bytes(&mut self, dest: &mut [u8]);
fn try_fill_bytes<E>(&mut self, dest: &mut [u8]) -> Result<(), E>
where E: for<'a> Into<&'a dyn ErrorTrait>;
}
Well, that's the neatest proposal I've heard yet (other than ditching all information in the Error
type), but still somewhat disruptive, and doesn't meet @newpavlov's goal of small error values due to the required fat pointer in RngCore::try_fill_bytes
(though whether that matters in this case is another question).
I doubt my second joke proposal actually works since E: for<'a> Into<&'a dyn ErrorTrait+Clone>
does not quite suffice, but maybe E: Into<dyn ErrorTrait>
once DST support improves dramatically.
I suppose associated types break trait objects right from the beginning, so you cannot simply hide the methods from the trait object?
pub trait RngCore {
fn next_u32(&mut self) -> u32;
fn next_u64(&mut self) -> u64;
fn fill_bytes(&mut self, dest: &mut [u8]);
type Error;
fn try_fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), Error> where Self: Sized;
}
I suppose associated types break trait objects right from the beginning, so you cannot simply hide the methods from the trait object?
So I started experimenting with this, and apparently this RngCore
is Object Safe. Specifically:
pub trait RngCore {
fn next_u32(&mut self) -> u32;
fn next_u64(&mut self) -> u64;
fn fill_bytes(&mut self, dest: &mut [u8]);
type Error: Debug;
fn try_fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), Self::Error>;
}
is Object Safe because to use a &dyn RngCore
you have to specify an Error
type.
fn annoying(rng: &dyn RngCore<Error = SomeErrorType>) { ... }
but this is a little annoying, we would like the ablity to use a common Error
type. As @dhardy noted,
impl<R: RngCore> RngCore for R where CoreErr: From<R::Error> {
type Error = CoreErr;
...
}
will not work. But we can use a wrapping struct to do the same thing:
pub struct Wrap<R>(pub R);
impl<R: RngCore> RngCore for Wrap<R> where CoreErr: From<R::Error> {
type Error = CoreErr;
...
}
Then a user can implement non-generic methods like:
fn better(rng: &dyn RngCore<Error = CoreError>) { ... }
and call them like:
let mut rng = FailRng;
annoying(&mut rng);
let mut wrap_rng = Wrap(rng);
better(&mut wrap_rng);
Here's a playground with all of the code.
Yes, by not being object safe I was implying that you can't use it without fixing the error type, thus you essentially lose all advantages of the associated type.This is why for example in RustCrypto I am using a separate DynDigest
trait to work around this issue.
Your solution will work, as will a separate object-safe trait, but to me it looks like a needless over-engineering for too little gain. Don't forget that RNG errors will be extremely rare. Do we really need such design complications? Especially considering that in future it's quite likely we will be able to improve user-facing no_std
error messages without breaking changes.
Do we really need such design complications?
I agree, this associated type stuff is probably not worth the hassle. I think that if we can get getrandom::Error
to have good messages, and have those messages stay good when it's converted to a rng_core::Error
, I think it would be sufficient.
in future it's quite likely we will be able to improve user-facing
no_std
error messages without breaking changes.
Is this just the fixes to getrandom
? Or is there a upcoming language feature that would help?
Is this just the fixes to getrandom? Or is there a upcoming language feature that would help?
It's not quite "upcoming", but looks like there is enough interest in it (e.g. see this thread). It's mentioned earlier in this thread "distributed slices" feature, see for example here for a rough draft of how it can look.
But even without this feature we can improve situation by adding a private constant array to rand_core
with descriptions for most common error codes. Maybe it's also worth to copy some error constants from getrandom
.
Yes, I similarly meant that associated types must be specified. I'm unsure however if the object safety rules could relax this if associated types never appeared in object safe methods.
I do think separating out the TryRngCore
trait above works. Or heck maybe
pub trait GetRng : Future<Output=Result<R,Error>> {
type R: RngCore:
type Error;
}
All this sounds like overkill though.
Agreed, use of an associated type here is overkill. For now, we mostly care about getrandom
errors and could potentially hard-code the error constants for a few other libraries.
We can approach this in two ways; either hard-code all error-codes and messages in rand_core
or use @josephlr's re-assignment of getrandom error-codes to detect those errors assigned to getrandom
and hook into its Debug
/Display
impls (maybe for this we want a getrandom::Error::try_from_code
function).
Note that getrandom#54 leaves 2^31 - 2^16 - 1 error codes unassigned; we may wish to reserve some additional blocks (e.g. for rand_jitter
).
Unfortunately this requires changing many error codes from the last release. Landing these changes will already require bumping the minor version on all crates.
or use @josephlr's re-assignment of getrandom error-codes to detect those errors assigned to
getrandom
and hook into itsDebug
/Display
impls (maybe for this we want agetrandom::Error::try_from_code
function).
With https://github.com/rust-random/getrandom/pull/58, getrandom
will actually be no_std
, so we could just unconditionally depend on getrandom
in rng_core
and have rng_core::Error
be a thin wrapper around getrandom::Error
(when not using the std
feature). This would make it easy to pass the Debug
/Display
implementations through.
Unfortunately this requires changing many error codes from the last release. Landing these changes will already require bumping the minor version on all crates.
I don't think so. This is why I've insisted on adding the following line to getrandom
docs:
These codes are provided for reference only and should not be matched upon (but you can match on Error constants). The codes may change in future and such change will not be considered a breaking one.
So we can change error codes and use the highest bit in error code to distinguish OS errors from custom ones.
This issue has stagnated, yet resolving it (one way or another) is an important step towards stabilisation, thus it deserves some priority.
In doing so, we should if possible minimise disruption.
No: this is very disruptive since RngCore
is no longer a concrete type, thus users must replace mentions with e.g. RngCore<Error = rand::Error>
.
Again, no; this is also disruptive (though not quite so bad). This also covers use of a companion trait like TryRngCore
.
Error
to 0.5/0.6 implAlso no: it's disruptive (except for users still on ≤0.6) and the new implementation has several improvements.
Without std
, rand_core::Error
has a single field, code: NonZeroU32
. This is well integrated with getrandom::Error
(including formatting), but less well with other error sources.
The question is with what that (a) is no_std
compatible and (b) doesn't lose information from getrandom
?
Instead of replacing the code
field, we could augment it, perhaps with msg: Option<&'static str>
. This could be optional, under an opt-in crate feature.
Does this cover our needs, especially if this is optional (allowing crates to continue using the From<NonZeroU32>
constructor)?
How do we make constructors std
-compatible — perhaps by adding the same field to the ErrorCode
struct?
If the field is feature-gated, do we keep the same constructors regardless, simply discarding the description when the field is not enabled? Anything else is going to cause annoying compile errors.
As mentioned above, we add some extra error constants like UNAVAILABLE
. This is a minimal solution, but not a very good one.
Expand the above, accepting many different error codes and descriptions from derived libs. Quite a horrible option (lots of consts but still not universal).
Since we can't rely on distributed slices yet, we could emulated these at run-time, allowing crates to register possible error codes on start-up.
Besides being very hacky, this cannot be done very well without an allocator, and is not ideal in low-memory environments.
Add a global Option<&'static str>
value to store the last error message, and setter/getter functions, to allow implementations to set the last error message before returning an error code.
Again, hacky, and also has implementation issues since accessing mutable global memory requires synchronisation primitives which aren't necessarily available.
Don't solve this: rand_lava
's users are stuck with "error 42".
Any more suggestions? From this list I think the most acceptable solution would be to augment with a feature-gated description field.
Are we sure feature gated fields in structs really always work? I suppose features always being additive makes them work correctly, yes? It sounds okay then.
Are there any Rust WGs that address no_std issues? If so, could this become their problem? I think a WG with wider involvement could do things beyond individual projects, like statically sized manual trait objects, ala
pub struct ErasedOSError {
f: fn<'a>(e: usize) -> &'a dyn Error + Debug + Display,
e: usize,
}
impl ErasedOSError {
pub fn as_dyn(&self) -> &dyn Error + Debug + Display { self.f(self.e) }
}
It's true rand could create some manual trait object like this, or some other mechanism, but we're likely better off leaving some no_std WG to do this.
Like this? We've been using feature-gated fields for a long time already.
You mean some type of pre-allocated fixed-size Box
able to store a dynamic object without an allocator? At this point anything dependent on language extensions is not a viable proposal in my opinion — language RFCs often take multiple years (the "placement new" RFC has even been unmerged), while we would like a solution in the near future.
I would prefer to not change anything in public API. Firstly, the current approach may be extended in a backwards compatible manner if we'll get distributed slices one day (well, at the cost of bumping MSRV). Secondly, I think no_std
programmers will not care about text descriptions of RNG errors that much, so probably public API complications will not pull their weight here. Also RNG implementations can use std
feature to switch between plain error codes and a richer error type*. And thirdly, I think an additional feature will make it harder and more confusing to understand API of rand_core
, situation is not perfect right now, so I don't think we should make it worse.
*I mean RNG implementations can use code like this:
// Private error type
#[cfg(feature=""std)]
struct MyRngError { .. }
fn get_data(code: NonZeroU32) -> rand_core::Error {
#[cfg(feature="std")]
rand_core::Error::new(MyRngError::from_code(code))
#[cfg(not(feature="std"))]
rand_core::Error::from(code)
}
There are tricks like the one I wrote that do not require language extension pr se, but do require some wider consensus.
I'm not suggesting to wait for some WG, but doing something simple now, with a message or two to the relevant WG so that maybe they get interested.
Lots of good reasoning @newpavlov, though I'd prefer we can find an outcome that everyone (including @nagisa) are happy with. Of course that may not be possible.
@burdges your tricks don't achieve much. Type-erasing a reference is easy; storing a type-erased object without a dynamically-sized Box
is not. Of course an implementation could store its error in static memory and return a reference, but this is also potentially problematic (thread safety, persistence). Finally, the std::error::Error
trait is (for whatever reason) not present in core
.
I'm afraid cargo feature passing has proven bug prone, which makes code like https://github.com/AltSysrq/proptest/pull/170 annoying.
We might consider methods that either move the crate feature test into rand_core, like exposing ErrorCode
, or maybe
impl Error {
#[cfg(feature = "std")]
#[inline(always)]
pub fn make<R>(_code: NonZeroU32, inner: R)
where R: std::error::Error + Send + Sync + 'static
{
Error { inner }
}
#[cfg(not(feature = "std"))]
#[inline(always)]
pub fn make<R>(code: NonZeroU32, _inner: R)
where R: Send + Sync + 'static
{
Error { code }
}
}
We should probably ignore this for now in the hopes that some relevant WG blesses some scheme, but I wanted to mention it.
We got into problems doing that kind of thing before: someone adds the std
flag, and now your error must support std::error::Error
, except the appropriate flag didn't get activated in your crate, so you get a build error (#738).
Are you saying that you prefer the https://github.com/AltSysrq/proptest/pull/170 approach so that any desyncronization of the std flag causes build failures because the build should fail whenever that happens? I do not disagree, but..
If you wanted builds not to detect the std flag missmatch, then you'd want to minimize the std check appearing in other crates, no? I'd think this requires either exposing ErrorCode
or some Error::make
like function, neither of which change the rand API.
rand_core::Error
already supports From<NonZeroU32>
and fn code(&self) -> Option<NonZeroU32>
regardless of std
mode. Is this what you want?
This has stagnated for a long time now, so I am closing with no change (see @newpavlov's last comment, which I agreed to).
The
rand_core:0.5.0::Error
is reduced down to an integer code, which has exactly as much value as no error at all. The stored integer carries no information whatsoever (esp. in generic contexts) and the error, ifDisplay
ed conveys no useful information either.Consider for example:
With that in mind, the next version should reconsider this API change.