rust-osdev / uefi-rs

Rusty wrapper for the Unified Extensible Firmware Interface (UEFI). This crate makes it easy to develop Rust software that leverages safe, convenient, and performant abstractions for UEFI functionality.
https://rust-osdev.com/uefi-book
Mozilla Public License 2.0

RFC: Treat UEFI warnings as errors #360

Closed nicholasbishop closed 2 years ago

nicholasbishop commented 2 years ago

Summary

Remove the Completion type and change uefi::Result's type as follows:

// Current definition
pub type Result<Output = (), ErrData = ()> =
    core::result::Result<Completion<Output>, Error<ErrData>>;

// New definition
pub type Result<Output = (), ErrData = ()> =
    core::result::Result<Output, Error<ErrData>>;

Background

The UEFI spec uses the EFI_STATUS enum as the return value for most functions. A status can represent one of three things: success, an error, or a warning. The spec currently defines seven warnings. Two more warnings are defined in the platform initialization specification which should not be directly relevant to uefi-rs, and additional warnings could be implemented by OEMs, though I am not sure if any actually do.
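As a rough illustration of the three categories, the sketch below models EFI_STATUS as a plain integer. It assumes a 64-bit platform (so the status is a u64) and relies on the spec's convention that error codes have the highest bit set while warnings leave it clear. The `Status` struct and its helper methods here are a simplified stand-in, not the actual uefi-rs definitions.

```rust
/// Simplified stand-in for EFI_STATUS, assuming a 64-bit platform.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Status(pub u64);

/// Error codes have the highest bit set; warnings leave it clear.
const ERROR_BIT: u64 = 1 << 63;

impl Status {
    pub const SUCCESS: Status = Status(0);
    pub const WARN_UNKNOWN_GLYPH: Status = Status(1);
    pub const WARN_BUFFER_TOO_SMALL: Status = Status(4);
    pub const BUFFER_TOO_SMALL: Status = Status(ERROR_BIT | 5);

    /// True only for EFI_SUCCESS.
    pub fn is_success(self) -> bool {
        self == Status::SUCCESS
    }

    /// True when the high bit is set.
    pub fn is_error(self) -> bool {
        (self.0 & ERROR_BIT) != 0
    }

    /// True for any non-zero status without the high bit.
    pub fn is_warning(self) -> bool {
        !self.is_success() && !self.is_error()
    }
}
```

Note how WARN_BUFFER_TOO_SMALL and BUFFER_TOO_SMALL land in different categories even though their names are nearly identical.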

Here are the currently-defined UEFI warnings from Appendix D:

Name Description
EFI_WARN_UNKNOWN_GLYPH The string contained one or more characters that the device could not render and were skipped.
EFI_WARN_DELETE_FAILURE The handle was closed, but the file was not deleted.
EFI_WARN_WRITE_FAILURE The handle was closed, but the data to the file was not flushed properly.
EFI_WARN_BUFFER_TOO_SMALL The resulting buffer was too small, and the data was truncated to the buffer size.
EFI_WARN_STALE_DATA The data has not been updated within the timeframe set by local policy for this type of data.
EFI_WARN_FILE_SYSTEM The resulting buffer contains UEFI-compliant file system.
EFI_WARN_RESET_REQUIRED The operation will be processed across a system reset.

Details

The current implementation of uefi::Result faithfully translates EFI_STATUS into a very explicit Rust API. It makes rigorous use of the Rust type system to allow very explicit handling of errors and warnings. However, in practice I think this API is difficult to use well. It's different enough from normal Rust error handling that I always find myself stumbling a bit when figuring out how to use it internally in uefi-rs while implementing wrappers for UEFI functions. And when using the library from an application I tend to just call log_warning()? everywhere, which means I'm not doing anything to really handle warnings.

Since uefi::Result is used pretty much everywhere, it would be great if we could simplify its usage, and I think we can do that while actually making the library more robust.

It turns out that warnings are not used very often in the spec. That means while we are paying the mental cost of handling warnings everywhere, they are not actually expected to ever occur outside of a very limited set of functions. The current API theoretically encourages applications to check at each callsite whether they care about warnings, but in reality the answer is almost always "no" simply by virtue of the fact that most functions never return warnings.

At a high level, my suggestion is that by default we should treat all warnings as errors, and only consider special handling of warnings on a case-by-case basis in functions that wrap UEFI functions where the UEFI spec explicitly mentions a warning might be returned. That means that uefi::Result::Ok can usually be treated as EFI_SUCCESS and hence the Completion type can be dropped.
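A minimal sketch of what "warnings as errors by default" could look like, using simplified stand-in types rather than the real uefi-rs ones. The `status_to_result` helper is hypothetical and only exists to show the idea: anything other than EFI_SUCCESS, warning or error, ends up in `Err`.

```rust
/// Simplified stand-in for EFI_STATUS (not the real uefi-rs type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Status(pub u64);

impl Status {
    pub const SUCCESS: Status = Status(0);
    pub const WARN_DELETE_FAILURE: Status = Status(2);
}

/// Simplified stand-in for the crate's error type.
#[derive(Debug, PartialEq, Eq)]
pub struct Error<ErrData = ()> {
    status: Status,
    data: ErrData,
}

impl<ErrData> Error<ErrData> {
    pub fn status(&self) -> Status {
        self.status
    }

    pub fn data(&self) -> &ErrData {
        &self.data
    }
}

/// The proposed alias: no Completion wrapper around the output.
pub type Result<Output = (), ErrData = ()> =
    core::result::Result<Output, Error<ErrData>>;

/// Hypothetical helper: only EFI_SUCCESS maps to Ok, so a warning
/// is reported through Err just like an error would be.
pub fn status_to_result(status: Status) -> Result {
    if status == Status::SUCCESS {
        Ok(())
    } else {
        Err(Error { status, data: () })
    }
}
```

With this shape, a caller that writes `status_to_result(s)?` can no longer silently continue past a warning such as WARN_DELETE_FAILURE.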

Here's a breakdown of each individual warning showing where it's explicitly referenced in the spec and how I think we should handle it:

  1. EFI_WARN_UNKNOWN_GLYPH is used in two places:
     a. EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL.OutputString(): I think this is the rare case where a warning truly is just a warning. It's akin to String::from_utf8_lossy, and I think we could handle it in a similar way: provide both output_string and output_string_lossy methods for Output.
     b. EFI_HII_FONT_PROTOCOL.GetGlyph(): If this function returns the glyph warning it indicates that the glyph isn't known and info about the 0xFFFD unicode character has been returned instead. This would be better represented as an error.
  2. EFI_WARN_DELETE_FAILURE is used by EFI_FILE_PROTOCOL.Delete() to indicate that the file wasn't actually deleted, instead the handle was just closed. Yikes! In any normal Rust API (really any normal C API too) that case should be treated as an error.
  3. EFI_WARN_WRITE_FAILURE is not actually referenced anywhere in the spec, but from the description it sure sounds like an opportunity for accidental data loss, so it should be treated as an error.
  4. EFI_WARN_BUFFER_TOO_SMALL is confusingly not the same as EFI_BUFFER_TOO_SMALL. The warning is used by EFI_STORAGE_SECURITY_COMMAND_PROTOCOL.ReceiveData() and indicates that some data was written to the buffer, but it wasn't big enough for all of it. That should be treated as an error.
  5. EFI_WARN_STALE_DATA is used by the key management service protocol. I'm not very familiar with this part of the spec, but it seems to indicate that a key was used successfully but actually should be replaced. Yikes, that sure sounds like it should be treated as an error.
  6. EFI_WARN_FILE_SYSTEM is used by EFI_LOAD_FILE_PROTOCOL.LoadFile() specifically in the case of a network boot of an image containing a UEFI file system instead of a UEFI executable. This seems reasonable to treat as a warning rather than an error, so the Ok portion of the result could include a boolean or enum to indicate this situation.
  7. EFI_WARN_RESET_REQUIRED is used by SetVariable() to indicate that the secure boot setting is transitioning to a less restrictive mode and the firmware requires a reset. This is a very specific and rare case, and it would probably be reasonable just to document that this error can occur and expect that an application dealing with secure boot transitions will handle it appropriately.
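To make the case-by-case handling in items 1a and 6 more concrete, here is a sketch under heavy simplification. The `Status` enum, `raw_output_string`, `LoadedImage`, and `classify_load` are all hypothetical stand-ins (the fake firmware call simply pretends non-ASCII text cannot be rendered); only the output_string / output_string_lossy naming comes from the proposal above.

```rust
/// Tiny stand-in for the handful of statuses used in this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Success,
    WarnUnknownGlyph,
    WarnFileSystem,
    DeviceError,
}

/// Hypothetical stand-in for the raw firmware call: pretend that any
/// non-ASCII character cannot be rendered and is skipped.
fn raw_output_string(s: &str) -> Status {
    if s.is_ascii() {
        Status::Success
    } else {
        Status::WarnUnknownGlyph
    }
}

/// Strict variant: the glyph warning is reported as an error.
pub fn output_string(s: &str) -> Result<(), Status> {
    match raw_output_string(s) {
        Status::Success => Ok(()),
        other => Err(other),
    }
}

/// Lossy variant: the glyph warning is accepted, in the spirit of
/// String::from_utf8_lossy.
pub fn output_string_lossy(s: &str) -> Result<(), Status> {
    match raw_output_string(s) {
        Status::Success | Status::WarnUnknownGlyph => Ok(()),
        other => Err(other),
    }
}

/// Hypothetical Ok payload for LoadFile (item 6): the warning becomes
/// data in the success value instead of an error.
#[derive(Debug, PartialEq, Eq)]
pub enum LoadedImage {
    Executable,
    FileSystem,
}

pub fn classify_load(status: Status) -> Result<LoadedImage, Status> {
    match status {
        Status::Success => Ok(LoadedImage::Executable),
        Status::WarnFileSystem => Ok(LoadedImage::FileSystem),
        other => Err(other),
    }
}
```

In both cases the decision about the warning is made once, inside the wrapper for the one function where the spec mentions it, rather than at every call site.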

Concerns

Unknown warnings

What if there are more warnings than we expect? Firmware vendors could do all sorts of weird things, so in theory an existing application that currently treats warnings as non-fatal could start failing on some devices with this change. I don't think there's much we can do to anticipate that, but I could imagine getting a vendor-specific bug or two about this.

Churn

This would be a big change to the API that will almost certainly affect every user of the library. If we make this change we should be sure to document it well, perhaps by starting to maintain a CHANGELOG.md file.

GabrielMajeri commented 2 years ago

I've read your whole proposal, and overall I agree with this change (although I also understood the desire for correctness which motivated the initial introduction of Completion). The only big issue I see is with how the downstream users of the crate will adapt - while a CHANGELOG file would be a nice start, it will still be a breaking change for almost everyone.

Furthermore, what I'm not sure of is this:

I don't think there's much we can do to anticipate that, but I could imagine getting a vendor-specific bug or two about this.

I don't know how many firmware vendors implement custom status codes, but if I recall correctly there occasionally were people who asked for escape hatches for the Result/Completion type, precisely because they were using uefi-rs with some really weird specific protocols or low-level UEFI code.

I'd love to get some feedback from the osdev community on such a change, which is why I'd advise for waiting for a few days (maybe a week) and seeing if we get any feedback? I'll pin the issue in the meantime.

nicholasbishop commented 2 years ago

(although I also understood the desire for correctness which motivated the initial introduction of Completion).

Totally agree that the introduction of Completion made sense! I don't at all think it was a bad choice, just one of those things where the tradeoffs may look different after some time in use.

The only big issue I see is with how the downstream users of the crate will adapt - while a CHANGELOG file would be a nice start, it will still be a breaking change for almost everyone.

Perhaps we could temporarily add something near the top of the readme (so it would also show up on crates.io in the next release), e.g. "Important breaking API change when upgrading from version 0.14.0: [...]".

It would be nice to have a smoother upgrade path like when something is just deprecated with a warning, but not sure that would be possible for an invasive change like this.

Furthermore, what I'm not sure of is this:

I don't think there's much we can do to anticipate that, but I could imagine getting a vendor-specific bug or two about this.

I don't know how many firmware vendors implement custom status codes, but if I recall correctly there occasionally were people who asked for escape hatches for the Result/Completion type, precisely because they were using uefi-rs with some really weird specific protocols or low-level UEFI code.

Ah interesting. I searched through issues mentioning error/warning/completion/result, but didn't spot anything specific like this. I guess this would primarily be an issue for wrapper functions that make multiple UEFI calls, since anything that just calls a single UEFI call and returns the status as a Result could presumably be handled on the application side.

I'd love to get some feedback from the osdev community on such a change, which is why I'd advise for waiting for a few days (maybe a week) and seeing if we get any feedback? I'll pin the issue in the meantime.

That sounds good, I'd be happy to wait longer than a week too, since I realize there's a good chance someone with relevant input won't happen to see this issue in a short time span. Maybe we could mention it in the next This Month in Rust OSDev to give more people a chance to see it?

IsaacWoods commented 2 years ago

I don't have time for a lot of OSDev anymore, but Completion was one of the only not-intuitive bits of uefi-rs I came across when writing my UEFI bootloader. So I'm very much in favour of this change :)

At a high level, my suggestion is that by default we should treat all warnings as errors, and only consider special handling of warnings on a case-by-case basis in functions that wrap UEFI functions where the UEFI spec explicitly mentions a warning might be returned.

This sounds like a really nice way of handling it without compromising correctness, +1

josephlr commented 2 years ago

I think this sounds great. I think most of the ergonomics questions then boil down to how ResultExt will look, and how we make it easy to do the "right" thing.

Old definition:

pub trait ResultExt<Output, ErrData: Debug> {
    fn status(&self) -> Status;
    fn log_warning(self) -> core::result::Result<Output, Error<ErrData>>;
    fn unwrap_success(self) -> Output;
    fn expect_success(self, msg: &str) -> Output;
    fn expect_error(self, msg: &str) -> Error<ErrData>;
    fn map_inner<Mapped>(self, f: impl FnOnce(Output) -> Mapped) -> Result<Mapped, ErrData>;
    fn discard_errdata(self) -> Result<Output>;
    fn warning_as_error(self) -> core::result::Result<Output, Error<ErrData>>
    where
        ErrData: Default;
}

New Definition:

pub trait ResultExt<Output, ErrData> {
    fn status(&self) -> Status;
    fn log_warning(self) -> Result<Output, ErrData>
    where
        ErrData: Into<Output>;
    fn ignore_warning(self) -> Result<Output, ErrData>
    where
        ErrData: Into<Output>;
    fn discard_errdata(self) -> Result<Output>;
}

warning_as_error is now the default, so is no longer needed. map_inner, unwrap_success, expect_success, and expect_error can now just be the normal map, unwrap, expect, and expect_err methods for core::result::Result. We can also get rid of the Debug constraint.
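As a small illustration of that point, the sketch below uses a simplified alias (a single-field `Error` and a made-up `query_size` wrapper, both hypothetical) to show the standard combinators working directly once the Completion layer is gone.

```rust
/// Simplified stand-in for the crate's error type; the status value
/// used below is arbitrary.
#[derive(Debug, PartialEq, Eq)]
pub struct Error {
    pub status: u64,
}

/// Simplified form of the proposed alias (no ErrData parameter here).
pub type Result<Output = ()> = core::result::Result<Output, Error>;

/// Hypothetical wrapper that reports a size in bytes.
pub fn query_size(fail: bool) -> Result<usize> {
    if fail {
        Err(Error { status: 2 })
    } else {
        Ok(512)
    }
}

/// Plain `map` does the job that `map_inner` did before.
pub fn size_in_sectors(fail: bool) -> Result<usize> {
    query_size(fail).map(|bytes| bytes / 512)
}
```

Likewise `query_size(false).unwrap()` and `query_size(true).expect_err("...")` take the place of unwrap_success and expect_error.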

I'm not sure how annoying the ErrData: Into<Output> constraint would be in practice. Seems fine as our usual recommendation would be "you almost certainly don't need these methods". We could add a handle_warning method if we deem it necessary.

nicholasbishop commented 2 years ago

Thanks for the feedback!

Re ResultExt, how about this:

pub trait ResultExt<Output, ErrData> {
    fn status(&self) -> Status;
    fn discard_errdata(self) -> Result<Output>;
    fn warning(&self) -> Option<ErrData>;
}

Instead of log_warning / ignore_warning we just have warning, in the same spirit as Result::ok and Result::err. This avoids needing an ErrData: Into<Output> constraint. Hopefully it's rare anyway for an application to want to handle a warning, but if they do they can do something like:

if let Some(err_data) = result.warning() {
   // Application can decide to log, match on specific warning in `err_data.status()`, etc.
}

josephlr commented 2 years ago

w.r.t. the warning() method, would it make more sense to have a handle_warning method?

pub trait ResultExt<Output, ErrData> {
    // Takes `self` by value so the result can be returned or replaced.
    fn handle_warning<F>(self, f: F) -> Result<Output, ErrData>
    where
        F: FnOnce(Error<ErrData>) -> Result<Output, ErrData>;
}

where if the result contains a warning f determines if we get Ok or Err in that case.

For example if we had the following function:

pub fn set_variable(data: &[u8]) -> uefi::Result

and we wanted to log but continue if we got a specific warning, we could then:

// Inside some function that returns a Result
set_variable(&data).handle_warning(|err| {
  if err.status() != Status::WARN_RESET_REQUIRED {
    return Err(err); // Propagate error
  }
  // log something about the warning
  Ok(()) // Don't return an error
})?;

and if we wanted to ignore all warnings we could:

set_variable(&data).handle_warning(|_| Ok(()))?;

basically this would be a warning-specific version of Result::or_else.
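One possible way to implement such a method is sketched below. This is not the code that ended up in the crate; `Status`, `Error`, the simplified `Result` alias (no ErrData parameter), and the `set_variable` stand-in that just echoes a status are all assumptions made for illustration. The closure is only invoked when the error status is a warning; real errors pass through untouched.

```rust
/// Simplified stand-in for EFI_STATUS, assuming a 64-bit platform.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Status(pub u64);

const ERROR_BIT: u64 = 1 << 63;

impl Status {
    pub const SUCCESS: Status = Status(0);
    pub const WARN_RESET_REQUIRED: Status = Status(7);
    pub const DEVICE_ERROR: Status = Status(ERROR_BIT | 7);

    /// Non-zero status without the error bit set.
    pub fn is_warning(self) -> bool {
        self.0 != 0 && (self.0 & ERROR_BIT) == 0
    }
}

/// Simplified stand-in for the crate's error type.
#[derive(Debug, PartialEq, Eq)]
pub struct Error {
    status: Status,
}

impl Error {
    pub fn status(&self) -> Status {
        self.status
    }
}

pub type Result<Output = ()> = core::result::Result<Output, Error>;

pub trait ResultExt<Output> {
    /// Calls `f` only if the result is an Err carrying a warning status.
    fn handle_warning<F>(self, f: F) -> Result<Output>
    where
        F: FnOnce(Error) -> Result<Output>;
}

impl<Output> ResultExt<Output> for Result<Output> {
    fn handle_warning<F>(self, f: F) -> Result<Output>
    where
        F: FnOnce(Error) -> Result<Output>,
    {
        match self {
            // Warnings are handed to the caller's closure.
            Err(err) if err.status().is_warning() => f(err),
            // Ok values and genuine errors are returned unchanged.
            other => other,
        }
    }
}

/// Hypothetical stand-in for a wrapper such as set_variable: it simply
/// converts the given status, treating warnings as errors.
pub fn set_variable(status: Status) -> Result {
    if status == Status::SUCCESS {
        Ok(())
    } else {
        Err(Error { status })
    }
}
```

Because genuine errors skip the closure, `handle_warning(|_| Ok(()))` ignores warnings without also swallowing something like DEVICE_ERROR, which a plain `or_else(|_| Ok(()))` would do.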

josephlr commented 2 years ago

Also, I noticed that the uefi::Error type is not public for this crate. It's public in the result module, but that module is private and only exports {Completion, Result, ResultExt, Status};.

Is this intentional or a bug (or am I missing something)?

nicholasbishop commented 2 years ago

I think handle_warning sounds like a good idea.

Re. Error, yeah I think that's an oversight. I'll put up a PR to make that public.

GabrielMajeri commented 2 years ago

Closing this since we've agreed to accept this change and it got implemented in #361.