rust-lang / libs-team

The home of the library team
Apache License 2.0
115 stars 18 forks source link

[ACP] IO traits in `core::io` #293

Closed jmillikin closed 2 months ago

jmillikin commented 9 months ago

Proposal

Problem statement

Now that the basic types for cursor-based IO (BorrowedBuf and BorrowedCursor) are in core::io, I'd like to propose adding a set of low-level I/O traits to core::io that mimic std::io, but use cursors into borrowed buffers and are parameterized by error type.

I expect this API is big enough that it'll need to be an RFC, but I figured I'd file an ACP first to see if there were any objections to the overall idea / API structure.

Motivating examples or use cases

Being able to write #![no_std] libraries that do some sort of I/O is a widespread request. A partial list of use cases includes:

The proposed design may also be useful in code that uses std but needs precise control over I/O patterns.

Solution sketch

// core::io

// The `IoBuf` and `IoBufMut` types are similar to `std::io::IoSlice{,Mut}`, but aren't guaranteed
// to have any particular internal layout. `IoBufMut` is also allowed to contain uninitialized excess
// capacity, making it similar to `Vec` (but with fixed maximum length).
//
// The goal is to enable fast paths for hosted targets, with slight opacity to enable `target_os = "none"`.
//
// Then within `std` there can be cheap conversions of &[IoBuf] / &[IoBufMut] <-> &[IoSlice] / &[IoSliceMut]
// which is safe because `std` knows which OS it's built for.
struct IoBuf<'a>;
struct IoBufMut<'a>; // currently named BorrowedBuf

impl<'a> From<&'a [u8]> for IoBuf<'a>;
impl<'a> From<&'a mut [u8]> for IoBufMut<'a>;
impl<'a> From<&'a mut [MaybeUninit<u8>]> for IoBufMut<'a>;
// core::io::core_io_traits (placerholder name: won't conflict with std::io and also very ugly)

// All traits in this module have a bound on IO, which associates an error type.
trait IO {
  type Error;
}

// ReadCursor tracks how much data was read from a single read* call.
// ReadVecCursor is the same, but for vectored reads.
struct ReadCursor<'a>; // currently named BorrowedCursor
struct ReadVecCursor<'a>;
impl IoBufMut<'_> {
  fn unfilled(&mut self) -> ReadCursor<'_>;
  fn unfilled_vec<'a>(bufs: &'a [&'a mut IoBufMut<'_>]) -> ReadVecCursor<'a>;
}

// WriteVecCursor tracks how much data was written in a vectored write.
//
// See https://github.com/rust-lang/rust/issues/70436 for context on why
// this extra glue is useful.
struct WriteVecCursor<'a>;
impl IoBuf<'_> {
  fn unwritten(&self) -> &[u8];
  fn unwritten_vec<'a>(bufs: &'a [&'a IoBuf<'_>]) -> WriteVecCursor<'a>;
}
// Read traits
//
// Read is like std::io::Read, except read_exact() has no default implementation due to
// the opacity of error types.
//
// ReadAt is like (a subset of) std::os::unix::fs::FileExt, and represents types that can read
// at arbitrary offsets. It has a bound on `IO` rather than `Read` because such types don't
// necessarily have the concept of a "current position" (e.g. a block device, or a ROM chip).
//
// Read sizes are provided via the cursors, so all return types are `Result<(), Self::Error>`

trait Read: IO {
  fn read(&mut self, cursor: &mut ReadCursor) -> Result<(), Self::Error>;
  fn read_exact(&mut self, cursor: &mut ReadCursor) -> Result<(), Self::Error>;
  fn read_vectored(&mut self, cursor: &mut ReadVecCursor) -> Result<(), Self::Error> { ... }
  fn read_vectored_exact(&mut self, cursor: &mut ReadVecCursor) -> Result<(), Self::Error> { ... }
  fn is_read_vectored(&self) -> bool { ... }
}

trait ReadAt: IO {
  fn read_at(&mut self, cursor: &mut ReadCursor, offset: usize) -> Result<(), Self::Error>;
  fn read_exact_at(&mut self, cursor: &mut ReadCursor, offset: usize) -> Result<(), Self::Error>;
  fn read_vectored_at(&mut self, cursor: &mut ReadVecCursor, offset: usize) -> Result<(), Self::Error> { ... }
  fn read_vectored_exact_at(&mut self, cursor: &mut ReadVecCursor, offset: usize) -> Result<(), Self::Error> { ... }
  fn is_read_vectored_at(&self) -> bool { ... }
}
// Write traits -- basically the same story as the read traits, but there's
// no cursor for just plain writes.

pub trait Write: IO {
  fn write(&mut self, buf: &[u8]) -> Result<usize, Self::Error>;
  fn write_all(&mut self, buf: &[u8]) -> Result<(), Self::Error>;
  fn write_vectored(&mut self, cursor: &mut WriteVecCursor) -> Result<(), Self::Error> { ... }
  fn write_all_vectored(&mut self, cursor: &mut WriteVecCursor) -> Result<(), Self::Error> { ... }
  fn is_write_vectored(&self) -> bool { ... }

  fn flush(&mut self) -> Result<(), Self::Error>;
}

pub trait WriteAt: IO {
  fn write_at(&mut self, buf: &[u8], offset: usize) -> Result<usize, Self::Error>;
  fn write_all_at(&mut self, buf: &[u8], offset: usize) -> Result<(), Self::Error>;
  fn write_vectored_at(&mut self, cursor: &mut WriteVecCursor, offset: usize) -> Result<(), Self::Error> { ... }
  fn write_all_vectored_at(&mut self, cursor: &mut WriteVecCursor, offset: usize) -> Result<(), Self::Error> { ... }
  fn is_write_vectored_at(&self) -> bool { ... }
}
// To provide somewhat friendlier UX, a wrapper struct can be used to provide various
// parameter formats. For example the `std::io::Read` trait methods `read` and `read_buf`
// could be represented 1:1 here, without expanding the trait's API.

pub struct Reader<'a, R>;
impl<'a, R> Reader<'a, R> {
  pub fn new(r: &'a mut R) -> Reader<'a, R>;
}

impl<R: Read> Reader<'_, R> {
  fn read(&mut self, buf: &mut [u8]) -> Result<usize, R::Error>;
  fn read_exact(&mut self, buf: &mut [u8]) -> Result<(), R::Error>;

  fn read_buf(&mut self, buf: &mut ReadBuf) -> Result<usize, R::Error>;
  fn read_buf_exact(&mut self, buf: &mut ReadBuf) -> Result<(), R::Error>;
}

// Vectored I/O can also be made a bit more ergonomic by using const-sized
// arrays, which would make the traits non-object-safe.
impl<R: ReadVectored> Reader<'_, R> {
  fn read_vectored<const N: usize>(&mut self, bufs: [&'a mut [u8]; N]) -> Result<usize, R::Error>;
  fn read_buf_vectored(&mut self, buf: &mut ReadVecBuf) -> Result<usize, R::Error>;
  // ...
}

// same for `struct Writer<'a, W>`

Alternatives

  1. Move std::io traits into core: https://github.com/rust-lang/rust/issues/48331
    • This approach is hard-blocked by the design of std::io::Error[0], and doesn't seem likely to land in the forseeable future.
  2. Do nothing, and hope for a third party to write a library of equivalent traits that can be broadly adopted by the community.
  3. Do nothing, and encourage authors of #![no_std] libraries to define their own I/O traits (leaving consumers to write the glue code).
  4. Something smaller? Such as defining just the Read / Write traits, in the idiom of Go, and leaving the scatter/gather IO to another day.
  5. ?

[0] https://github.com/rust-lang/project-error-handling/issues/11

Links and related work

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

Second, if there's a concrete solution:

programmerjake commented 9 months ago

demo of moving std::io::Error into core: https://github.com/rust-lang/project-error-handling/issues/11#issuecomment-1761019324

clarfonthey commented 9 months ago

I understand the desire to enforce a common error for types that implement Read and Write, but honestly, I don't think that the IO trait justifies its existence. Honestly, the ability to separate out read and write errors feels like a feature, more than anything.

Assuming that we don't go the path of having a core::io::Error, I would say that we should make Error part of the Read and Write traits directly.

jmillikin commented 9 months ago

I understand the desire to enforce a common error for types that implement Read and Write, but honestly, I don't think that the IO trait justifies its existence. Honestly, the ability to separate out read and write errors feels like a feature, more than anything.

Assuming that we don't go the path of having a core::io::Error, I would say that we should make Error part of the Read and Write traits directly.

The problem with giving Read and Write their own error type is that it prevents trait composition. Consider types that can be both Read and Write, such as serial devices or network sockets.

trait Read {
    type Error;
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Error>;
}
trait Write {
    type Error;
    fn write(&mut self, buf: &[u8]) -> Result<usize, Self::Error>;
}

trait Socket: Read + Write {
    // error[E0221]: ambiguous associated type `Error` in bounds of `Self`
    fn set_nodelay(&self, nodelay: bool) -> Result<(), Self::Error>;
}

// error[E0221]: ambiguous associated type `Error` in bounds of `S`
fn netboot<S: Socket>(tftp_socket: &mut S) -> Result<(), S::Error> { ... }

You end up having to give a new name to the error type at each level of trait definition, which gets out of hand quickly and causes diffs to get really unpleasant when refactoring.

trait Socket: Read<Error = Self::SocketError> + Write<Error = Self::SocketError> {
    type SocketError;
}

fn netboot_v1<S, E>(tftp_socket: &mut S) -> Result<(), E>
    where S: Read<Error = E> + Write<Error = E>,
{ ... }

// forced to change the error type for a minor refactoring
fn netboot_v2<S: Socket>(tftp_socket: &mut S) -> Result<(), S::SocketError> { ... }

In principle I would want to define the Socket trait like this, except the syntax isn't valid:

trait<E> Socket: Read<Error = E> + Write<Error = E> {}

And trying to imitate it with type parameters on the trait (Socket<E>) doesn't work -- the type checker doesn't seem to be able to recognize that all definitions of S::Error have been unified.

trait Socket<E>: Read<Error = E> + Write<Error = E> { }

/*
error[E0221]: ambiguous associated type `Error` in bounds of `S`
|     type Error;
|     ---------- ambiguous `Error` from `Read`
|     type Error;
|     ---------- ambiguous `Error` from `Write`
*/
fn netboot<E, S: Socket<E>>(tftp_socket: &mut S) -> Result<(), S::Error> { ... }
adityashah1212 commented 9 months ago

Well composition is not impossible, it just becomes annoying. You could do something like this

trait Socket: Read + Write {
    fn set_nodelay(&self, nodelay: bool) -> Result<(), <Self as Read>::Error>;
}

Or if it can return both errors, something like this

enum SocketError<R, E> {
   ReadError(R),
   WriteError(E),
}

trait Socket: Read + Write {
    fn set_nodelay(&self, nodelay: bool) -> Result<(), SocketError>;
}

Other than this point there are a couple of things, I can think of

  1. We should probably also consider std::io::Read and std::io::Write being wrapper to these trait. This way, we don't have to duplicate the implementation and opens up the possibility of sharing code between no_std and std environments
  2. We also need to decide if we want to have default implementations for methods like read_exact, read_vectored, write_exact, write_vectored and so on, like std::io::Read and std::io::Write currently do. If we want to have something like this then we have two special errors that need to be expressed as part of the above API
    1. Read/Write Interrupted, just like in case of std::io::Read and std::io::Write
    2. Ran out of memory. This may not always be an error, especially in case of this being an implementation for std::io::Write and std::io::Read, since they can easily extend the buffer
clarfonthey commented 9 months ago

Right, I think that wanting to reuse the same error for Read and Write is reasonable but forcing them to have the same error doesn't really solve the problem, since people still can inject their own errors into the mix as well. Consider the common case of read, parse, then write; unless you have some standard io::Error type, you have to make your own error for parsing, and it makes the most sense if you're using a custom Error type on the trait to simply have a ternary enum like @adityashah1212 mentioned:

pub enum MyError<R, W> {
    Read(R),
    Parse(ParseError),
    Write(W),
}

If we added (but didn't enforce) an io::Error type we could do this, using associated type bounds for brevity:

fn read_parse_write<
    R: Read<Error: Into<IoError>>,
    W: Write<Error: Into<IoError>>,
>(
    read: &mut R,
    write: &mut W,
) -> Result<(), IoError>;
jmillikin commented 9 months ago

Well composition is not impossible, it just becomes annoying. You could do something like this

trait Socket: Read + Write {
    fn set_nodelay(&self, nodelay: bool) -> Result<(), <Self as Read>::Error>;
}

This signature is incorrect, because in a design with separate error types for categories of I/O operations then Socket::set_nodelay wouldn't return a read error -- it would return a SocketConfigError or something like that.

Or if it can return both errors, something like this

I understand this works from the "it typechecks and compiles" perspective, but it's not an API you'd want to work with in real life. The low-level I/O traits don't have enough information to correctly assign semantics to specific error codes -- that requires some higher-level (usually OS-specific) context.

  1. We should probably also consider std::io::Read and std::io::Write being wrapper to these trait. This way, we don't have to duplicate the implementation and opens up the possibility of sharing code between no_std and std environments

You'd need adapter types, since otherwise there's a lot of thorny questions about trait coherency. For example std::io::Write has an impl for &mut Vec , but it would also make sense to have impl core_io_traits::Write for &mut Vec { type Error = TryReserveError; ... } -- a blanket implementation (in either direction) would prevent that.

On the plus side, wrapper structs CoreToStd<C> and StdToCore<S> are both straightforward to implement, and could live in std::io.

  1. We also need to decide if we want to have default implementations for methods like read_exact, read_vectored, write_exact, write_vectored and so on, like std::io::Read and std::io::Write currently do. If we want to have something like this then we have two special errors that need to be expressed as part of the above API

read_exact and write_all don't have default implementations in this design, because implementing them requires knowledge of the error code semantics.

The *_vectored methods do have default implementations, because they can be implemented in terms of the other methods without inspecting error conditions:

  1. Read/Write Interrupted, just like in case of std::io::Read and std::io::Write
  2. Ran out of memory. This may not always be an error, especially in case of this being an implementation for std::io::Write and std::io::Read, since they can easily extend the buffer

As written above, I think the proposed traits would sit below the layer at which semantics can be assigned to error codes. You'd need the type itself to tell you what the semantics are.

For example, you could imagine something like this:

enum IoError {
    Read(ReadError),
    Write(WriteError),
    Other(i32),
}

struct Serial { ... }
impl core_io_traits::IO for Serial { type Error = IoError; }
impl core_io_traits::Read for Serial { ... /* read() returns IoError::Read on error */ }

So the trait itself doesn't mandate the structure of the error codes, but the implementation (which is likely to be platform- and consumer-specific) would.

jmillikin commented 9 months ago

Right, I think that wanting to reuse the same error for Read and Write is reasonable but forcing them to have the same error doesn't really solve the problem, since people still can inject their own errors into the mix as well. Consider the common case of read, parse, then write; unless you have some standard io::Error type, you have to make your own error for parsing, [...]

This feels like it's a design proposal for an ACP that moves std::io::Error into core so that the implementations of Read / Write can be generic. I'm not against that idea, exactly (it does seem like a huge amount of work), but it's separate from what I'm proposing here.

Currently, there are no traits in core to represent bulk read/write to byte streams. The closest equivalents are Iterator and Extend -- you can do something like type Read<E> = Iterator<Item = Result<u8, E>>, but the performance and generated assembly is bad. There's also core::fmt::Write, but it's limited to UTF-8.

The code structure I want to enable is something like this:

// crate that implements a kernel
struct Serial;
struct IoError(i32);
impl core_io_traits::IO for Serial { type Error = IoError; }
impl core_io_traits::Read for Serial { ... }
impl core_io_traits::Write for Serial { ... }

// crate that implements SLIP
fn recv_packet<'a, R: core_io_traits::Read>(r: &mut R, buf: &'a mut [u8]) -> Result<Packet<'a>, R::Error>;
fn send_packet<W: core_io_traits::Write>(w: &mut W, packet: Packet<'_>) -> Result<(), W::Error>;

The SLIP code doesn't need to know anything about the structure of the IO errors it might encounter, it just needs to pass them through. There's a huge amount of library code that looks like this -- they need to be able to incrementally send/receive bytes, and can't do anything about errors except pass them along.

clarfonthey commented 9 months ago

For what it's worth, @programmerjake did mention potentially moving io::Error into core as an option: https://github.com/rust-lang/project-error-handling/issues/11#issuecomment-1761019324

But also, I think that some of the adjacent types, like io::ErrorKind should definitely be in core, since they're generic and don't depend on any allocations.

jmillikin commented 9 months ago

Yes, I'm aware there's broad interest in moving std::io::Error into core -- note how I linked that exact issue in the first post.

I don't think this proposal blocks or is blocked by moving std::io::Error, and even in a world where core::io::Error exists I would not want the proposed Read / Write traits to have a hard dependency on it.

adityashah1212 commented 9 months ago

You'd need adapter types, since otherwise there's a lot of thorny questions about trait coherency. For example std::io::Write has an impl for &mut Vec , but it would also make sense to have impl core_io_traits::Write for &mut Vec { type Error = TryReserveError; ... } -- a blanket implementation (in either direction) would prevent that.

@jmillikin, I haven't really looked in to deep into this, so I might be missing stuff, but what I was proposing was to have core::io::Write as the base implementation and have a blanket implementation of std::io::Write on top of core::io::Write. So as long as core::io::Write is implemented for &mut Vec, std::io::Write implementation is also present. This will pose one problem though, we won't be able to both core::io::Write and std::io::Write implemented for the same type, which may be required in some cases for optimized implementations (?). Though, I think based on what you are saying, this maybe a bit out of scope for this one.

read_exact and write_all don't have default implementations in this design, because implementing them requires knowledge of the error code semantics.

The *_vectored methods do have default implementations, because they can be implemented in terms of the other methods without inspecting error conditions:

write_vectored: call write() with the first iobuf, return the result. Still a single syscall (or interrupt, or whatever), and a partial write is permitted by the API. write_all_vectored: call write_all() on each iobuf until it returns an error. No need to inspect the error.

I understand. If this only wants to be a bare bones trait agnostic of any failures, it should be fine. Though I think, it will far from enough for most use cases and the users will still have to have super traits on top of these for them to be usable.

Though I am still with @clarfonthey on not having the IO trait. In fact, since you say this is bare bones and only implements the most generic parts of Read and Write, it further strengths his argument. Since we are not going to deal with errors on our end, we might as well completely leave it to users. In case of a device doing both Read and Write, users can choose to have different errors for each and then have a composite enum or choose to have same error both traits and then handle them according to their needs

jmillikin commented 9 months ago

@jmillikin, I haven't really looked in to deep into this, so I might be missing stuff, but what I was proposing was to have core::io::Write as the base implementation and have a blanket implementation of std::io::Write on top of core::io::Write. [...]

I understand what you're asking for, the problem is trait coherence rules and backwards compatibility.

A plausible implementation of core_io_traits::Write for Vec<u8> would be:

impl core_io_traits::Write for Vec<u8> {
    type Error = crate::collections::TryReserveError;
    fn write(&mut self, buf: &[u8]) -> Result<usize, Self::Error> {
        self.try_reserve(buf.len())?;
        self.extend_from_slice(buf);
        Ok(buf.len())
    }
}

The blanket implementation of core_io_traits::Write -> std::io::Write would have to look something like this:

impl<T, E> std::io::Write for T
where
    T: core_io_traits::Write<Error = E>,
    E: Into<std::io::Error>,
{
    fn write(&mut self, buf: &[u8]) -> Result<usize, std::io::Error> {
        core_io_traits::Write::write(self, buf).map_err(Into::into)
    }
}

And now you're stuck, because the existing implementation of impl std::io::Write for Vec<u8> panics if allocation fails. There's no way to maintain the current behavior and impl core_io_traits::Write for Vec<u8> and have a blanket impl of std::io::Write.

Thus, adapter types are more practical:

struct CoreToStd<C>(C);

impl<E, C> std::io::Read for CoreToStd<C>
where
    C: core_io_traits::Read<Error = E>,
    E: Into<std::io::Error>,
{
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, std::io::Error> {
        core_io_traits::Read::read(&mut self.0, buf).map_err(Into::into)
    }
}

impl<E, C> std::io::Write for CoreToStd<C>
where
    C: core_io_traits::Write<Error = E>,
    E: Into<std::io::Error>,
{
    fn write(&mut self, buf: &[u8]) -> Result<usize, std::io::Error> {
        core_io_traits::Write::write(&mut self.0, buf).map_err(Into::into)
    }
    fn flush(&mut self) -> Result<(), std::io::Error> {
        core_io_traits::Write::flush(&mut self.0).map_err(Into::into)
    }
}

I understand. If this only wants to be a bare bones trait agnostic of any failures, it should be fine. Though I think, it will far from enough for most use cases and the users will still have to have super traits on top of these for them to be usable.

Well, that's always the drawback with primitive traits (type-classes) in a nominal type system.

The purpose of traits like core_io_traits::{Read,Write} is to name a specific functionality, so that third-party code can refer to that common name. Making those traits very generic means they can be broadly useful by providing a name to that functionality in various contexts ("bytes can be written to a Vec, a File, a SerialPort, a TwoCupsAndWetString, ..."). Then a no_std library that just needs to read/write bytes to something can be decoupled from the underlying details -- it's really useful for a compression library (for example) to be able to stream data, but it doesn't need to know what it's streaming to/from.

However, the more generic you make a trait, the less information they provide. A Linux network socket, a WASM linear memory page, and a microcontroller's infrared LED have very little in common, so a Write trait that can be useful to all of them can't really prescribe any higher-level semantics. It's just strictly "write bytes, get back opaque error".

This is different from the std::io traits, which come bundled with a huge amount of implicit context just by being part of std -- implementors can assume access to malloc, a specific target OS, semantics around EINTR, and so on.

Though I am still with @clarfonthey on not having the IO trait. In fact, since you say this is bare bones and only implements the most generic parts of Read and Write, it further strengths his argument. Since we are not going to deal with errors on our end, we might as well completely leave it to users. In case of a device doing both Read and Write, users can choose to have different errors for each and then have a composite enum or choose to have same error both traits and then handle them according to their needs

As mentioned, the problem is composition in libraries. It should be possible for a library to describe necessary I/O functionality with Read + Write or Read + Write + self::SocketOps or whatever, which is somewhere between impractical and impossible if the error types of each I/O trait are separate.

// it's really nice to not need a three-line `where` for each function that needs to
// read and write to the same IO thingie.
fn handshake<S: Read + Write>(socket: &mut S) -> Result<(), S::Error> { ... }

For cases when there are two separate I/O trait hierarchies involved (such as streaming compression), the concrete types are also necessarily separate, so the IO trait doesn't interfere:

fn copy<R: Read, W: Write>(r: &mut R, w: &mut W) -> Result<(), CopyError<R::Error, W::Error>> { ... }
enum CopyError<RE, WE> {
    ReadError(RE),
    WriteError(WE),
}