rust-lang / libs-team

The home of the library team
Apache License 2.0
109 stars 18 forks source link

Add fmt::Write to io::Write adapter #133

Open SUPERCILEX opened 1 year ago

SUPERCILEX commented 1 year ago

Proposal

Problem statement

There is no easy way to use fmt::Write to write bytes to an io stream.

Motivation, use-cases

If you know the format data you'll be creating must always be valid utf8, then you should use the fmt::Write trait. Unfortunately, it is harder than necessary to then lower that data down into a byte stream.

This basically comes down to being able to interchangeably use a String buffer or a io::stdout() buffer (for example). You could argue that you should use a Vec<u8> and then convert it to a string, but now you've lost the type safety of guaranteed utf8.

Solution sketches

https://github.com/rust-lang/rust/pull/104389

The big open question is error handling, but I don't believe this needs to be addressed while the feature is unstable.

API:

struct FmtWriteAdapter<'a, W: Write + ?Sized> { ... }

impl FmtWriteAdapter {
    pub fn err(&self) -> &Option<Error>
    pub fn mut_err(&mut self) -> &mut Option<Error>
}

impl<W: Write + ?Sized> fmt::Write for FmtWriteAdapter<'_, W>

impl io::Write {
  fn fmt_adapter(&mut self) -> FmtWriteAdapter<'_, Self> where Self: Sized
}

Usage:

let mut output1 = String::new();
let mut output2 = io::stdout();

my_common_writer(&mut output1).unwrap();
my_common_writer(&mut output2.fmt_adapter()).unwrap();

fn my_common_writer(output: &mut impl fmt::Write) -> fmt::Result {
    writeln!(output, "Hello World!")
}

Links and related work

Existing issue: https://github.com/rust-lang/rust/issues/77733

Note: the other direction (i.e. writing through an io stream to a format stream) does not make sense because the data does not have to be utf8. Even it was and error handling could be taken care of, the window of data the io stream is currently viewing may not be aligned to valid utf8 data, meaning the data may actually be utf8 but the order in which the writes appeared made the data invalid.

Nilstrieb commented 1 year ago

https://github.com/rust-lang/rust/issues/77733#issuecomment-706202542 mentions that this is a breaking change, because someone could implement fmt::Write and io::Write for their type and adding the blanket impl breaks this. So I don't think this can be done.

SUPERCILEX commented 1 year ago

Hence why I went the adapter route. To be clear, this is NOT a blanket impl. From the docs in my PR:

/// #![feature(impl_fmt_write_for_io_write)]
/// # use std::{fmt, io};
/// # use std::io::FmtWriteAdapter;
///
/// let mut output1 = String::new();
/// let mut output2 = io::stdout();
/// let mut output2 = FmtWriteAdapter::from(&mut output2);
///
/// my_common_writer(&mut output1).unwrap();
/// my_common_writer(&mut output2).unwrap();
///
/// fn my_common_writer(output: &mut impl fmt::Write) -> fmt::Result {
///     writeln!(output, "Hello World!")
/// }
Nilstrieb commented 1 year ago

Oh, I misread that, oops.

the8472 commented 1 year ago

Solution sketches

Neither this issue nor the linked PR outline the actual proposed APIs.

SUPERCILEX commented 1 year ago

Right, my bad. I added a simplified version of the API. @Nilstrieb @the8472 is that clear?

the8472 commented 1 year ago

Using From might not be the most ergonomic thing. Having a defaulted method on the io::Write trait could produce this struct more ergonomically.

Nilstrieb commented 1 year ago

From<&mut io::Write> should this be a generic? But I agree that From isn't very good for this

SUPERCILEX commented 1 year ago

Having a defaulted method on the io::Write trait could produce this struct more ergonomically.

Oooh, that's a neat idea. Updated the proposed API and PR.

@Nilstrieb Yeah, it's generic but I thought that would be noise. The API now has the full definitions.

SUPERCILEX commented 1 year ago

Updated the proposed API and PR.

Oh, I should note that I couldn't figure out a way to include the fmt_adapter method on the io::Write trait directly due to object safety, hence the extra conversion trait.

the8472 commented 1 year ago

putting where Self: Sized on the method should work, you can find examples of that on Iterator.

SUPERCILEX commented 1 year ago

Wouldn't we not want that though? Otherwise I don't think you'd be able to write to an &mut [u8]. Or am I getting confused?

sfackler commented 1 year ago

If you know the format data you'll be creating must always be valid utf8, then you should use the fmt::Write trait.

Why?

SUPERCILEX commented 1 year ago

@sfackler Because then you can write to a String. The primary use case I (and presumably others) have is a more complicated version of the usage example from above. You want to write to a String because some API needs it or an &str, but sometimes you also want to write that same data to stdout or a file.

the8472 commented 1 year ago

Wouldn't we not want that though? Otherwise I don't think you'd be able to write to an &mut [u8]. Or am I getting confused?

A &mut &mut [u8] is Sized and Write.

SUPERCILEX commented 1 year ago

Mmmm, yeah you're right. But still, you wouldn't be able to use fmt_adapter in here for example: https://github.com/rust-lang/rust/blob/96ddd32c4bfb1d78f0cd03eb068b1710a8cebeef/library/std/src/io/mod.rs#L1661. So it seems better to not add unnecessary restrictions. Also putting fmt_adapter directly on Write seems wrong because why would you ever override it? I'm gonna stick with the Ext trait.

the8472 commented 1 year ago

But still, you wouldn't be able to use fmt_adapter in here

You can take a &mut self (where self is already &mut Self) because impl<W: Write> Write for &mut W so you can always get a sized Write from an unsized one.

Also putting fmt_adapter directly on Write seems wrong because why would you ever override it?

Many of the methods don't make sense to override and they're there anyway. Eventually we might add sealed methods but this change doesn't have to wait for that.

SUPERCILEX commented 1 year ago

Are you sure? I can't get it to work:

Subject: [PATCH] Add fmt::Write to io::Write adapter

Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
---
Index: library/std/src/io/mod.rs
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/library/std/src/io/mod.rs b/library/std/src/io/mod.rs
--- a/library/std/src/io/mod.rs (revision e75efbddfe3bf7719a81a5d2d579e45d08d15ef7)
+++ b/library/std/src/io/mod.rs (date 1668459301422)
@@ -1660,7 +1660,7 @@
     /// ```
     #[stable(feature = "rust1", since = "1.0.0")]
     fn write_fmt(&mut self, fmt: fmt::Arguments<'_>) -> Result<()> {
-        let mut output = self.fmt_adapter();
+        let mut output = (&mut self).fmt_adapter();
         fmt::write(&mut output, fmt).map_err(|_| {
             output.error.unwrap_or(const_io_error!(ErrorKind::Uncategorized, "formatter error"))
         })
@@ -1694,19 +1694,13 @@
     {
         self
     }
-}

-/// Bridge trait to convert an [`io::Write`](Write) to a [`FmtWriteAdapter`].
-#[unstable(feature = "impl_fmt_write_for_io_write", issue = "77733")]
-pub trait FmtWriteAdapterExt<W: Write + ?Sized> {
     /// Convert an [`io::Write`](Write) to a [`FmtWriteAdapter`].
     #[unstable(feature = "impl_fmt_write_for_io_write", issue = "77733")]
-    fn fmt_adapter(&mut self) -> FmtWriteAdapter<'_, W>;
-}
-
-#[unstable(feature = "impl_fmt_write_for_io_write", issue = "77733")]
-impl<W: Write + ?Sized> FmtWriteAdapterExt<W> for W {
-    fn fmt_adapter(&mut self) -> FmtWriteAdapter<'_, W> {
+    fn fmt_adapter(&mut self) -> FmtWriteAdapter<'_, Self>
+    where
+        Self: Sized,
+    {
         FmtWriteAdapter { inner: self, error: None }
     }
 }
@@ -1718,7 +1712,7 @@
 /// ```rust
 /// #![feature(impl_fmt_write_for_io_write)]
 /// # use std::{fmt, io};
-/// # use std::io::FmtWriteAdapterExt;
+/// # use std::io::Write;
 ///
 /// let mut output1 = String::new();
 /// let mut output2 = io::stdout();
error[E0596]: cannot borrow `self` as mutable, as it is not declared as mutable
    --> library/std/src/io/mod.rs:1663:26
     |
1663 |         let mut output = (&mut self).fmt_adapter();
     |                          ^^^^^^^^^^^ cannot borrow as mutable
     |
note: the binding is already a mutable borrow
    --> library/std/src/io/mod.rs:1662:18
     |
1662 |     fn write_fmt(&mut self, fmt: fmt::Arguments<'_>) -> Result<()> {
     |                  ^^^^^^^^^
help: try removing `&mut` here
    --> library/std/src/io/mod.rs:1663:26
     |
1663 |         let mut output = (&mut self).fmt_adapter();
     |                          ^^^^^^^^^^^
programmerjake commented 1 year ago
1662 |     fn write_fmt(&mut self, fmt: fmt::Arguments<'_>) -> Result<()> {

try writing that as:

fn write_fmt(mut self: &mut Self, fmt: fmt::Arguments<'_>) -> Result<()> {
the8472 commented 1 year ago

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=71fa45fed79aa51d7f97a73a4cf8282a

SUPERCILEX commented 1 year ago

Wow, that's a mind bender, no idea you could do that. Thanks everybody!

SUPERCILEX commented 1 year ago

Another use case for this came up: if a method accepts a fmt::Writer, there's no way to use the io combinators which means you have to write your own. With this feature it should be as simple as saying &mut io::sink().fmt_adapter() and similarly if you want to use repeat, etc.

programmerjake commented 1 year ago

one other thing i noticed, this proposal needs to address error handling -- if the io::Write errors, fmt::Write will just return fmt::Error which has no data, how can you get the underlying io::Error?

SUPERCILEX commented 1 year ago

Yeah, I brushed that under the rug, but I think you're right. I'll see if I can figure it out in the next few weeks and then this should be golden.

pitaj commented 1 year ago

You could use Error::source

SUPERCILEX commented 1 year ago

Nah, that doesn't cut it because we know the underlying error is an io::Error. Basically we need a way to let you check the error and choose to restart writing, or move it out if you want to return the error.

We can have a take style method that always moves the error out, but then that permanently modifies the FmtAdapter. Not sure if we care though.

The other option is to have a method that returns a reference to the error and another method that consumes the adapter to return the error. Or maybe also a take method instead of consuming.

So maybe that's the answer? Start with just a take method and if we realize people want to examine the error first, we can add a method that returns a reference?

programmerjake commented 1 year ago

I think we'll want 2 different APIs on io::Write:

pub trait io::Write {
    /// allows E = &mut io::Result<()> to easily retrieve the error without needing to hold onto
    /// the FmtWrite:
    /// ```
    /// # use std::{io, fmt};
    /// # let mut writer = Vec::<u8>::new();
    /// let mut err: io::Result<()> = Ok(());
    /// if let Err(_) = writer.into_fmt_write(&mut err).write_str("demo") {
    ///     err?;
    /// }
    /// ```
    fn into_fmt_write<E: BorrowMut<io::Result<()>>(self, err: E) -> io::FmtWrite<Self, E>
    where
        Self: Sized,
    {
        FmtWrite::with_err(self, err)
    }
    // intentionally don't have a method like FmtWrite::new since that's prone to ignoring errors
    fn with_fmt_write<'a, F>(&'a mut self, f: F) -> io::Result<()>
    where
        F: FnOnce(&mut io::FmtWrite<&'a mut Self>) -> fmt::Result,
        Self: Sized,
    {
        let mut writer = FmtWrite::new(self);
        match f(&mut writer) {
            Err(_) => {
                writer.into_err()?;
                Err(error::const_io_error!(ErrorKind::Uncategorized, "formatter error"))
            }
            Ok(()) => writer.into_err(),
        }
    }
}

#[derive(Debug)]
pub struct io::FmtWrite<W: io::Write, E: BorrowMut<io::Result<()>> = io::Result<()>> {
    writer: W,
    err: E,
}

impl<W: io::Write, E: BorrowMut<io::Result<()>>> fmt::Write for io::FmtWrite<W, E> {
    fn write_str(&mut self, v: &str) -> fmt::Result {
        let mut err: &mut io::Result<()> = self.err.borrow_mut();
        if err.is_err() {
            return Err(fmt::Error);
        }
        *err = self.writer.write_all(v.as_bytes());
        if err.is_err() {
            Err(fmt::Error)
        } else {
            Ok(())
        }
    }
}

impl<W: io::Write, E: BorrowMut<io::Result<()>>> io::FmtWrite<W, E> {
    pub fn with_err(writer: W, err: E) -> Self {
        Self { writer, err }
    }
    pub fn writer_mut(&mut self) -> &mut W {
        &mut self.writer
    }
    pub fn err_mut(&mut self) -> &mut E {
        &mut self.err
    }
    pub fn take_err(&mut self) -> io::Result<()> {
        mem::replace(self.err.borrow_mut(), Ok(()))
    }
    pub fn into_writer_and_err(self) -> (W, E) {
        (self.writer, self.err)
    }
    pub fn into_err(self) -> E {
        self.err
    }
}

impl<W: io::Write> io::FmtWrite<W> {
    pub fn new(writer: W) -> Self {
        Self { writer, err: Ok(()) }
    }
}
SUPERCILEX commented 1 year ago

That's too complicated IMO. Here's what I was thinking of:

fn do_stuff() -> io::Result<()> {
    fn my_common_writer(output: &mut impl fmt::Write) -> fmt::Result {
        writeln!(output, "Hello World!")
    }

    let mut output1 = String::new();

    my_common_writer(&mut output1).unwrap();

    // Care about errors
    let mut io_out = io::stdout();
    let mut adapter = fmt_adapter(&mut io_out);
    if let Err(fmt::Error) = my_common_writer(&mut adapter) {
        adapter.take_err().map(Err::<(), _>).transpose()?;
        // OR inspect
        if let Some(e) = adapter.take_err() {
            // check e and maybe retry for some reason
            my_common_writer(&mut adapter).unwrap();
        }
    }

    // Don't care
    my_common_writer(&mut fmt_adapter(&mut io::stdout())).unwrap();

    Ok(())
}

pub fn fmt_adapter<W: io::Write>(inner: &mut W) -> FmtWriteAdapter<W> {
    FmtWriteAdapter { inner, error: None }
}

pub struct FmtWriteAdapter<'a, W: io::Write + ?Sized> {
    inner: &'a mut W,
    error: Option<io::Error>,
}

impl<W: io::Write + ?Sized> FmtWriteAdapter<'_, W> {
    pub fn take_err(&mut self) -> Option<io::Error> {
        self.error.take()
    }
}

I'll put it in the PR.

SUPERCILEX commented 1 year ago

Nah, even that's too complicated. Let's just do this:

impl<W: io::Write + ?Sized> FmtWriteAdapter<'_, W> {
    pub fn err(&self) -> &Option<io::Error> {
        &self.error
    }

    pub fn mut_err(&mut self) -> &mut Option<io::Error> {
        &mut self.error
    }
}

That keeps things stupid simple. Now you can do whatever you want and we don't have to worry about it.

programmerjake commented 1 year ago

Nah, even that's too complicated. Let's just do this:

That keeps things stupid simple. Now you can do whatever you want and we don't have to worry about it.

I think that is too simple, it makes it much harder to use. I think we need an API like with_fmt_write that gives you a fmt::Write and handles translating the error, rather than forcing users to write boilerplate every time.

SUPERCILEX commented 1 year ago

Do you have any specific use cases in mind? My beef with the lambda is that it will make complicated control flow messy. As for into_fmt_write, it will force boilerplate on people that just want to unwrap.

In general, I tend to prefer the most general APIs because then all of the nice specific abstractions (like with_fmt_write) can be implemented in terms of those lower level general primitives.

The problem here is that I don't know what the typical use case looks like (mine is just unwrap) and I don't want to guess.

programmerjake commented 1 year ago

and into_fmt_write is for when you need a owned fmt::Write that owns the writer.

supporting &mut .. as the error type means you don't need to keep access to the fmt::Write, greatly simplifying code.

supporting a generic error type also allows you to do things like return an owned fmt::Write and still keep a separate reference to the error for later use:

struct ErrorRef {
    err: UnsafeCell<io::Result<()>>,
}

impl ErrorRef {
    fn into_inner(self: Rc<ErrorRef>) -> Result<io::Result<()>, Rc<ErrorRef>> {
        // try_unwrap succeeding guarantees ErrorRefMut no longer exists so accessing err is safe
        Ok(Rc::try_unwrap(self)?.err.into_inner())
    }
}

/// owns mutable access to err
struct ErrorRefMut(Rc<ErrorRef>);

impl ErrorRefMut {
    fn new() -> Self {
        Self(Rc::new(ErrorRef {
            err: UnsafeCell::new(Ok(())),
        }))
    }
    // allows getting err after self is dropped
    fn get_ref(&self) -> Rc<ErrorRef> {
        self.0.clone()
    }
}

impl Borrow<io::Result<()>> for ErrorRefMut {
    fn borrow(&self) -> &io::Result<()> {
        unsafe { &*self.0.err.get() }
    }
}

impl BorrowMut<io::Result<()>> for ErrorRefMut {
    fn borrow_mut(&mut self) -> &mut io::Result<()> {
        unsafe { &mut *self.0.err.get() }
    }
}

// demo:

enum State<W: io::Write> {
    HasWriter(W),
    HasErrorRef(Rc<ErrorRef>),
}

impl<W: io::Write> State<W> {
    // returns a 'static owned type if W: 'static
    fn fmt_write(&mut self) -> FmtWrite<W, ErrorRefMut> {
        let err = ErrorRefMut::new();
        let Self::HasWriter(writer) = mem::replace(self, Self::HasErrorRef(err.get_ref())) else {
            panic!("already took writer");
        };
        writer.into_fmt_write(err)
    }
    fn into_err(self) -> io::Result<()> {
        let Self::HasErrorRef(err) = self else {
            panic!("didn't start writing");
        };
        let Ok(retval) = ErrorRef::into_inner(err) else {
            panic!("didn't finish writing");
        };
        retval
    }
}
SUPERCILEX commented 1 year ago

Is there anything more for me to do? I'd love to have this added.

SUPERCILEX commented 6 months ago

I've put out a crate that tries to solve this by generalizing over possible adapters via an extension trait. It solves this issue, https://github.com/rust-lang/rust/issues/77733, https://github.com/rust-lang/libs-team/issues/309, and https://github.com/rust-lang/libs-team/issues/278.

I'm waiting to see if there's cleaner way to define the extension trait, but the crate is basically done: https://users.rust-lang.org/t/better-way-to-handle-implementing-trait-for-generics/103724


From my research, other instances where people ran into the same issue:

maghoff commented 1 week ago

This looks great! I've just bumped into this usecase as I am designing a crate for writing ICalendar streams. The format is UTF-8, so it makes sense that the whole pipeline is structured around fmt::Write.

I tried using the io_adapters crate to give my users an easy entrypoint for writing to std::io::Write values, but I immediately bumped into ownership problems. I propose that write_adapter takes self by move instead of borrow:

#[derive(Debug)]
pub struct FmtToIo<W> {
    inner: W,
    pub error: Option<io::Error>,
}

impl<W: io::Write> fmt::Write for FmtToIo<W> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        match self.inner.write_all(s.as_bytes()) {
            Ok(()) => {
                self.error = None;
                Ok(())
            }
            Err(e) => {
                self.error = Some(e);
                Err(fmt::Error)
            }
        }
    }
}

pub trait WriteExtension<T> {
    type Adapter;

    fn write_adapter(self) -> Self::Adapter;
}

impl<W: io::Write> WriteExtension<FmtToIo<W>> for W {
    type Adapter = FmtToIo<W>;

    fn write_adapter(self) -> FmtToIo<W> {
        FmtToIo {
            inner: self,
            error: None,
        }
    }
}

I believe this is strictly more expressive, as the user can still pass a borrowed value if that is appropriate in the context.

At any rate, the version I have pasted in this comment works in my context.