rust-lang / libs-team

The home of the library team
Apache License 2.0
115 stars 18 forks source link

ACP Implement `fmt::Write` wrapper for `&mut [u8]` (and maybe `&mut str`?) #278

Closed jmillikin closed 2 months ago

jmillikin commented 10 months ago

Proposal

Problem statement

The ability to place formatted text into a buffer seems like a useful primitive to have, so I propose adding types to core::fmt that implement Write on top of a user-provided &mut [u8]. These types would provide a safe way to access the portion of the buffer that has been written to.

Additional variants that seem useful to add at the same time:

Motivating examples or use cases

I frequently find myself writing a BufWriter { &mut [u8] } wrapper struct in no_std code when I want to do the Rust equivalent of snprintf(), for example when constructing Linux mount() data options in rust-fuse: https://github.com/jmillikin/rust-fuse/blob/47ca3bc5cda263543cf6a71155115a85dd1d094e/fuse/os/linux.rs#L539-L560.

Solution sketch

Something roughly like this -- I'm not attached to the names or the exact API, and the real implementation would leverage unsafe-unchecked methods to avoid bounds checks.

pub struct BufWriter<'a> {
    buf: &'a mut [u8],
    count: usize,
}

impl<'a> BufWriter<'a> {
    pub fn new(buf: &'a mut [u8]) -> BufWriter<'a> {
        BufWriter { buf, count: 0 }
    }

    pub fn bytes(&self) -> &[u8] {
        &self.buf[..self.count]
    }

    pub fn into_bytes(self) -> &'a [u8] {
        &self.buf[..self.count]
    }

    // maybe? I find this method useful when glomming together chunks of &CStr, but
    // I'd be fine leaving it out if libs-team dislikes it.
    //
    // pub fn write_bytes(&mut self, b: &[u8]) -> core::fmt::Result { ... }
}

impl core::fmt::Write for BufWriter<'_> {
    fn write_str(&mut self, s: &str) -> core::fmt::Result {
        let b = s.as_bytes();
        let avail = &mut self.buf[self.count..];
        if b.len() > avail.len() {
            return Err(core::fmt::Error);
        }
        avail[..b.len()].copy_from_slice(b);
        self.count += b.len();
        Ok(())
    }
}

Alternatives

  1. Add a function like write_to(buf: &mut [u8], w: &impl Write) -> Result<usize, fmt::Error> to core::fmt.

    • This API feels unergonomic to me, because the user must keep track of write offsets themselves.
  2. Add a type modeled on std::io::Cursor, which implements additional methods for reading and seeking.

    • Unclear where this new type would go -- core::io::Cursor isn't possible because that would cause name conflicts with std::io::Cursor, and core::fmt would be a weird place to put a type that does more than formatting.
    • More logic to implement than writer-only wrappers, but maybe code could be relocated from std::io::Cursor ?

Links and related work

This pattern is a popular StackOverflow question:

There's another open proposal for fmt::Write that seems related:

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

Second, if there's a concrete solution:

m-ou-se commented 10 months ago

Does arrayvec::ArrayString work for your use cases?

I'd be interested in getting something like ArrayVec and ArrayString into core.

jmillikin commented 10 months ago

I think ArrayString might be solving a different problem, although I've not used it before so it could be that I've missed something in the docs.

First, an ArrayString seems to use a constant-sized underlying buffer (const CAP: usize). I don't see a way to construct one from a &mut [u8], so there's no way to (for example) format a value into a fixed-size slice of a larger buffer. The user would have to use an intermediate ArrayString, format into it, then do a slice-to-slice copy into the target buffer.

Second, this feature would be closer to ArrayVec<u8>, since otherwise there's no way to mix UTF8 and non-UTF8 data in the buffer. For example when working with older C APIs, sometimes the text encoding is raw byte values (ISO-8859-1) or a legacy encoding (e.g. Shift_JIS) so the flexibility of &mut [u8] is helpful.

jmillikin commented 9 months ago

I put together a draft implementation: https://github.com/rust-lang/rust/pull/117931

It's not large (~120 lines including docs), and enables formatting data into a &mut [u8] or &mut [MaybeUninit<u8>]. It works well as a replacement for format!() in no-alloc code.

ChayimFriedman2 commented 9 months ago

Can we re-use core::io::BorrowedBuf and core::io::BorrowedCursor for this?

jmillikin commented 9 months ago

Can we re-use core::io::BorrowedBuf and core::io::BorrowedCursor for this?

Implementing fmt::Write for those types also seems like a reasonable idea, but they're not a substitute for this ACP because their API doesn't provide the same affordances -- in particular there's no way to obtain the written portion of their buffer via deconstruction.

You could use a BorrowedBuf in the implementation of fmt::WriteCursor, but the implementation is already so trivial that it doesn't seem worth the effort.

ChayimFriedman2 commented 9 months ago

in particular there's no way to obtain the written portion of their buffer via deconstruction

What do you mean? We can get the written portion via BorrowedBuf::filled(). The only problem is that we cannot return a &str using from_utf8_unchecked(), since the user might write invalid UTF-8 bytes manually.

jmillikin commented 9 months ago

in particular there's no way to obtain the written portion of their buffer via deconstruction

What do you mean? We can get the written portion via BorrowedBuf::filled(). The only problem is that we cannot return a &str using from_utf8_unchecked(), since the user might write invalid UTF-8 bytes manually.

That's not deconstruction, it's a reborrow.

ChayimFriedman2 commented 9 months ago

We can add a method for deconstruction if it's a common need (if not, it can be emulated by taking the len() of the BorrowedBuf, dropping it then slicing the original data according to it, but it will require unsafe code if the data is MaybeUninit).

jmillikin commented 9 months ago

We can add a method for deconstruction if it's a common need (if not, it can be emulated by taking the len() of the BorrowedBuf, dropping it then slicing the original data according to it, but it will require unsafe code if the data is MaybeUninit).

I think the ACP covers what I'm about to say, but just to clarify: I know a BorrowedBuf (or any &mut [u8]) can be used to store the result of write!() by writing a dozen lines of glue code. The goal of this ACP is to have that glue code be part of the standard library, not ad-hoc reimplemented in each crate that needs to do alloc-free formatting.

Consider C, which has the sprintf() family of functions. It is very easy to use sprintf() to format into a buffer:

#include <assert.h>
#include <string.h>
#include <stdint.h>
#include <stdio.h>

void fmt_u16(char *buf, size_t buf_len, uint16_t v) {
    assert(buf_len >= 6); 
    sprintf(buf, "%hu", v); 
}

int main() {
    char buf[6];
    fmt_u16(buf, sizeof buf, 12345);
    printf("formatted: %s\n", buf);
    return 0;
}

The equivalent in Rust would be written like this in current stable (lifetimes explicit for clarity):

use core::fmt;
use core::mem::MaybeUninit;

fn fmt_u16<'a>(buf: &'a mut [MaybeUninit<u8>], v: u16) -> &'a str {
    struct W<'a> { buf: &'a mut [MaybeUninit<u8>], idx: usize }
    impl fmt::Write for W<'_> {
        fn write_str(&mut self, s: &str) -> fmt::Result {
            self.buf[self.idx..self.idx+s.len()].copy_from_slice(unsafe {
                core::mem::transmute(s.as_bytes())
            });
            self.idx += s.len();
            Ok(())
        }
    }
    let mut w = W { buf, idx: 0 };
    let _ = fmt::Write::write_fmt(&mut w, format_args!("{v}"));
    unsafe {
        let utf8: &[u8] = core::mem::transmute(&w.buf[..w.idx]);
        core::str::from_utf8_unchecked(utf8)
    }
}

fn main() {
    let mut buf = [MaybeUninit::uninit(); 6];
    println!("formatted: {:?}", fmt_u16(&mut buf, 12345));
}

And it would be really nice if all that extra stuff wasn't needed -- if non-allocating format look like this:

fn fmt_u16<'a>(buf: &'a mut [MaybeUninit<u8>], v: u16) -> &'a str {
    let mut cursor = fmt::WriteCursor::new_uninit(buf);
    write!(cursor, "{v}");
    cursor.into_str()
}

fn main() {
    let mut buf = [MaybeUninit::uninit(); 6];
    println!("formatted: {:?}", fmt_u16(&mut buf, 12345));
}
SUPERCILEX commented 8 months ago

I just put out a crate that tries to solve the composition problem in a generalized fashion: https://github.com/SUPERCILEX/io-adapters

Example usage that (I believe) solves this proposal's use case:

//! ```cargo
//! [dependencies]
//! io-adapters = "0.1"
//! ```
#![feature(core_io_borrowed_buf)] // Or just use a normal array if you want to keep things simple

use core::fmt;
use std::{io::BorrowedBuf, mem::MaybeUninit, str::from_utf8};

use io_adapters::WriteExtension;

fn main() {
    let mut buf = [MaybeUninit::uninit(); 128];
    let mut buf = BorrowedBuf::from(buf.as_mut_slice());

    foo(buf.unfilled().write_adapter());

    println!("{:?}", from_utf8(buf.filled()));
}

fn foo(mut output: impl fmt::Write) {
    writeln!(output, "Hello, World!").unwrap();
}