rust-lang / libs-team

The home of the library team
Apache License 2.0
115 stars 18 forks source link

[ACP] Provide an interface for creating instances of fmt::Formatter #286

Closed EliasHolzmann closed 6 months ago

EliasHolzmann commented 10 months ago

Edit 2023-10-23: Added a bit of discussion about why a new function on Formatter instead of the builder pattern is problematic

Follow-up to #280.

Proposal

Problem statement

Creating a custom Formatter at runtime as well as modifying an existing Formatter is currently not supported, one can only use the Formatters passed in by the formatting macros. Being able to create and modify them could be useful for multiple use cases:

  1. Calling the formatting code of contained fields with a modified formatting parameter
  2. Exposing different formatting methods for different representations/data points on one struct
  3. Exposing formatting methods that require additional data
  4. Writing libraries that provide enhanced formatting capabilities

Motivating examples or use cases

  1. rust-lang/rust#19207
  2. rust-lang/rust#74870
  3. rust-lang/rust#46591
  4. Multiple instances fit this use case:
    • runtime-fmt (a formatting crate that allows the user to supply the format string at runtime) seems to use the unstable fmt_internals feature to build a custom std::fmt::Arguments which has the Formatter baked in (or rather the mostly equivalent fmt::rt::Placeholder struct), see here. In consequence, this crate requires nightly Rust. (Note that the interface isn't the current one as runtime-fmt hasn't been updated since 2019)
    • rt_format (another runtime formatting crate) handles the problem in another way: Using a macro named generate_code!, it generates a function containing a format_args! call for every combination of alignment, signedness, default/alternative representation, zero/space padding, width (via variable), precision (via variable) and formatting trait for a total of 1024 format_args! invocations at the time of writing. Fill characters are not supported, as those cannot be passed via a variable to the format_args! call but must be part of the format specifier. (If you are interested to see the result of this approach, run cargo expand in the crate root and search for a huge function named format_value)
    • I'd like to experiment with a crate that reimplements the formatting macros but adds some additional features (mostly interpolation). However, it is currently impossible to do this in a manner that is compatible with the std implementation outside of nightly Rust (rt_format is almost there, but they cannot support fill characters, apart from their solution being quite hacky and probably inefficient).

Solution sketch

In std::fmt:

struct FormatterBuilder<'a>;
impl<'a> FormatterBuilder<'a> {
    /// Construct a new `FormatterBuilder` with the supplied `Write` trait object for output that is equivalent to the `{}` formatting specifier (no flags, filled with spaces, no alignment, no width, no precision).
    fn new(write: &'a mut (dyn Write + 'a)) -> Self;
    /// Constructs a new `FormatterBuilder` that is equivalent to a given `Formatter`.
    fn new_from_formatter(fmt: &'a mut Formatter<'a>) -> Self
    /// Copies all formatting properties from `other`, only the `Write` trait object is kept
    fn with_formatting_from(mut self, other: &Formatter) -> Self;

    fn sign_plus(mut self, sign_plus: bool) -> Self;
    fn sign_minus(mut self, sign_minus: bool) -> Self;
    fn alternate(mut self, alternate: bool) -> Self;
    fn sign_aware_zero_pad(mut self, sign_aware_zero_pad: bool) -> Self;
    fn fill(mut self, fill: char) -> Self;
    fn align(mut self, align: Option<Alignment>) -> Self;
    fn width(mut self, width: Option<usize>) -> Self;
    fn precision(mut self, precision: Option<usize>) -> Self;

    /// Builds the `Formatter`.
    fn build(self) -> Formatter<'a>;
}

Note that all FormatterBuilder methods take and return self by value. This is a bit unergonomic, but necessary: build must take self by value in order to transfer ownership of the Write trait object to the returned Formatter.

Alternatives

Links and related work

Relevant previous libs team discussion: https://github.com/rust-lang/rfcs/pull/3394#issuecomment-1563263573 Previous ACP with a different approach: https://github.com/rust-lang/libs-team/issues/280

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

Second, if there's a concrete solution:

dtolnay commented 10 months ago

I don't believe refactoring Formatter in this fashion is possible anyway – this would change the interface and therefore would be a breaking change.

FWIW the proposal that I recall seeing floated on i.r-l.o or reddit was something like pub struct Formatter<'a, W: ?Sized = dyn Write + 'a> which would be non-breaking.

dtolnay commented 10 months ago

Marking as accepted because we've already said as a team that closing https://github.com/rust-lang/rfcs/pull/3394#issuecomment-1563466426 is tantamount to accepting a Formatter creation ACP.

I am still interested in coming back to consider the with_fill-like API in https://github.com/rust-lang/rfcs/pull/3394#issuecomment-1563263573 which did not get mentioned in this ACP's Alternatives section. I wonder whether that would be more convenient more of the time for what people want in practice — &mut formatter.with_fill(self.fill) vs &mut FormatterBuilder::new_with_formatter(formatter).fill(self.fill).build().

EliasHolzmann commented 10 months ago

FWIW the proposal that I recall seeing floated on i.r-l.o or reddit was something like pub struct Formatter<'a, W: ?Sized = dyn Write + 'a> which would be non-breaking.

Thanks, I didn't see this this – new could probably be refactored in the same way, so I believe that this improvement is still possible after implementing the proposal as discussed here.


Regarding with_fill: Indeed, I seem to have forgotten to discuss that alternative here, sorry.

I believe most users want to construct a new Formatter and that modifying an existing one is more of a niche use case (I may be coloured by my own usage pattern of Rust here, though – I'm interested in more opinions on this). For constructing a new Formatter, the builder pattern feels more idiomatic than with_* methods (although that is probably a bit subjective). Also, with_* methods would result in lots of additional entries in the Formatter documentation, thereby bringing it to a size that might be a bit unergonomic.

That being said: If a majority prefers an implementation that adds with_* methods on Formatter, I also wouldn't disapprove of that (as long as there is still a new method for constructing a new Formatter).

m-ou-se commented 10 months ago

Note that all FormatterBuilder methods take and return self by value. This is a bit unergonomic, but necessary: build must take self by value in order to transfer ownership of the Write trait object to the returned Formatter.

Wouldn't it be better to take no arguments in new() and instead take the &dyn Write as an argument in build()? (Similar to OpenOptions.) Then FormatterBuilder needs no lifetime, and it can be Clone or even Copy.

m-ou-se commented 10 months ago

Small nit: This interface would allow for something that has both sign_plus and sign_minus enabled, which is impossible to do today. Perhaps this should be just sign with a new enum Sign { Plus, Minus }. (Or maybe just documentation that setting one disables the other.)

m-ou-se commented 10 months ago

How would you feel about an alternative like this?

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct FormattingOptions { … }

impl FormattingOptions {
    pub fn new() -> Self;

    pub fn sign_plus(&mut self, sign_plus: bool) -> &mut Self;
    pub fn sign_minus(&mut self, sign_minus: bool) -> &mut Self;
    pub fn zero_pad(&mut self, zero_pad: bool) -> &mut Self;
    pub fn alternate(&mut self, alternate: bool) -> &mut Self;
    pub fn fill(&mut self, fill: Option<char>) -> &mut Self;
    pub fn alignment(&mut self, alignment: Option<Alignment>) -> &mut Self;
    pub fn width(&mut self, width: Option<usize>) -> &mut Self;
    pub fn precision(&mut self, precision: Option<usize>) -> &mut Self;

    pub fn get_sign_plus(&self) -> bool;
    pub fn get_sign_minus(&self) -> bool;
    pub fn get_zero_pad(&self) -> bool;
    pub fn get_alternate(&self) -> bool;
    pub fn get_fill(&self) -> Option<char>;
    pub fn get_alignment(&self) -> Option<Alignment>;
    pub fn get_width(&self) -> Option<usize>;
    pub fn get_precision(&self) -> Option<usize>;
}

impl<'a> Formatter<'a> {
    pub fn new(write: &'a mut (dyn Write + 'a), options: FormattingOptions) -> Self;
    pub fn with_options(&mut self, options: FormattingOptions) -> Self;

    pub fn options(&self) -> FormattingOptions;
}

Now the FormattingOptions type is Copy and Clone, such that it can be stored and reused later. The setter methods all take and return &mut Self, just like for OpenOptions and Command.

This makes it very clear that a Formatter is just a &dyn Write with options attached, by giving those formatting options their own type.

scottmcm commented 10 months ago

The approach of having an ordinary and extensible options struct sounds great.

One minor thing on

pub fn options(&self) -> FormattingOptions;

Today we have https://doc.rust-lang.org/std/fs/struct.File.html#method.options, which would suggest that we should instead have

impl<'a> Formatter<'a> {
    pub fn options() -> FormattingOptions;
}

so that most people don't need to know the name of the builder type.

I'm not sure what that would mean for the getter, though...

EliasHolzmann commented 10 months ago

@m-ou-se: Thanks, your approach is much cleaner and better than mine! I only have two small nitpicks.

  1. Instead of the following:

    impl<'a> Formatter<'a> {
        pub fn new(write: &'a mut (dyn Write + 'a), options: FormattingOptions) -> Self;
        pub fn with_options(&mut self, options: FormattingOptions) -> Self;
    }

    I suggest having methods on FormattingOptions:

    impl FormattingOptions {
        pub fn build<'a>(self, write: &'a u8) -> Formatter<'a>
        pub fn build_from<'a>(self, formatter: &mut Formatter<'a>) -> Formatter<'a>
    }

    (I'm not married to the naming – if someone has something clearer than build/build_from, I'd be happy with it)

    There are two reasons why I believe this might be beneficial:

    1. Those methods are not needed by most users of Formatter and would only be bloat in the docs for them, and
    2. I believe options.build(write) is more idiomatic and easier to understand than Formatter::new(write, options) is (though I admit this is very much subjective).
  2. Regarding this:

    Perhaps this should be just sign with a new enum Sign { Plus, Minus }. (Or maybe just documentation that setting one disables the other.)

    I'm in favor of adding fn sign and enum Sign. I don't think simply mentioning this gotcha in the docs would be a good solution – these are simple setter function, so most people probably won't look at the docs, at least not until they wasted a few minutes debugging why their plus and minus signs aren't as they expected them to be.

    While we're at it, maybe we should also deprecate Formatter::sign_minus/Formatter::sign_plus in favor of a new method pub fn sign(&self) -> Sign? This would clearer communicate to the user that at most one of sign_minus and sign_plus can be true, and it is probably easier to understand match fmt.sign() { … } instead of something like if fmt.sign_plus() { … } else if fmt.sign_minus() { … } else { … }.


Regarding @scottmcm's point about File::options: While I personally find the benefit of having such helper functions dubious, I also believe that such a function should either be implemented for all builders, or for none – having them for some builders only might be confusing for a user that searches the function on Formatter. Regarding the name conflict: Maybe implementing From<&Formatter> for FormattingOptions is more idiomatic than pub fn options(&self) -> FormattingOptions anyway? This is not just a rhetorical question – implementing From for a reference seems kind of unorthodox to me, although the std lib already does so for &String (and some unsized types, but those are a special case).

m-ou-se commented 9 months ago

1.

I don't mind adding

impl FormattingOptions {
    pub fn make_formatter<'a>(self, write: &'a mut (dyn Write + 'a)) -> Formatter<'a>;
    // or `into_formatter` or `to_formatter` or whatever color you like your bikeshed
}

to allow FormattingOptions to be used directly as a builder for Formatter. (I don't think it should be called just build though, since FormatOptions can be used for other things than just building a Formatter.) But I don't think it's a good idea to remove Formatter::new, Formatter::with_options and Formatter::options from my proposed API.

Especially Formatter::options seems quite an obvious method to have: a Formatter has formatting options, so it should simply expose those.

I'm not really concerned about the point that @scottmcm brought up. A File does not carry any 'options' after it is created, so we had the opportunity to instead use that name as a shorthand to create an OpenOptions. A Formatter does carry options, so having .options() be a method to get them seems logical to me. We don't have full consistency for builders in std anyway: Command is a builder for Child (and a few other things), but we don't have Child::command() as a short-hand. File::options() was a one-off situation where we thought the convenience was big enough to be worth it, but I don't think that's a rule to be applied universally.

Having From<&Formatter> for FormattingOptions feels wrong to me. It feels a bit like having From<usize> for &[T] to get the length (instead of a len method): it's just getting a property, not a conversion.

2.

Having sign() -> Option<Sign> instead of sign_{plus,minus} on FormattingOptions (and also Formatter) sounds fine. Deprecating the existing methods requires a separate discussion that we can have after stabilizing Formatter::sign.

m-ou-se commented 9 months ago

@scottmcm Note that OpenOptions and FormatOptions would not be the same kind of thing. The former is only just a builder that normally only exists as part of a File::options().read(..).write(..).open(..) expression, while FormatOptions is a type that is also meant to be stored and used in other ways (like a regular fully public struct, but with an opaque representation).

OpenOptions just feels like an implementation detail of the builder pattern, and so it makes sense to me to make the type name irrelevant/hidden. FormatOptions however does not feel like 'just an implementation detail', but a regular type that we expect users to work with directly.

EliasHolzmann commented 9 months ago

Adding these changes to your last proposal, the interface looks like this:

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct FormattingOptions { … }

enum Sign {
    Plus, 
    Minus
}

impl FormattingOptions {
    pub fn new() -> Self;

    pub fn sign(&mut self, sign: Option<Sign>) -> &mut Self;
    pub fn zero_pad(&mut self, zero_pad: bool) -> &mut Self;
    pub fn alternate(&mut self, alternate: bool) -> &mut Self;
    pub fn fill(&mut self, fill: Option<char>) -> &mut Self;
    pub fn alignment(&mut self, alignment: Option<Alignment>) -> &mut Self;
    pub fn width(&mut self, width: Option<usize>) -> &mut Self;
    pub fn precision(&mut self, precision: Option<usize>) -> &mut Self;

    pub fn get_sign(&self) -> Option<Sign>;
    pub fn get_zero_pad(&self) -> bool;
    pub fn get_alternate(&self) -> bool;
    pub fn get_fill(&self) -> Option<char>;
    pub fn get_alignment(&self) -> Option<Alignment>;
    pub fn get_width(&self) -> Option<usize>;
    pub fn get_precision(&self) -> Option<usize>;

    pub fn create_formatter<'a>(self, write: &'a mut (dyn Write + 'a)) -> Formatter<'a>;
    pub fn modified_formatter_from<'a>(self, write: &mut Formatter<'a>) -> Formatter<'a>;
}

impl<'a> Formatter<'a> {
    pub fn new(write: &'a mut (dyn Write + 'a), options: FormattingOptions) -> Self;
    pub fn with_options(&mut self, options: FormattingOptions) -> Self;

    pub fn sign(&self) -> Option<Sign>;

    pub fn options(&self) -> FormattingOptions;
}

Anything I forgot? Or any further feedback?

m-ou-se commented 9 months ago

I think it's better to leave out modified_formatter_from. It's hard to give it a clear name, and considering Formatter already implements Write, the difference from create_formatter is very subtle.

Other than that, this looks good to me. :)

EliasHolzmann commented 9 months ago

Thanks. After thinking a bit more about this, I believe you are right – the practical use for this method is probably too small to justify the added complexity (and the difficulty in finding a good name for it). If there is indeed any real need for this, another ACP should probably be opened for that.

I'll hopefully find time this weekend to have a go at implementing this. Until then, should this ACP issue stay open? I'm not familiar with the process and the std-dev-guide isn't clear on that, but a search through past issues seems to indicate that ACPs are normally closed once accepted.

ThePuzzlemaker commented 7 months ago

I would like to see this approved. I came up with a solution almost exactly like this and posted it on the Rust Zulip, and you can see my motivation for it there. Basically, having this interface would simplify an over 40-line, repetitive match statement (which doesn't even cover all cases, and requires some weird handling for custom fill characters) into a simple 20-or-so line, intuitive segment of code which simply builds a fmt::Formatter. This would allow for much greater flexibility and having dynamic fill characters (without necessarily changing format! syntax) would be greatly useful to me (for this case of a runtime formatting library), and likely to others.