Open HyeonuPark opened 3 months ago
I think we should use core::io::BorrowedCursor
instead of &mut [u8]
(or maybe something similar that only allows writing str
) so we can skip initializing the buffer.
I'm afraid the BorrowedCursor
's additional context gives negative impact on optimization. This is not an IO buffer - it should be declared and filled within a function with (hopefully inlined) non-IO code. It might be better to pass &mut [MaybeUninit<u8>]
instead, though.
Using MaybeUninit<u8>
instead of u8
for the buffer is desirable to keep the optimization path (mostly) "zero cost" even for dyn Display
by avoiding redundant zeroing of the stack memory[^3]. Additionally, I think making the temp stack buffer size a known constant to try_to_str
(i.e. use an array instead of slice, even if not as an API exposed constant) instead of a slice for this for-optimization API surface would be more "proper" and assist with inlining of the method (e.g. MIR inlining currently really doesn't like branches).
[^3]: Even with access to a "freeze
" operation for the stack bytes, exposing contents of "freed" stack memory to arbitrary downstream safe code, while not unsound, is still undesirable.
I personally think it's justifiable to define a specific fmt::TempBuffer
struct for usage here. It's a specific application that would benefit from a specific more targeted type API than just &mut [MaybeUninit<u8>; fmt::TEMP_BUF_SIZE]
. io::BorrowedCursor
has the extra context/overhead of tracking the init/writ subslice, which is specifically not necessary (nor desired) in this case, and could even mislead someone into thinking what's written into the temp space is relevant, rather than exclusively just the returned &str
. Specifically, it controls against giving a slice of the output buffer to try_to_str
, which is explicitly an incorrect usage (this is not a write
). Additionally, it would permit marking the buffer as known aligned to the ABI stack alignment[^2], which could improve codegen some more.
[^2]: I believe this is 4 bytes on most 32-bit targets and 16 (not 8, because SIMD) on most most 64-bit targets.
also cf. char::encode_utf8(self, &mut [u8]) -> &mut str
.
Minor naming thing: TryToStrError::capacity_hint
might also be better named as reserve_hint
, as it's not only useful for a new output with_capacity
, but also potentially for a reserve
when writing into an existing buffer (e.g. imagine a clone_from
like analog for to_string
). Additionally, if Iterator
grows the oft-discussed single-usize
alternative to size_hint
[^1], this method should mirror the name/semantics of that one.
[^1]: TL;DR: 99.9% of size_hint
consumers only use the lower bound and ignore the upper bound entirely; AIUI the conservatively correct upper bound just isn't that useful of a bound. The useful "reserve_hint
" for iterators AIUI is a cheap and conservative size prediction independent of the size bounds.
For example,
(&"&str").to_string()
doesn't take specialized route and perform dynamic dispatch.
For explicitness, a delegating specialization impl<T: ?Sized> ToString for &'_ T where T: ToString
would unfortunately be unsound, as ToString
is not a specialization-safe trait.
Namely, downstream could implement ToString
both for &'_ Ty<'_>
and Ty<'static>
(so long as Ty
is not Display
), which would make impl selection for &Ty
depend on a lifetime, i.e. be unsound (or at a bare minimum, extremely surprising[^4]).
A type bound for "captures no lifetimes" (i.e. safe to specialize) would solve a single level of additional indirection, but not additional levels. Bounding T
by Display
instead might maintain soundness (by strictly and universally subsetting the existing blanket impl), but I'm not 100% confident in asserting so. (It's times like these I wish std had had access to #[sealed]
for extension style traits pre-1.0 and had used it liberally... alas.)
[^4]: Specifically, in theory lifetime-dependent specialization "could" "just" only look at immediately evident bounds in lifetime-generic code and not actually specialize, but this mostly just defeats the purpose of specialization (cutting through layers of generality) and only even really makes any sense for strictly LSP-preserving specialization (which std limits itself to, but can't be guaranteed for downstream code, and in practice people do want to perform LSP-violating specialization).
Proposal updated. Thanks for the feedbacks!
Proposal
Problem statement
"&str".to_string()
is such a fundamental part of the Rust language. It's the second function introduced by "the book" to constructString
right after theString::new()
. And still is one of the most preferable methods to construct ownedString
for many experienced Rust developers. But the method is based on theDisplay
trait, which impliesstd::fmt
infrastructures with unavoidable dynamic dispatches and optimization blocker.. which is not. Since it's such a common function we've especially optimized them to ideal code with specialization.But specialization is not a silver bullet. Since it "specialize" certain types it doesn't compose naturally. For example,
(&"&str").to_string()
doesn't take specialized route and perform dynamic dispatch. Same applies forArc<str>
,arrayvec::ArrayString
etc. and with those also-specialized primitive types likebool
,i32
but with some indirections.Additionally, If we have general optimization infrastructure it can be reused for other functions that consume
impl Display
, likefn write_to(target: &mut String, content: impl Display)
.Motivating examples or use cases
These
.to_string()
calls invokesstd::fmt
infrastructure with dynamic dispatch. Specialization optimizes few chosen cases but the condition is subtle which can be surprising.Solution sketch
The goal of this ACP is to un-specialize
impl ToString
without affecting performance, by adding a new method to thetrait Display
. Currently handful of types are specialized withToString
trait and they can be divided into 3 groups:str
-str
,String
,Cow<'_, str>
etc.bool
which is either"true"
or"false"
u8
,char
etc.fmt::Arguments<'_>
, which tries to format itself as&str
, and fallback to default route but with preallocatedString
of estimated capacity.To address all those cases I propose to add another method to
trait Display
:Types which delegates to
str
should return reference to its internalstr
value. Types that need small buffer for formatting can use giventemp_buf
argument. Types which doesn't know its buffer size statically but still knows its capacity hint dynamically, likefmt::Argument<'_>
, can returns an error with the capacity hint. For other types this change should be no-op.Buffer capacity
To make this change really zero cost the check if the buffer provides enough space (branch in
TempBuffer::buffer()
) compares 2 constants (TEMP_BUF_CAPACITY
andREQUIRED_BUF_CAPACITY
) and the branch is expected to be optimized out.The callee know its required buffer capacity, but callers need to decide the capacity that is large enough for practical cases. For example
f64
may take up to 17 bytes to print, andSocketAddr
may take 58 bytes. I suggest to use 64 as a base value since it's small enough for small stack buffer and still large enough for most leaf types.Composite types
While the
fmt
method is designed to be composited, it's not generally recommended for composite types to overridetry_to_str
method. This method is designed for directly consuming leaf types without any additional formatter arguments like.to_string()
. Wrapper types(likeArc<T>
) may override this method to delegate to its inner type.Alternatives
We can keep the current specialization based structure. It's proven to be good enough for most use cases and we may still add another specialized impl when needed.
Technically this behavior can be implemented with 3rd party crate with its own
ToString
andDisplayExt
traits. But its.to_string()
method can't be applied to other crate's types that only implements standardDisplay
trait not the 3rd partyDisplayExt
trait.As noted earlier there's 2 main reasons to avoid default
.to_string()
impl in cricical code - 1. it's not optimal impl for leaf types, 2. dyn traits involved kills optimizations from surrounding code. The second symptom might be considered a bug, and may eventually be resolved. After that we may find that default impl is good enough for every types now.Links and related work
Edits
fixex_buf
withstruct TempBuffer
and with constant sized array buffer.capacity_hint
=>reserve_hint