format_args! is slow

workingjubilee commented 4 years ago

Like molasses in the Antarctic.

As a consequence, so is any method that depends on its Arguments, such as {fmt, io}::Write::write_fmt. The microbenchmarks in this issue about write!'s speed demonstrate that merely running the same arguments through format_args! and then write_fmt, even when they are just a plain string literal with no formatting required, is massively slower than feeding the same string through fmt::Write::write_str or io::Write::write_all.
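A minimal sketch of the kind of microbenchmark described above, comparing write! with a plain literal against fmt::Write::write_str on a String. The function names, the iteration count, and the use of std::hint::black_box (stable since Rust 1.66) are choices made here, not taken from the linked benchmarks. Timings vary with toolchain version and optimization flags, so run it with --release and treat the numbers as indicative only.

```rust
use std::fmt::Write;
use std::hint::black_box;
use std::time::{Duration, Instant};

// Routes a plain literal through format_args! and write_fmt.
fn via_write_macro(buf: &mut String) {
    write!(buf, "hello, world").unwrap();
}

// Writes the same literal directly, bypassing format_args!.
fn via_write_str(buf: &mut String) {
    buf.write_str("hello, world").unwrap();
}

// Runs `f` repeatedly on a reused buffer and returns the elapsed time
// together with the final buffer contents.
fn time_it(f: fn(&mut String), iterations: u32) -> (Duration, String) {
    let mut buf = String::with_capacity(64);
    let start = Instant::now();
    for _ in 0..iterations {
        buf.clear();
        f(black_box(&mut buf));
    }
    (start.elapsed(), buf)
}

fn main() {
    let (macro_time, _) = time_it(via_write_macro, 1_000_000);
    let (direct_time, _) = time_it(via_write_str, 1_000_000);
    println!("write!:    {:?}", macro_time);
    println!("write_str: {:?}", direct_time);
}
```

Both functions produce identical output; only the path through the formatting machinery differs.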

Unfortunately, write!, format!, println!, and similar macros are a common feature of fluent Rust code. Rust promises a lot of zero-cost abstractions, and on a scale from "even better than you could handwrite the asm" to "technically, booting an entire virtual machine is zero cost if you define the expression as booting a virtual machine...", this is currently "not very". Validating and formatting strings correctly can be surprisingly complex, and that complexity will only grow with features like implicit named arguments in format_args!, so we should expect that improving speed here will be challenging. However, it should be possible, even if it requires extensive redesign.
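For reference, the implicit named arguments mentioned here were later stabilized as captured identifiers in format strings (Rust 1.58). A small sketch showing that the implicit form produces the same text as the explicit named and positional forms; the function and variable names are illustrative only. Accepting all three spellings is part of the parsing and validation work format_args! has to do.

```rust
// Formats the same sentence three different ways.
fn describe(name: &str, speed: &str) -> (String, String, String) {
    // Implicit capture: `{name}` refers to the local binding `name`.
    let implicit = format!("{name} is {speed}");
    // Explicit named arguments.
    let named = format!("{n} is {s}", n = name, s = speed);
    // Positional arguments.
    let positional = format!("{} is {}", name, speed);
    (implicit, named, positional)
}

fn main() {
    let (implicit, named, positional) = describe("format_args!", "slow");
    assert_eq!(implicit, named);
    assert_eq!(implicit, positional);
    println!("{}", implicit);
}
```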

Multiple Problems, Multiple Solutions

format_args!'s internal machinery in the Rust compiler can likely be improved.
Consumers of Arguments, such as fmt::{format, write} and {fmt, io}::Write::write_fmt, can be reviewed for runtime performance.
Macros built on top of format_args! are often invoked to do something simple that requires no real formatting. They can use the pattern-matching feature of macro_rules! to special-case such simple patterns and side-step format_args! when it's not needed. This will increase the complexity of those macros and risks breakage if done incautiously, but could be a big gain in itself.
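A minimal sketch of such a special case, using a hypothetical fast_write! macro (not part of std) that forwards a lone string literal straight to fmt::Write::write_str and sends everything else through format_args! the way write! does. It also shows why this has to be done carefully: the literal arm skips format-string processing, so escaped braces like "{{}}" are written verbatim instead of as "{}", and a format string that write! would reject at compile time is silently accepted.

```rust
use std::fmt::Write;

// Hypothetical macro, for illustration only.
macro_rules! fast_write {
    // Special case: a lone string literal goes straight to write_str,
    // bypassing format_args! (and with it, brace unescaping and validation).
    ($dst:expr, $lit:literal $(,)?) => {
        $dst.write_str($lit)
    };
    // General case: the same expansion write! uses.
    ($dst:expr, $($arg:tt)*) => {
        $dst.write_fmt(format_args!($($arg)*))
    };
}

fn main() {
    let mut out = String::new();
    fast_write!(out, "hello").unwrap(); // literal arm
    fast_write!(out, ", {}!", "world").unwrap(); // general arm
    assert_eq!(out, "hello, world!");

    // The pitfall: the literal arm does not unescape `{{` and `}}`.
    let (mut a, mut b) = (String::new(), String::new());
    write!(a, "{{}}").unwrap();
    fast_write!(b, "{{}}").unwrap();
    assert_eq!(a, "{}");
    assert_eq!(b, "{{}}");
}
```

A production version would need to restrict the fast path to literals without braces, which macro_rules! cannot inspect on its own.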

Unfortunately, some of these cases may run into complex interactions between types, trait bounds, and method resolution, because, e.g., both io::Write and fmt::Write exist and write! needs to "serve" both. Fortunately, this is exactly the sort of problem that can benefit from the recent advances in const generics: much of the work is compile-time evaluation that would benefit from interacting with types, as opposed to being purely syntactic like macros. In the future, generic associated types and specialization may also help minimize breakage from type issues as those features come online, so it's a good time to begin reviewing this code.
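A small sketch of what "serving both" means in practice. write! expands to a write_fmt method call on its first argument, so the same invocation resolves to fmt::Write::write_fmt for a String and to io::Write::write_fmt for a Vec<u8>, with different error types. A literal fast path, by contrast, would need a different method for each trait (write_str versus write_all), which is where the type-level complications come from. The helper name write_both is illustrative.

```rust
use std::fmt::Write as _; // fmt::Write: write_fmt / write_str for String
use std::io::Write as _; // io::Write: write_fmt / write_all for Vec<u8>

// Writes the same content to a text sink and a byte sink.
fn write_both(n: u32) -> (String, Vec<u8>) {
    let mut text = String::new();
    let mut bytes: Vec<u8> = Vec::new();

    // One macro, two traits: method resolution picks the right write_fmt.
    write!(text, "n = {}", n).unwrap(); // returns fmt::Result
    write!(bytes, "n = {}", n).unwrap(); // returns io::Result<()>

    // The literal-only fast paths have different names per trait.
    text.write_str("\n").unwrap();
    bytes.write_all(b"\n").unwrap();

    (text, bytes)
}

fn main() {
    let (text, bytes) = write_both(7);
    assert_eq!(text.as_bytes(), bytes.as_slice());
    print!("{}", text);
}
```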

Related issues and PRs

#75742 (and #75894)
#75301 (and #75358)
#52804
#10761

Mark-Simulacrum commented 4 years ago

Note that the formatting infrastructure in core::fmt is intentionally not fast, as it optimizes for code size over speed. There are alternatives, e.g., https://github.com/japaric/ufmt, which is smaller and faster and makes some different tradeoffs.
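A sketch of the general size-versus-speed tradeoff referred to here; it is not core::fmt's actual code. A generic writer function is monomorphized into a separate copy for each writer type, while a `&mut dyn Write` version is compiled once and pays an indirect call per write. core::fmt leans toward the latter: its entry point, core::fmt::write (re-exported as std::fmt::write), takes `&mut dyn Write`. The helper names emit_generic and emit_dyn are illustrative.

```rust
use std::fmt::{self, Write};

// Generic: a separate copy is compiled for every writer type W
// (direct calls, more machine code).
fn emit_generic<W: Write>(w: &mut W, parts: &[&str]) -> fmt::Result {
    for p in parts {
        w.write_str(p)?;
    }
    Ok(())
}

// Dynamic dispatch: one compiled copy shared by all writers
// (less machine code, an indirect call per write_str).
fn emit_dyn(w: &mut dyn Write, parts: &[&str]) -> fmt::Result {
    for p in parts {
        w.write_str(p)?;
    }
    Ok(())
}

fn main() {
    let parts = ["size", " vs ", "speed"];
    let mut a = String::new();
    let mut b = String::new();
    emit_generic(&mut a, &parts).unwrap();
    emit_dyn(&mut b, &parts).unwrap();

    // The std entry point also takes `&mut dyn Write`.
    let mut c = String::new();
    fmt::write(&mut c, format_args!("{}{}{}", parts[0], parts[1], parts[2])).unwrap();

    assert_eq!(a, b);
    assert_eq!(a, c);
}
```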

I don't know that a blanket issue like this is useful; I suspect the overall API cannot change at this point, but individual improvements can, of course, be discussed in T-compiler (as this is a libs-impl concern, not a T-libs one).

jonas-schievink commented 4 years ago

Its code size is also notoriously poor for embedded systems, fwiw.

Mark-Simulacrum commented 4 years ago

It's true that it may not meet the code size goal well either. I do think we should try to go for size over speed in general, though the two are not always mutually exclusive.

workingjubilee commented 4 years ago

Are there other alternatives like ufmt, primarily for embedded use? Size is cache, and cache is speed (or rather, not needing as much of it is). Many optimizations for speed will probably reduce overall size as well (and vice versa), and Arguments itself is sequestered from instantiation or introspection and versioned internally. It's not as opaque as a nameless type, but it is likely easy to change many subtle particulars about it without breaking major APIs.
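On stable Rust, Arguments can only be produced by format_args!, and its main inspection point is Arguments::as_str (stable since Rust 1.52), which returns the string only when nothing is left to format at runtime. A sketch of a consumer-side fast path built on it; write_args and is_static are hypothetical helpers, not std functions, and whether std's own write_fmt already does something similar depends on the Rust version.

```rust
use std::fmt::{self, Write};

// Hypothetical helper: skip the formatter when there is nothing to format.
fn write_args<W: Write>(w: &mut W, args: fmt::Arguments<'_>) -> fmt::Result {
    match args.as_str() {
        // No runtime arguments: write the literal directly.
        Some(s) => w.write_str(s),
        // Otherwise fall back to the full formatting machinery.
        None => w.write_fmt(args),
    }
}

// True if the text is available without running the formatter.
fn is_static(n: u32) -> bool {
    format_args!("n = {}", n).as_str().is_some()
}

fn main() {
    let n = std::process::id();
    let mut out = String::new();
    write_args(&mut out, format_args!("pid: ")).unwrap();
    write_args(&mut out, format_args!("{}", n)).unwrap();
    assert_eq!(out, format!("pid: {}", n));
    assert_eq!(format_args!("hello").as_str(), Some("hello"));
    assert!(!is_static(n));
}
```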

Other crates of interest:

jonas-schievink commented 4 years ago

> Are there other alternatives like ufmt primarily for embedded use?

We've also recently written https://github.com/knurling-rs/defmt, which does the formatting on the host instead of the device through liberal use of the forbidden arts. It is not compatible with the core::fmt syntax though, and can only be used for logging (since the device can't actually use the formatted data).
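A toy sketch of the deferred-formatting idea behind defmt: the device sends only an index into a table of format strings plus the raw argument bytes, and the host does the formatting. The table, the one-byte index, and the single-u32 frame layout are all invented for illustration; defmt's real encoding and its link-time string interning are different and more involved.

```rust
// Hypothetical format-string table shared by device and host.
// defmt builds its table at link time; this one is hard-coded.
const FORMATS: &[&str] = &["temp = {} C", "boot count = {}"];

// Device side: no formatting, just an index and the argument's bytes.
fn encode(index: u8, arg: u32, out: &mut Vec<u8>) {
    out.push(index);
    out.extend_from_slice(&arg.to_le_bytes());
}

// Host side: look up the format string and substitute the argument.
// Only a single "{}" placeholder is supported in this sketch.
fn decode(frame: &[u8]) -> Option<String> {
    let (&index, rest) = frame.split_first()?;
    let fmt = FORMATS.get(usize::from(index))?;
    let b = rest.get(..4)?;
    let arg = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
    Some(fmt.replacen("{}", &arg.to_string(), 1))
}

fn main() {
    let mut frame = Vec::new();
    encode(0, 21, &mut frame);
    assert_eq!(frame.len(), 5); // 1 index byte + 4 argument bytes
    assert_eq!(decode(&frame).as_deref(), Some("temp = 21 C"));
}
```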

mqudsi commented 2 years ago

It's annoying that manually writing to a Write target repeatedly is actually faster than issuing a single call that both formats and writes to the destination.

There are so many issues on fmt performance that I'm not sure posting this here will even be seen, but I wanted to share some resources that may still prove useful, even though they are not directly applicable given the world of difference between how a managed, garbage-collected language like C# handles formatting and how Rust does. These writeups on improvements to formatting and the handling of interpolated strings in C# 10 and 11 are good reads (and include benchmarks reporting both cycle counts and allocation numbers). There's a world of resources on this topic in the .NET GitHub repo that I've read in the past (for reasons having nothing to do with Rust) covering the tradeoffs of checking for and special-casing constant patterns, patterns with a single expression, patterns that can be transformed directly into string concatenation, and patterns joined by the same constant character (i.e., those that can be formed by calling string.Join(..), the C# equivalent of slice::join in Rust).
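Rust's standard library has rough counterparts to several of the special cases listed here. A sketch showing, for a few format strings, a cheaper operation that yields the same text: concat! for a constant pattern, to_string for a single expression, concat for pure concatenation, and join for pieces separated by one constant string. Whether format! should ever rewrite itself into these is exactly the open question; the helper names are illustrative.

```rust
// Constant pattern: assembled entirely at compile time.
const GREETING: &str = concat!("hello", ", ", "world");

// Single expression: a plain conversion.
fn single(n: i32) -> String {
    n.to_string()
}

// Pure concatenation of the pieces.
fn concatenated(parts: &[&str]) -> String {
    parts.concat()
}

// Pieces separated by the same constant string.
fn joined(parts: &[&str]) -> String {
    parts.join(", ")
}

fn main() {
    let (a, b, c) = ("x", "y", "z");
    assert_eq!(GREETING, format!("{}, {}", "hello", "world"));
    assert_eq!(single(42), format!("{}", 42));
    assert_eq!(concatenated(&[a, b, c]), format!("{}{}{}", a, b, c));
    assert_eq!(joined(&[a, b, c]), format!("{}, {}, {}", a, b, c));
}
```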

One cool thing C# does, which may not be possible in Rust without compiler magic because Rust doesn't mark interpolated strings with a special prefix the way C# does with the leading $ in $"this is a {var} string", is that string format evaluation (including evaluation of any expressions injected into the string as parameters) can be deferred or even skipped altogether, without any macro magic, if the called function declares that it expects an interpolated string rather than a plain string. That allows coalescing repeated or nested calls to fmt, and it lets you implement the equivalent of debug!("value: {}", calculate_something()); where calculate_something() is never evaluated unless logging is enabled at or above the required level. No macro is involved: the interpolated string and its arguments are passed as-is, still unevaluated, to the called function.
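In Rust terms, passing fmt::Arguments defers only the formatting: the argument expressions are evaluated when format_args! builds the value. Deferring evaluation as well, without a macro, means handing over something callable, such as a closure. A sketch contrasting the two, with a hypothetical Logger and Level (not the log crate's API) and a call counter standing in for calculate_something().

```rust
use std::cell::Cell;
use std::fmt::{self, Write};

#[derive(Clone, Copy, PartialEq, PartialOrd)]
enum Level {
    Debug,
    Info,
}

// Hypothetical logger that drops records below `min`.
struct Logger {
    min: Level,
    out: String,
}

impl Logger {
    // Formatting is deferred, but the arguments were already evaluated.
    fn log_args(&mut self, level: Level, args: fmt::Arguments<'_>) {
        if level >= self.min {
            self.out.write_fmt(args).unwrap();
        }
    }

    // Evaluation and formatting both happen only if the level passes.
    fn log_lazy<F>(&mut self, level: Level, f: F)
    where
        F: FnOnce(&mut dyn Write) -> fmt::Result,
    {
        if level >= self.min {
            let w: &mut dyn Write = &mut self.out;
            f(w).unwrap();
        }
    }
}

// Returns how many times the "expensive" function ran and the log text.
fn run_demo() -> (u32, String) {
    let calls = Cell::new(0);
    let calculate_something = || {
        calls.set(calls.get() + 1);
        42
    };
    let mut logger = Logger { min: Level::Info, out: String::new() };

    // Filtered out, yet calculate_something() still runs once.
    logger.log_args(Level::Debug, format_args!("value: {}", calculate_something()));
    // Filtered out, and calculate_something() never runs.
    logger.log_lazy(Level::Debug, |w| write!(w, "value: {}", calculate_something()));
    // Passes the filter: runs and is written.
    logger.log_lazy(Level::Info, |w| write!(w, "value: {}", calculate_something()));

    (calls.get(), logger.out)
}

fn main() {
    let (calls, out) = run_demo();
    assert_eq!(calls, 2);
    assert_eq!(out, "value: 42");
}
```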

Anyway, I wasn't going anywhere in particular with this but just wanted to put down these thoughts and relevant links for posterity (and for myself to be able to find in the future). There's a lot of prior art out there in the open source community with some cool ideas, even if the actual implementation and all the factors that weigh into such a decision are something very niche and specific to each language and its domain details.

rust-lang/rust #76490: format_args! is slow
