Open kornelski opened 1 month ago
@kornelski I'm not sure if it'll help in all cases, but maybe some of rustfmt's more granular width configurations could help you out here. I'm linking to use_small_heuristics, which links to the other width options.
No, these don't help at all. I've reviewed and tested these carefully. The problem is not in choosing the right widths for various items, the problem is in the general design principle in rustfmt
that always makes it unwrap shorter lines up to the maximum setting. If set short widths, I'll have much more wrapping than I want. If I set long widths, I'll have unwanted spaghettification.
I can only move the thresholds, but I can't disable the domino effect of code crossing a threshold.
To be clear, I'm not asking to change max_width
specifically, but some option to change the maximum line length enforcement to act like in go fmt
would be great.
Thanks for clarifying. That might be easier said than done. As you've noted, rustfmt doesn't take the source layout into account when rewriting AST nodes.
rustfmt
somehow preserves empty lines, even though they're whitespace and not part of a usual AST. Perhaps in a similar way it could keep information about the original line lengths, and format AST nodes for a max_width
set to maximum line length of any of the child nodes?
The way rustfmt preserves newlines between AST nodes is actually fairly simple and doesn't require keeping track of any layout information from the source code. It does however peek back into the source to count newlines and recover comments.
It's maybe possible to provide an option that would infer the layout of nodes in certain contexts based on the source code, for example forcing a vertical layout, but I don't think we'd be able to format code exactly like go fmt
would.
There's a lot of good discussion on this thread but there's a few things I want to add.
I know this is somewhat tangential to the underlying behavior being requested, but I think important to note rustfmt has some config options (including max_width
) that by default change the values of other options unless those options are set explicitly by the user. So changing the value of max_width
is also changing the value of chain_width
which is what triggers the specific formatting change here.
Ultimately, I think the difference between the requested behavior and the current behavior is that the Style Guide currently has very explicit prescriptions based on column widths, and if an element is less than that threshold (e.g. chain_width
) then it must be single line, else it must be multiline, regardless of whatever choice the developer made about linebreaks in the input.
As a result, you'll never see something like this with rustfmt
fn example() {
let x = [1, 2, 3]
.iter()
.rev()
.enumerate()
.count();
let x = [1, 2, 3].iter().rev().enumerate().count();
}
two identical constructs will be formatted exactly the same way, every time. that's an intentional decision, and one that comes with tradeoffs that reasonable people may weigh differently.
i'll share that the Style team already has plans to review the default rules for 2027 to try to devise something that produces more consistently "better" results as we believe that column width is simpler to grok mentally and implement in tools but too often produces subpar results.
In the meantime, something the rustfmt team has been actively exploring are config-driven solutions that would add options that support devs/teams that want more manual control over when line breaks are added or removed, and which would enable the the outcome requested here.
Given the current (and unlikely to change) model, I do not envision the default formatting behavior will ever change and be gofmt, but I do completely agree that the whitespace information is available in the input and we can support non-default formatting options that utilize it.
Due to the AST-centric nature of the formatting rules and behavior, it is something we would have to do very incrementally (chains, arrays, etc.), but it is doable. We successfully implemented a POC with this for chains, but had to temporarily put it on the backburner due to the focus on the 2024 edition. It's something we'll pick back up once 2024 ships
two identical constructs will be formatted exactly the same way, every time.
This is only true in a narrow case. It's not every time. In practice, the opposite happens, and rustfmt
can force identical AST nodes to look completely different:
let x = [1, 2, 3].iter().rev().enumerate().count();
if true {
let x = [1, 2, 3]
.iter()
.rev()
.enumerate()
.count();
}
rustfmt
formatting identical expressions in different ways is one of the issues I have with it. If the longest line happens to cross a threshold:
match val {
Enum::One(one) => take(one).and().process().it(),
Enum::Two(two) => take(two).and().process().it(),
Enum::Three(three) => take(three).and().process().it(),
Enum::Four(four) => take(four).and().process().it(),
}
Then rustfmt
makes identical match arms formatted differently:
match val {
Enum::One(one) => take(one).and().process().it(),
Enum::Two(two) => take(two).and().process().it(),
Enum::Three(three) => take(three)
.and()
.process()
.it(),
Enum::Four(four) => take(four).and().process().it(),
}
This is only true in a narrow case. It's not every time. In practice, the opposite happens, and
rustfmt
can force identical AST nodes to look completely different:let x = [1, 2, 3].iter().rev().enumerate().count(); if true { let x = [1, 2, 3] .iter() .rev() .enumerate() .count(); }
The indentation level of a node is an important factor when determining the layout since that impacts how much room rustfmt has before it needs to break lines. It might be better to say that "two identical nodes will be formatted exactly the same way, every time if they appear in the same context
".
rustfmt
formatting identical expressions in different ways is one of the issues I have with it. If the longest line happens to cross a threshold:match val { Enum::One(one) => take(one).and().process().it(), Enum::Two(two) => take(two).and().process().it(), Enum::Three(three) => take(three).and().process().it(), Enum::Four(four) => take(four).and().process().it(), }
Then
rustfmt
makes identical match arms formatted differently:match val { Enum::One(one) => take(one).and().process().it(), Enum::Two(two) => take(two).and().process().it(), Enum::Three(three) => take(three) .and() .process() .it(), Enum::Four(four) => take(four).and().process().it(), }
Having more consistent match arm formatting has been brought up before (https://github.com/rust-lang/rustfmt/issues/3995). Probably best to keep the discussion here focused on how increasing the max_width
makes short lines longer.
I brought match
up here, because it is related. If max_width
didn't force short lines to be long, then I could set max_width=99999
, and choose myself whether I want the match arms wrapped or not.
match
, and whether structurally different AST nodes should be formatted in similar ways may be a more difficult question than whether structurally identical AST nodes should be allowed to look the same.
If I have some code that is formatted perfectly correctly for a
max_width = 50
:then the same code formatted with a longer maximum length (
max_width = 100
) always gets reformatted to stretch as widely as possible. It gets changed despite already being formatted cleanly, and not exceeding the maximum width:This makes
rustfmt
's formatting especially detrimental whenmax_width
is set to a high value (likemax_width = 150
ormax_width = 200
), because expressions that had reasonable line breaks previously are forced to become very long lines.I don't mind having occasional long lines in the code, and I would like to be able to set higher allowable maximum line width, without it also becoming a target width for making all code maximally wide.
Example use-cases:
In
if
conditions, wrapping has a relatively high cost. Indentation of the wrapped conditions is the same as indentation of a usual "then" block, so the reader has to pay close attention to the unusual position of{
. In such cases a line that is a bit longer is IMHO a lesser evil. But I don't wantrustfmt
to undo already wrappedif
conditions where I've already decided the line was too long.I have
match
statements with similar but not identical content (e.g. when mapping types in anenum
to invocations of a generic function, the calls may be very similar, but variant names and variable names differ in lengths). It's possible for some, but not all, of thematch
arms to exceed the usual limit of 80-100 chars. This makesrustfmt
format the arms inconsistently in regards to each other: some will have multi-line formatting, some will have single-line formatting. This makes it harder to compare the match arms. A value ofmax_width
that makes onematch
consistent will break consistency of another that has different range of widths.I have expressions that implement standard-defined mathematical formulas defined in external documents. I want those expressions to have line breaks in the same places as in the source document, for them to be easy to compare to the original for review.
I want to avoid trivial code changes from looking more complex in code reviews. Code is more often read more than written, and code that is developed via pull requests is also often read as a diff. For me as a maintainer, readability of diffs of the code is most important. When a PR renames a function or a variable, I want that PR to have minimal single-word diffs. Unfortunately, if the new name has a different length, it often makes lines cross the (un)wrapping threshold, and changes start to look like a complete rewrite of entire blocks/statements. Same thing happens when a line is deleted from a multi-line expression (e.g. a struct field is deleted). Instead of keeping it as an obvious 1-line-delete diff,
rustfmt
can decide to unwrap the whole multi-line block into one long line, making the diff much bigger, and obscuring what has really changed.Why not
#[rustfmt::skip]
?It's an all-or-nothing mark, rather than control over line length. I still want all the short and long lines to be correctly formatted in all other aspects, like spaces around punctuation. When a long line ends with a block, I still want the indentation of the following code in the block to be formatted well.
#[rustfmt::skip]
attribute by itself adds visual clutter. It becomes a point of contention whether an improvement in readability of one line is worth adding an eyesore in another. Usually it isn't, which makes me feel forced to live with an undesirablemax_width
behavior that always makes some code worse regardless of what value I set it to.Notably,
go fmt
does not have this problem. In Golang, lines that exceed the maximum limit are wrapped to become multi-line expressions, but expressions that already fit below the maximum line length are not forced to become one-liner spaghetti. All lines are still formatted correctly according togofmt
's rules, and this behavior is entirely deterministic and idempotent.