rust-lang / rustfmt

Format Rust code
https://rust-lang.github.io/rustfmt/
Apache License 2.0

max_width makes short lines longer #6338

Open kornelski opened 1 month ago

kornelski commented 1 month ago

If I have some code that is formatted perfectly correctly for a max_width = 50:

fn example() {
    let x = [1, 2, 3]
        .iter()
        .rev()
        .enumerate()
        .count();
}

then the same code formatted with a longer maximum length (max_width = 100) always gets reformatted to stretch as widely as possible. It gets changed despite already being formatted cleanly, and not exceeding the maximum width:

fn example() {
    let x = [1, 2, 3].iter().rev().enumerate().count();
}

This makes rustfmt's formatting especially detrimental when max_width is set to a high value (like max_width = 150 or max_width = 200), because expressions that had reasonable line breaks previously are forced to become very long lines.

I don't mind having occasional long lines in the code, and I would like to be able to set a higher allowable maximum line width without it also becoming a target width that makes all code maximally wide.

Example use-cases:

Why not #[rustfmt::skip]?

  1. It's an all-or-nothing mark, rather than control over line length. I still want all the short and long lines to be correctly formatted in all other aspects, like spaces around punctuation. When a long line ends with a block, I still want the indentation of the following code in the block to be formatted well.

  2. The #[rustfmt::skip] attribute by itself adds visual clutter. It becomes a point of contention whether an improvement in readability of one line is worth adding an eyesore in another. Usually it isn't, which makes me feel forced to live with an undesirable max_width behavior that always makes some code worse regardless of what value I set it to.

Notably, go fmt does not have this problem. In Golang, lines that exceed the maximum limit are wrapped to become multi-line expressions, but expressions that already fit below the maximum line length are not forced to become one-liner spaghetti. All lines are still formatted correctly according to gofmt's rules, and this behavior is entirely deterministic and idempotent.

ytmimi commented 1 month ago

@kornelski I'm not sure if it'll help in all cases, but maybe some of rustfmt's more granular width configurations could help you out here. I'm linking to use_small_heuristics, which links to the other width options.

kornelski commented 1 month ago

No, these don't help at all. I've reviewed and tested these carefully. The problem is not in choosing the right widths for various items; the problem is in the general design principle in rustfmt that always makes it unwrap shorter lines up to the maximum setting. If I set short widths, I'll have much more wrapping than I want. If I set long widths, I'll have unwanted spaghettification.

I can only move the thresholds, but I can't disable the domino effect of code crossing a threshold.

To be clear, I'm not asking to change max_width specifically, but some option to change the maximum line length enforcement to act like in go fmt would be great.

ytmimi commented 1 month ago

Thanks for clarifying. That might be easier said than done. As you've noted, rustfmt doesn't take the source layout into account when rewriting AST nodes.

kornelski commented 1 month ago

rustfmt somehow preserves empty lines, even though they're whitespace and not part of a usual AST. Perhaps in a similar way it could keep information about the original line lengths, and format AST nodes for a max_width set to maximum line length of any of the child nodes?

ytmimi commented 1 month ago

The way rustfmt preserves newlines between AST nodes is actually fairly simple and doesn't require keeping track of any layout information from the source code. It does however peek back into the source to count newlines and recover comments.

It's maybe possible to provide an option that would infer the layout of nodes in certain contexts based on the source code, for example forcing a vertical layout, but I don't think we'd be able to format code exactly like go fmt would.
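To make the idea concrete, here is a minimal sketch of what "inferring the layout from the source" could mean for a single chain. Everything in it is hypothetical: `ChainLayout` and `choose_layout` are illustration-only names, not part of rustfmt, and real chain formatting has many more cases to handle.

```rust
/// Hypothetical layout choice for a method chain; not a rustfmt type.
#[derive(Debug, PartialEq)]
enum ChainLayout {
    SingleLine,
    Vertical,
}

/// Pick a layout by peeking at the original source text of the chain.
///
/// `one_line_width` is the width the whole line would have if the chain
/// were joined onto one line, including indentation.
fn choose_layout(source: &str, one_line_width: usize, max_width: usize) -> ChainLayout {
    // If the author already broke the chain across lines, keep it vertical.
    let author_broke_lines = source.trim().contains('\n');
    // A chain that would exceed the limit is wrapped regardless.
    if author_broke_lines || one_line_width > max_width {
        ChainLayout::Vertical
    } else {
        ChainLayout::SingleLine
    }
}
```

Under this sketch, a chain the author wrote vertically stays vertical even with max_width = 100, a chain written on one line stays on one line as long as it fits, and anything too wide is still wrapped.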

calebcartwright commented 1 month ago

There's a lot of good discussion on this thread but there's a few things I want to add.

I know this is somewhat tangential to the underlying behavior being requested, but I think it's important to note that rustfmt has some config options (including max_width) that by default change the values of other options unless those options are set explicitly by the user. So changing the value of max_width also changes the value of chain_width, which is what triggers the specific formatting change here.
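As a rough illustration of that coupling, here is a simplified model. It assumes the 60% ratio the configuration docs give for chain_width under use_small_heuristics = "Default"; rustfmt's real computation may round or clamp differently (particularly for small max_width values), and both function names are made up for this sketch.

```rust
/// Default `chain_width` derived from `max_width`, assuming the documented
/// 60% ratio. This is a model of the documented behavior, not rustfmt's code.
fn default_chain_width(max_width: usize) -> usize {
    max_width * 60 / 100
}

/// Simplified check: the chain is kept on one line only if the chain itself
/// fits within the derived `chain_width` and the full line fits `max_width`.
fn stays_on_one_line(chain_len: usize, line_len: usize, max_width: usize) -> bool {
    chain_len <= default_chain_width(max_width) && line_len <= max_width
}
```

In this model, raising max_width from 100 to 200 raises the derived chain_width from 60 to 120, so longer chains get joined onto one line unless chain_width is set explicitly.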

Ultimately, I think the difference between the requested behavior and the current behavior is that the Style Guide currently has very explicit prescriptions based on column widths, and if an element is less than that threshold (e.g. chain_width) then it must be single line, else it must be multiline, regardless of whatever choice the developer made about linebreaks in the input.

As a result, you'll never see something like this with rustfmt

fn example() {
    let x = [1, 2, 3]
        .iter()
        .rev()
        .enumerate()
        .count();

    let x = [1, 2, 3].iter().rev().enumerate().count();
}

Two identical constructs will be formatted exactly the same way, every time. That's an intentional decision, and one that comes with tradeoffs that reasonable people may weigh differently.

I'll share that the Style team already has plans to review the default rules for 2027 to try to devise something that produces more consistently "better" results, as we believe that column width is simpler to grok mentally and implement in tools but too often produces subpar results.

In the meantime, the rustfmt team has been actively exploring config-driven solutions: options that would support devs/teams who want more manual control over when line breaks are added or removed, and which would enable the outcome requested here.

Given the current (and unlikely to change) model, I do not envision the default formatting behavior ever changing to match gofmt, but I do completely agree that the whitespace information is available in the input and we can support non-default formatting options that utilize it.

Due to the AST-centric nature of the formatting rules and behavior, it is something we would have to do very incrementally (chains, arrays, etc.), but it is doable. We successfully implemented a POC with this for chains, but had to temporarily put it on the backburner due to the focus on the 2024 edition. It's something we'll pick back up once 2024 ships.

kornelski commented 1 month ago

two identical constructs will be formatted exactly the same way, every time.

This is only true in a narrow case. It's not every time. In practice, the opposite happens, and rustfmt can force identical AST nodes to look completely different:

let x = [1, 2, 3].iter().rev().enumerate().count();

if true {
    let x = [1, 2, 3]
        .iter()
        .rev()
        .enumerate()
        .count();
}

rustfmt formatting identical expressions in different ways is one of the issues I have with it. If the longest line happens to cross a threshold:

match val {
    Enum::One(one) => take(one).and().process().it(),
    Enum::Two(two) => take(two).and().process().it(),
    Enum::Three(three) => take(three).and().process().it(),
    Enum::Four(four) => take(four).and().process().it(),
}

Then rustfmt makes identical match arms formatted differently:

match val {
    Enum::One(one) => take(one).and().process().it(),
    Enum::Two(two) => take(two).and().process().it(),
    Enum::Three(three) => take(three)
        .and()
        .process()
        .it(),
    Enum::Four(four) => take(four).and().process().it(),
}

ytmimi commented 1 month ago

This is only true in a narrow case. It's not every time. In practice, the opposite happens, and rustfmt can force identical AST nodes to look completely different:

let x = [1, 2, 3].iter().rev().enumerate().count();

if true {
    let x = [1, 2, 3]
        .iter()
        .rev()
        .enumerate()
        .count();
}

The indentation level of a node is an important factor when determining the layout since that impacts how much room rustfmt has before it needs to break lines. It might be better to say that "two identical nodes will be formatted exactly the same way, every time if they appear in the same context".
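A tiny sketch of why context matters, under a deliberately simplified model: it only compares indentation plus statement width against max_width and ignores chain_width and every other heuristic. The function name is hypothetical.

```rust
/// Simplified model: a statement stays on one line if its indentation plus
/// its own width fits within `max_width`. Assumes ASCII source, so the byte
/// length equals the column width.
fn fits_on_one_line(indent: usize, statement: &str, max_width: usize) -> bool {
    indent + statement.len() <= max_width
}
```

With a hypothetical max_width of 52, `let x = [1, 2, 3].iter().rev().enumerate().count();` fits at the top level (indent 0) but not inside the `if` block (indent 4), so the same node ends up with two different layouts.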

rustfmt formatting identical expressions in different ways is one of the issues I have with it. If the longest line happens to cross a threshold:

match val {
    Enum::One(one) => take(one).and().process().it(),
    Enum::Two(two) => take(two).and().process().it(),
    Enum::Three(three) => take(three).and().process().it(),
    Enum::Four(four) => take(four).and().process().it(),
}

Then rustfmt makes identical match arms formatted differently:

match val {
    Enum::One(one) => take(one).and().process().it(),
    Enum::Two(two) => take(two).and().process().it(),
    Enum::Three(three) => take(three)
        .and()
        .process()
        .it(),
    Enum::Four(four) => take(four).and().process().it(),
}

Having more consistent match arm formatting has been brought up before (https://github.com/rust-lang/rustfmt/issues/3995). Probably best to keep the discussion here focused on how increasing the max_width makes short lines longer.

kornelski commented 1 month ago

I brought match up here, because it is related. If max_width didn't force short lines to be long, then I could set max_width=99999, and choose myself whether I want the match arms wrapped or not.

#3995 also brings up other, more complex causes of apparent inconsistency in match. Whether structurally different AST nodes should be formatted in similar ways may be a more difficult question than whether structurally identical AST nodes should be allowed to look the same.