Define 'short' - Githubissues

nrc commented 7 years ago

Currently Rustfmt treats certain items differently if they are 'short'. For example, it will but a short struct literal on one line:

// Short
let x = Foo { x: 42 };

// Not short
let opts = Options {
    flag_package: vec![],
    flag_jobs: None,
    flag_features: vec![],
    flag_all_features: false,
    flag_no_default_features: false,
};

As @joshtriplett points out in https://github.com/rust-lang-nursery/fmt-rfcs/issues/31#issuecomment-268343997, we should have one global definition of 'short' rather than a different definition every time.

I see two ways to define 'short' - absolute characters (e.g., 25) or proportion of max line length (e.g., 1/4). One might argue that no matter how long the line length, what makes (e.g.) a struct literal readable on one line does not change. On the other hand, people who prefer longer lines might also prefer more on one line in these cases.

nrc commented 7 years ago

Places where I think this should kick in (off the top of my head) are struct variants (discussed in #31), struct literals, and function calls. There might be other places we should consider this but currently don't. Rustfmt currently uses different values for all of these:

fn_call_width: usize, 60,
struct_lit_width: usize, 16,
struct_variant_width: usize, 35,

(from https://github.com/rust-lang-nursery/rustfmt/blob/master/src/config.rs).

That actually suggests we shouldn't have a universal value. I'd be happy for both struct values to be 25-ish, but function calls should probably be longer (we might reconsider the value, or if we need one at all given that we will almost certainly change how function calls are otherwise formatted).

petrochenkov commented 7 years ago

When formatting manually I use "short" limit == ideal_width == max_width. I remember I was surprised when I saw rustfmt splitting lines that obviously fitted into 100 characters.

~~Where is "short" usually used?~~ EDIT: Now I see the previous message which was written simultaneously with this one. I'd understand if it were used in, e.g. enum definitions to split long variants, but not in function bodies.

joshtriplett commented 7 years ago

@nrc I honestly find it surprising that rustfmt has a different length for wrapping function calls, rather than using the normal line length. Its long length compared to the other two seems like an artifact of that; I'd advocate for using the line length for function calls.

For the other two, making them both the same length seems fine, if we decide to define "short"; literals don't seem different than struct variants in that regard.

That said, I think I'd still prefer to just use the normal line length here, and only limit struct literals and struct variants by the number of fields. I'd be fine with "at most three fields before splitting onto multiple lines, even if shorter than the line length", for instance. And that rule seems much easier to manually apply than counting characters.

I'd really like to avoid the scenario where I can't trivially recreate rustfmt's style by following easy-to-manually-apply rules with a bit of editor assistance. That's one of the biggest things that bothers me about a character-count-based definition of "short".

nrc commented 7 years ago

The use case for the function args is something like this:

    foo(an_object.call1().call2().field, something.foo + 42, top_fn(nested_fn(deep), bar(), baz()));

    // vs.

    foo(an_object.call1().call2().field,
        something.foo + 42,
        top_fn(nested_fn(deep), bar(), baz()));

That line (with a single 4 space indent) fits in 100 chars, but is pretty horrible to read, I much prefer to split the args to multiple lines and would do so myself if formatting manually.

nrc commented 7 years ago

With these heuristics we get in to formatting mostly for aesthetics/subjective readability. I realise that we lose some of the appeal of a very tight spec here, but I believe there is scope for making Rustfmt a better tool by using these somewhat arbitrary heuristics. My experience has been that we closely match what people do manually by doing this.

That said, I think I'd still prefer to just use the normal line length here, and only limit struct literals and struct variants by the number of fields.

This was not enough in the past - the size of fields (and types/arguments) varies widely and people when manually formatting tend to take into account actual size as well as number of fields.

I'd really like to avoid the scenario where I can't trivially recreate rustfmt's style by following easy-to-manually-apply rules with a bit of editor assistance.

I think this is subtly but importantly wrong. I think that it is important that we have a spec that can be easily applied by both tools and real people. I think that is should be an explicit non-goal that tools and people would apply exactly the same formatting. Basically, people and machines work too differently for this to end well. Furthermore, I don't see any advantage in people and tools coming up with exactly the same formatting (obvs there are advantages to the spec being manually implementable and for a tools' formatting to be predictable/unsurprising, but I don't think that applies to character-precision). By analogy, if we were specifying a drawing tool, we would want the output of the tool to be roughly recreatable by an artist, but we would not expect the output to be a pixel-precise match to the artist's output.

In practical terms, I think it is fine for the spec to be written in terms of both hard limits and recommendations - for example, for function calls the hard limit would be 'fits in the max line length' and the recommendation is something like 'functions may be spread over multiple lines if that makes reading easier' where a person interprets that how they like, and Rustfmt interprets that as > 60 chars.

joshtriplett commented 7 years ago

@nrc I would probably split that function call as well, but not because of length. If the last argument didn't consist of a multi-argument function call itself, I'd write that on one line. As written, I'd wrap it to avoid making people skimming the line have to deal with two levels of comma-separated arguments on the same line.

Regarding arbitrary numbers: I'd like all the formatting rules to remain simple enough that people can remember and apply them. Personally, I wish it could follow the 0-1-infinity rule, and "no more than one field on a line" would produce fairly reasonable results. But even something like "no more than three fields" seems much easier to manually apply than "no more than 60 characters".

joshtriplett commented 7 years ago

On a different note, I'd prefer to define "short" as an inclusive length (for instance, a struct literal including the structure name and braces) rather than exclusive (excluding the structure name as braces):

enum E {
//  v- this length ------------------------------v
    StructVariant { field1: Type1, field2: Type2 }
//                  ^- not just this length ---^
}

nrc commented 7 years ago

I would probably split that function call as well, but not because of length

I probably would split too, however, Rustfmt can't really do that so we have to make the most of what we have.

On a different note, I'd prefer to define "short" as an inclusive length (for instance, a struct literal including the structure name and braces) rather than exclusive (excluding the structure name as braces).

Can you explain why please?

For me, the motivation here is purely to make the arguments more readable, so I wouldn't want to split a_long_object.a_very_long_method_name(x, y, z) but I would want to split x.m(a_long_argument, another_long_argument) (waving hands about the actual lengths, and actually, manually, I would not actually split the second example because the arguments are not complex).

joshtriplett commented 7 years ago

I probably would split too, however, Rustfmt can't really do that so we have to make the most of what we have.

Rustfmt could apply the heuristic I described: "split an outer function call's arguments onto one line each if one of them contains an inner function call with multiple arguments". Not perfect (println!("{}", foo(a, b)); doesn't really need wrapping), but a decent approximation of the actual human heuristic. Split lines based on complexity and ease/speed of mental parsing, not length.

Can you explain why please?

For me, the motivation here is purely to make the arguments more readable, so I wouldn't want to split a_long_object.a_very_long_method_name(x, y, z) but I would want to split x.m(a_long_argument, another_long_argument) (waving hands about the actual lengths, and actually, manually, I would not actually split the second example because the arguments are not complex).

I wouldn't split either of those, based on the heuristic I just described; when I mentioned "inclusive" for lengths, I had structs and literals in mind, not function calls. But primarily, if you have to have a magic number, it seems easier to apply "inclusive" without manual counting; for instance, if you have a struct variant in an enum, you can easily check for an N-character inclusive limit by looking at editor columns and seeing if you passed N+4. To check against an N-character exclusive limit, you'd have to have an editor that shows the length of a selection, and select the relevant length.

joshtriplett commented 7 years ago

Copying some discussion here from other issues: I think I'd really prefer to define "simple" rather than "short". For instance, in the context of an enum struct variant, it doesn't matter to me how many fields the struct has or how many characters they take up (as long as they fit on one line), but as soon as a field has a non-trivial type, an attribute, or anything other than name: SingleTokenType, I'd want to break the fields onto separate lines, no matter how few characters they fit into.

joshtriplett commented 7 years ago

Quick summary of the three alternatives discussed in today's style meeting (which we didn't decide on yet, but which we characterized as a refresher):

Width (more than N characters)
Number of tokens (more than N arguments)
Complexity of expressions (individual components more than one simple token, so a, b, c, d, e, f, g would be fine but a + b(c) would not)

gbutler69 commented 7 years ago

To me, I prefer to break things into multiple lines with appropriate indentation and alignment ANY time the complexity exceeds a certain "noisiness". This is a rather subjective idea that is determined by: Number of Arguments/Well-Defined Potential Split Points, Overall Length if not split, Longest Length of segments if split at well-defined potential split points.

Generally speaking, if there are 3 or more well-defined potential splits OR 2 or more potential splits and at least one of them is a significant fraction of the preferred remaining line length (after deducting already indented amount), OR the overall line length is greater than 80% of the preferred line length and there is at least 1 well-defined potential split point - In all of these cases, I would seek to split and indent/align along all the potential split points. Each split segment, would then have similar evaluation done on it for the chance for itself to also be split to further lines using similar rules.

What do I mean by potential well-defined split points? Commas between arguments of method calls, method definitions, struct elements, enum elements, array elements in initializers, "." between method invocations in chained method invocations, etc.

To clarify further, the "informal" algorithm I always use to format code into multiple lines is:

Are there any well-defined split points available? If so, how many? If not, STOP (no reformatting).
Are there 3 or more well-defined split points? If yes, split the line at those split points. Proceed back to 1 for each split line and evaluate the rules for the next contextual level.
Are there 2 or more well-defined split points AND is at least one of the split segments (with indentation) at least 40% of the desired line length? Yes, split it and proceed back to 1 for each split line.
Is the line greater than 80% of the desired line length (and at least one available split point)? Yes, split it and proceed back to 1 for each split line.

This algorithm (with some tweaking regarding the exact %'s to use and the exact cut-offs for number of splits available for various contexts) can be used once you define "potential well-defined split points". I generally use the following (in order of precedence):

struct, class, enum, etc members in a declaration or initialization block
arguments (commas) in method calls or method declarations
method invocation or de-reference operator (. or ->) in chained method invocations/de-refs etc.
before first expression of a brace-delimited lambda body (not before a bare expression though)
before binary/ternary operator in expressions
before = in assignment statement
array elements in array initializers
before first argument in method invocation/declaration or before first expression in a block or parenthetical expression (Only if the next level of context down does not have useful split points)

NOTE: I believe this last point gets to the heart of this proposal.

Splits should always be attempted at the top-level context first and then subsequent evaluation of already split lines should follow the same rules for their level of context. However, if there is no meaningful split at the top level of context, and rule 4 would otherwise apply, then attempt splits

This results in things like a method call that takes 3 arguments, where the second argument is a bare lambda expression, and the first argument is a simple variable, and the third argument in a call to a constructor with several arguments, and the 4th argument is a call to a chain of methods to be split like this:

foo( bar, |a| a + 1, someObject::new( bip, flam, floozy ), dummy.call().a().longish().chain().of().methods() );

Would become:

foo( bar,
     |a| a +1,     
     someObject::new( bip, flam, floozy ),
     dummy.call().a().longish().chain().of().methods() );

By Rule 2. Then, by rule 2 again:

foo( bar,
     |a| a +1,
     someObject::new( bip, flam, floozy ),
     dummy.call()
           .a()
           .longish()
           .chain()
           .of()
           .methods() );

Then (assuming or target line length is about 40 (which it wouldn't be), by Rule #4:

foo( bar,
     |a| a +1,
     someObject::new( bip, 
                      flam, 
                      floozy ),
     dummy.call()
          .a()
          .longish()
          .chain()
          .of()
          .methods() );

To make this algorithm work universally and consistently, you only need to define the "well-defined split points" for various contexts and a hierarchy of which split points should be preferred first at a given contextual level (e.g. prefer splitting after , on method arguments long before you split after the opening parenthesis in the list of arguments). It can also be helpful to have "desired line lengths" defined for various contexts as well if a single desired line length seems insufficient.

Then, all splitting can be determined by:

What's the top-level context for this line I'm thinking of splitting?
For that context what are the "split points" and their precedence
Apply the rules above 1 through 4
Repeat on each split out line

This requires no ad-hoc rules and results in extremely consistent and readable code that sometimes might differ from what someone would do manually, but, not in a way that makes it "less" readable. Though you can always invent exceptions, they generally don't make the code more readable, just differently so.

joshtriplett commented 7 years ago

We just discussed this in the rust-style meeting, and I think we're very close to a consensus. Here's some summaries of the current state of the discussion:

We're currently entertaining two major possibilities.

Possibility 1: a width metric:

if what's between the braces/parens is longer than N characters, for some N, break each argument/param to its own line.

Possibility 2: a simplicity-based metric. Current straw proposal:

No more than 4 arguments or 3 name: value fields
Each field is "simple": no more than a single identifier or literal constant, with unary operators (&, *) and numeric suffixes (u32)

In evaluating these, there are four interesting cases to consider:

short and simple
long and complex
long and simple
short and complex

Both of these proposals handle "short and simple" and "long and complex" just fine. They differ on "long and simple" or "short and complex":

A width-based metric would break "long and simple" cases across lines, such as a handful of simple field: value pairs, where the field names and values used long, descriptive identifiers.
A width-based metric would leave "short and complex" cases on one line. That could allow for quite complex expressions as long as they fit in the line limit; we managed to come up with some uncomfortably complicated expressions that fit inside 40-50 characters, which we felt like ought to be broken across lines.
A simplicity-based metric would leave "long and simple" cases alone, even if things like descriptive identifiers pushed them over the value of N.
A simplicity-based metric would break "short and complex" cases across multiple lines.

We had a general consensus that we shouldn't build a set of rules that included both complexity and a numeric width limit; that made the rules too hard to explain and evaluate in practice. We should pick one or the other.

Note: the selection of "4 arguments" or "2-3 fields" was based in part on things like (x, y, z, w) vectors, as used in graphics. We originally said "4 arguments/fields", but we felt that name: value was more complex than value alone.

We'd like to make sure the "simplicity" rules are as simple as possible. The straw proposal above seems like it might work; if it grew multiple additional rules or corner cases, that'd make it less appealing.

We'd like to evaluate both of the proposals above (numeric width, or simplicity), and figure out which one produces more satisfying results.

Nemo157 commented 7 years ago

but we felt that name: value was more complex than value alone.

So, would that be affected by struct field shorthand?

let (foo, bar) = ...;
Foo { foo, bar, baz: 65 }

I could see an argument that a shorthand field is only worth 0.5 of a full struct field for the simplicity-metric.

joshtriplett commented 7 years ago

@Nemo157 Yes, you could have up to 4 (simple) fields if you're using the struct field shorthand.

Thanks for pointing out that you can mix the two, though. I think the rule I'd suggest is "up to 4 arguments or shorthand fields, or up to 3 fields if any of them are name: value pairs".

joshtriplett commented 7 years ago

In the interests of making some forward progress here: would someone be willing to try implementing the proposed strategies in rustfmt and posting some sample code formatted with it, so we can see how the results look on real code?

nrc commented 7 years ago

would someone be willing to try implementing the proposed strategies in rustfmt

I will do so, but won't have time in the next week or so. Not sure if it is something that @topecongiro is interested in working on?

topecongiro commented 7 years ago

@nrc I will work on it sometime this week.

nrc commented 7 years ago

I will work on it sometime this week.

Thank you!

topecongiro commented 7 years ago

Three different styles (they only apply to function calls and alike for now) and examples (diffs between each style and width-based strategy in rustfmt code base) for comparison.

No short rule: allow single line as long as it does not exceed max width (examples).
Simplicity-based: allow single line when there are no more than 4 arguments (examples).
Mixed: allow single line when width-based or simplicity-based rules hold (examples).

joshtriplett commented 7 years ago

@topecongiro Thank you very much for working on this!

Minor nit: the handling of trailing commas needs some work. In a couple of cases, lines contained a trailing comma followed by a close paren. Search for ,) to find examples. Conversely, in the simplicity case, several lines that should have trailing commas (the last argument of a function, for instance) don't.

Much more significant issue: the rule isn't "no more than 4 arguments", it's "no more than 4 simple arguments", for the definition of simplicity mentioned in https://github.com/rust-lang-nursery/fmt-rfcs/issues/47#issuecomment-314277255 :

No more than 4 arguments (or shorthand fields) or 3 name: value fields
Each field is "simple": no more than a single identifier or literal constant, with unary operators (&, *) and numeric suffixes (u32)

(It also looks like this got applied to match patterns, and for that case, I think I'd also add ref and ref mut to the permitted unary operators.)

Finally, I see several cases under the Simplicity examples that meet the criteria but got broken across lines. For instance:

-        rewrite_struct_field(context, self, shape, prefix_max_width)
+        rewrite_struct_field(
+            context,
+            self,
+            shape,
+            prefix_max_width,
+        )

Why did this get rewritten? It should use the one-line form.

All that aside, the examples you provided still provide a great deal of help in evaluating concrete cases. I'm going to post a follow-up comment discussing specific examples and the desirability of their formatting. And I wouldn't suggest revising further until we talk through some of the examples more, because the examples we have already do seem to show some cases where none of the rules we've discussed produce desirable results.

joshtriplett commented 7 years ago

Some concrete examples quoted from the various examples @topecongiro linked above:

-                let rewrite = self.rewrite_required_fn(indent, ti.ident, sig, ti.span);
+                let rewrite = self.rewrite_required_fn(
+                    indent,
+                    ti.ident,
+                    sig,
+                    ti.span,
+                );

This seems wrong. I'd write that on one line. token.token doesn't seem excessively complex.

-        rewrite_comment(trimmed, false, shape, context.config).map(|s| {
-            format!("{}\n{}", s, shape.indent.to_string(context.config))
-        })
+        rewrite_comment(trimmed, false, shape, context.config)
+            .map(|s| format!("{}\n{}", s, shape.indent.to_string(context.config)))

The combined opening should have taken precedence here; the first version looks much better to me.

-                    visit::FnKind::Method(ii.ident, sig, Some(&ii.vis), body),
+                    visit::FnKind::Method(
+                        ii.ident,
+                        sig,
+                        Some(&ii.vis),
+                        body,
+                    ),

I'd write this on one line as well. Even if all four arguments looked like Some(&ii.vis).

-        let rewrite = rewrite_macro(mac, ident, &self.get_context(), shape, pos);
+        let rewrite = rewrite_macro(
+            mac,
+            ident,
+            &self.get_context(),
+            shape,
+            pos,
+        );

I'd write this on one line. Yes, even though it has five arguments, and even though one of those arguments is &self.get_context().

-        try_opt!(self.meta()).rewrite(context, shape).map(
-            |rw| if rw.starts_with("///") {
+        try_opt!(self.meta()).rewrite(context, shape).map(|rw| {
+            if rw.starts_with("///") {

This is a major improvement, I think; the closure stands out much more this way. This rewrite is good.

+                visit::FnKind::ItemFn(
+                    item.ident,
+                    generics,
+                    unsafety,
+                    constness,
+                    abi,
+                    &item.vis,
+                    body,
+                ),

Honestly, I'd write this on one line, too. Splitting it across lines does not improve readability.

-        ast::VariantData::Tuple(ref fields, _) => format_tuple_struct(
-            context,
-            item_name,
-            ident,
-            vis,
-            fields,
-            generics,
-            span,
-            offset,
-        ),
+        ast::VariantData::Tuple(ref fields, _) => {
+            format_tuple_struct(context, item_name, ident, vis, fields, generics, span, offset)
+        }

This is an improvement. I agree with formatting the format_tuple_struct call on one line here.

joshtriplett commented 7 years ago

From the "mixed" examples:

-        WriteMode::Diff => {
-            if let Ok((ori, fmt)) = source_and_formatted_text(text, filename, config) {
-                let mismatch = make_diff(&ori, &fmt, 3);
-                let has_diff = !mismatch.is_empty();
-                print_diff(mismatch, |line_num| {
-                    format!("Diff in {} at line {}:", filename, line_num)
-                });
-                return Ok(has_diff);
-            }
-        }
+        WriteMode::Diff => if let Ok((ori, fmt)) =
+            source_and_formatted_text(text, filename, config)
+        {
+            let mismatch = make_diff(&ori, &fmt, 3);
+            let has_diff = !mismatch.is_empty();
+            print_diff(mismatch, |line_num| format!("Diff in {} at line {}:", filename, line_num));
+            return Ok(has_diff);
+        },

This seems very wrong. Breaking in the middle of the if let is much worse than breaking before it and leaving the whole if let on one line. It should not have been combined. What rule allowed this?

From the "none" examples:

     let mut rewrites = try_opt!(
-        iter.map(|(e, shape)| {
-            rewrite_chain_subexpr(e, total_span, context, shape)
-        }).collect::<Option<Vec<_>>>()
+        iter.map(|(e, shape)| { rewrite_chain_subexpr(e, total_span, context, shape) })
+            .collect::<Option<Vec<_>>>()
     );

The old way looks better here. In practice I'd probably write this as:

     let mut rewrites = try_opt!(iter.map(|(e, shape)| {
             rewrite_chain_subexpr(e, total_span, context, shape)
     }).collect::<Option<Vec<_>>>());

But I don't think that generalizes, because the trailing collect seems problematic if you stack a pile of other closings on it. So I can live without that if we can't come with a general rule that allows it. (Also, does replacing try_opt! with trailing ? make that better?)

nrc commented 7 years ago

I mostly agree with those observations. Some comments

This seems wrong. I'd write that on one line. token.token doesn't seem excessively complex.

One thing we didn't get to last week was whether 'short' means the same thing in all some circumstances. Currently it does not. I think that makes sense - people have a higher tolerance for single-line complexity in function calls than struct literals, I think.

So, we could probably allow the complexity measure to be more complex in function calls (I have to say though that this level of complexity in the spec is making me really uncomfortable).

This is an improvement. I agree with formatting the format_tuple_struct call on one line here.

I agree, but I don't see how this comes out of our rules, it looks like a lot (>4) simple args.

joshtriplett commented 7 years ago

I feel like there's a very clear general rule we're missing here that would fit all the above examples I quoted. Anyone up for trying to help articulate it?

I think it has something to do with "spreading complexity across lines" rather than packing complexity into one line. Sometimes we have multiple potential break-points on one line. The "combining openings and closings" rules are an effort to decide how to handle that. So are these rules for splitting arguments or fields across lines. So the question is, how do we handle multiple splits? That's separate from "how do we decide whether to split", but I feel like the answers may be closely related. In both cases, we're trying to spread out complexity.

joshtriplett commented 7 years ago

@nrc Well, we could reasonably ask whether we want to break function calls that don't exceed the line length limit. That's separate from "if we're breaking it, where and how do we break, if we have multiple choices".

I do agree that we tend to tolerate less complexity in struct literals, including struct literals inside other expressions.

topecongiro commented 7 years ago

@joshtriplett

Minor nit: the handling of trailing commas needs some work. In a couple of cases, lines contained a trailing comma followed by a close paren. Search for ,) to find examples. Conversely, in the simplicity case, several lines that should have trailing commas (the last argument of a function, for instance) don't.

This is because rustfmt preserves the presence of trailing comma inside macro since adding or removing it might cause the code not to compile.

Much more significant issue: the rule isn't "no more than 4 arguments", it's "no more than 4 simple arguments", for the definition of simplicity mentioned in #47 (comment) :

Just in case you are interested, rustfmt currenlty treats arguments without newlines as 'simple'.

Finally, I see several cases under the Simplicity examples that meet the criteria but got broken across lines. For instance:
-        rewrite_struct_field(context, self, shape, prefix_max_width)
+        rewrite_struct_field(
+            context,
+            self,
+            shape,
+            prefix_max_width,
+        ) 
Why did this get rewritten? It should use the one-line form.

This is caused by a typo 😞. args.len() > 3 should be args.len() > 4.

    let tactic = match context.config.fn_call_one_line_style() {
        OneLineStyle::Simplicity if args.len() > 3 => DefinitiveListTactic::Vertical,
        _ => definitive_tactic_for_args,
    };

https://github.com/topecongiro/rustfmt/commit/6cdac6471570c82e25adfca3b27b0d6d7cf544dc Examples for simplicity-based strategy with a typo fixed.

nrc commented 7 years ago

Well, we could reasonably ask whether we want to break function calls that don't exceed the line length limit. That's separate from "if we're breaking it, where and how do we break, if we have multiple choices".

Indeed. I think we have three choices:

don't break function calls early
break function calls early with the same criteria as struct lits
break function calls early with different criteria

Rustfmt currently does the last. Heuristically, I have found this most closely matches existing code which I think looks good.

gnzlbg commented 7 years ago

I think of something as short if it fits in one line. That is, depending on your maximum line length:

// max line length -------|
// Short:
let x = Foo { x: 42 };
// Long:
let x_is_too_long = Foo { 
    x: 42 
};

// Short
struct Foo { x: usize }
// Long
struct FooToo { 
    xLong: usize 
}

// Short:
fn f(x: u32) -> u32 { 42 }
// Long:
fn foo(xoo: u32) -> u32 {
    42
}

Clang format has independent knobs to allow short "if clauses", "functions", "for loops", "classes", etc. , but it only has one consistent definition of "short": "if X fits in one line, then it will be put in one line". Whether "X" is a loop or a class/function declaration is orthogonal to the definition of short. The only complaint I have about clang-format is that one has to enable this for every situation that it supports (if clauses, functions, for loops, types...), instead of providing a knob that enables this everywhere.

nrc commented 7 years ago

If we do go with the complexity scheme, I feel like we need to consider very small numbers of arguments. In particular, istm that if there is a single argument, then a function call should never get multi-lined (unless it hits the max line width). I don't think that should apply to struct lits though. Also worth considering a function call with two fairly complex arguments, where we would probably put that on one line if hand-writing.

joshtriplett commented 7 years ago

@nrc I agree regarding single arguments, and honestly, I think that applies to single fields, too. And honestly, the more I look at the examples, the more I'm inclined to err on the side of "keep it on one line if it fits", at least for functions.

djc commented 7 years ago

If I may throw my hat in the ring, for my taste current rustfmt does not value vertical density enough. Here are some examples of changes I disliked:

diff --git a/src/client/mod.rs b/src/client/mod.rs
index b8ff658..41db84c 100644
--- a/src/client/mod.rs
+++ b/src/client/mod.rs
@@ -32,7 +32,10 @@ impl Client {
     }

     pub fn call(self, cmd: Command) -> ResponseStream {
-        let Client { transport, mut state } = self;
+        let Client {
+            transport,
+            mut state,
+        } = self;
         let request_id = state.request_ids.next().unwrap();
         let (cmd_bytes, next_state) = cmd.to_parts();
         let future = transport.send(Request(request_id.clone(), cmd_bytes));

@@ -155,11 +169,17 @@ pub struct FetchCommand {
     fn prepare(self) -> FetchCommand;
     fn build(self) -> Command {
         let FetchCommand { args } = self.prepare();
-        Command { args, next_state: None }
+        Command {
+            args,
+            next_state: None,
+        }
     }
     fn changed_since(self, seq: u64) -> FetchCommand {
         let FetchCommand { mut args } = self.prepare();

joshtriplett commented 7 years ago

@djc Agreed in both cases.

liigo commented 7 years ago

my 'short': not exceed the max-column-count my 'long': exceed the max-column-count 'short' <= 'long' (max-column-count is a customized const value, maybe 80, 100, or 110. I like a bigger number, since many of us have big monitor.)

nrc commented 7 years ago

We discussed this again at the meeting this week. We covered a bit of ground about what we liked and didn't like, looked at some examples, and discussed the motivation for such heuristics.

We still didn't agree exactly on a definition, but in order to move forward I propose the following:

we agree we should have some heuristic
we agree to specify a function signature for an is_simple function or functions which provide an answer to the question 'should this list be formatted on a single line or across multiple lines?'.
we should decide how the implementation of these function(s) should be decided:
- we could leave it up to implementers
- we could specify it precisely (I prefer to avoid this)
- we could punt for a future RFC (i.e., leave it unspecified in the first draft)
- we could provide a suggestion/guideline and leave it technically unspecified
- something else
we collect examples of code which we think looks good or bad to motivate this discussion (in particular examples from previous attempts to do this which have gone 'wrong').

repax commented 7 years ago

I propose this as a necessary but not sufficient requirement for an item to be short:

contains at most one statement, i.e. one ;.

joshtriplett commented 7 years ago

Based on extensive discussions both within the style team and elsewhere, I think I might have a rough idea (needing refinement) for criteria for "simple", and for how we should break lines.

If something contains binary operators at at most one level, such that it's trivial to mentally parse what operands appear at what level, then it's "simple". (It might still need breaking across lines if it exceeds the line length limit.)
If something contains binary operators at more than one level, such that figuring out which level any given operand sits at requires careful mental parsing of parentheses, then it isn't "simple". This includes expressions separated by commas (e.g. tuples as arguments), or structures as arguments.
- For example, a call like f(a, b, c, d, e, f, g, h, i, j) is "simple", while a call f((a, b), (c, d)) is not, because in the latter (especially as the expression grows more complex), reconstructing the parse tree in your head isn't trivial. It breaks fast mental parsing heuristics.
- Something like f(g(h(a + b + c))) is also "simple" by this metric. But f(g(h(a + b) + c)) and f(g(h(a + b), c)) are not.
For the purposes of the above, a nested function call can still be "simple" if it doesn't contain multiple arguments at multiple levels. For instance, foo(bar(baz(a, b, c, d))) and foo(bar(a), baz(b)) are both still "simple", because the whitespace naturally groups the arguments and a fast mental parse produces the correct result. However, foo(bar(baz(a, b), c), d) is not "simple", because a fast mental parse has to carefully consider parentheses and commas to associate the arguments with their functions.
If something contains trailing unary operators (such as ?) or trailing chained function calls (such as .unwrap()) and has closing parentheses both inside and outside those, that isn't "simple" either, because it would require carefully matching up the closing parentheses to see what level that operator applies at. For instance, anything ending in ))?)) or )).unwrap()) isn't going to be "simple". (This needs to be stated more precisely.)
- In this case, the line breaks should make it trivial to see what the trailing operation applies to.
If the expression evaluation alternates back and forth between leading and trailing unary operations or chained function calls, such that it's difficult to pair up the leading operations with the trailing operations and determine the order they occur in, it isn't "simple". (For instance, if you apply a leading operator like & or - to an expression that contains trailing chained function calls.)
- In this case, the line breaks should group the leading operators with the full expressions they apply to.

As a rough sketch, how does that sound? I think this would capture many aspects of the heuristic we're subconsciously using to say "that needs breaking across lines".

joshtriplett commented 7 years ago

One additional thought, though I'm less certain about this one:

A trailing function call alone can still be "simple" if it doesn't contain trailing function calls at multiple levels of parentheses. For instance, func(thing.foo(), thing2.bar()) is still "simple", because a fast mental parse glues the trailing functions to the names they apply to, due to the lack of whitespace around the . of method invocation. I'd even call func(thing.foo(x), thing2.bar(y)) "simple". However, func(thing.foo(), thing2.bar()).baz() and even func(thing2.bar()).baz() are not "simple", because those same mental parsing heuristics don't make it trivial to account for the ) that separates thing2.bar() from .baz().

repax commented 7 years ago

If something contains binary operators at more than one level, such that figuring out which level any given operand sits at requires careful mental parsing of parentheses, then it isn't "simple".

I don't know how that is meant to be defined but while this expression is arguably not simple:

add(mul(a, b), mul(c, d))

the following certainty is, imho:

a * b + c * d

as well as this one:

(a + b) * (c + d)

joshtriplett commented 7 years ago

@repax I've been talking about this primarily in the context of things like function calls and struct initializers, because those are where we invoke the concept of "short"/"simple". We don't currently apply those criteria to expressions like those. They could form a part of a larger expression to which we would, though, so we do have to think about them a little.

I personally tend to err on the side of adding unnecessary parentheses and almost never relying on operator precedence, just to simplify mental parsing. But in general, I agree that both of the expressions you posted seem simple. (They could become non-simple if applied to more than just a, b, c, and d.)

I think the premise I'd use for that particular point is that thanks to precedence, we can think of a * b + c * d and (a + b) * (c + d) each as "one" expression, with a series of binary operators, which we mentally parse using infix operator precedence rules. That doesn't have the same problem that, for instance, f(g(h(a + b) + c)) has, where you have to switch back and forth between prefix and infix to parse it. You can see, without too much difficulty, that a + b comes at a different level than + c, but then you have to carefully match parentheses to see that it goes with h. And while that looks trivial in that case, when those names and argument lists get longer, matching up the parentheses becomes the main problem in fast mental parsing.

So, based on that, I'd say this:

a * b + c * d is simple.
foo(a * b + c * d) is simple.
foo(bar(a * b) * c) is not simple. Notice the way that, if you mentally walk down the AST, it switches from prefix to infix to prefix to infix.
bar(a * b) * c is, in my opinion, not simple either, for the same reason. (It's slightly easier to parse because you only have to switch back and forth once, so there's only one "inside" and one "outside", but you still have to scan back and forth to see the function associated with the parentheses.)
- This is complex for the same fundamental reason that an expression ending in ))?)) is complex: you have to match the closing parenthesis to understand the expression. Whereas in an expression ending in )))), you don't actually need to pair them up with the matching opening parentheses at all to understand the expression.
Arguably, a * bar(b * c) is actually simple, for the same reason the rules I listed differentiate between prefix and postfix operators. You can scan that left-to-right, you don't have to jump back to the left to see the function name associated with the close parenthesis.
- I don't think this gets any clearer if broken across lines.

This same "left-to-right", "no careful parenthesis-matching" rule also explains why, for instance, I'd consider foo(x)?.bar(y)?.baz(z)? simple. I'd even consider foo(a, b)?.bar(c, d)?.baz(e, f)? simple; it has many steps but the flow is straightforward.

nrc commented 7 years ago

I think I prefer to leave this basically unspecified - looking at the amount of detail required in @joshtriplett 's suggestion, I think it is better to leave it entirely up to an implementer, and only say that implementers may have a concept of short or simple defined in some way.

steveklabnik commented 7 years ago

I'm feeling similar to @nrc

joshtriplett commented 7 years ago

@nrc To be clear, I don't think this level of detail is required to actually specify what "short" means. I've been carefully enumerating and classifying cases, but I feel like there's a simple set of rules to extract from those cases that would capture the meaning. I'd appreciate some help capturing those rules, and then testing them against the various cases to see if they produce the same results.

I think some premise about switching between infix, prefix, and postfix, multiple times in an expression, in a way that prevents left-to-right parsing, would capture the majority of the cases. Something about having comma-separated lists at multiple levels of an expression should capture the rest, or we might manage it by folding that into the same rule and pretending that , is an infix operator for the purposes of that rule. Plausible?

By "leave it entirely up to an implementer", do you mean that rustfmt might still do this but we wouldn't put it in the style guide? What's the advantage of doing it that way? I'm concerned that if we do that, we'll get noticeable inconsistencies in this area, if there are multiple tools that format Rust.

nrc commented 7 years ago

By "leave it entirely up to an implementer", do you mean that rustfmt might still do this but we wouldn't put it in the style guide? What's the advantage of doing it that way? I'm concerned that if we do that, we'll get noticeable inconsistencies in this area, if there are multiple tools that format Rust.

The advantage of not spec'ing:

allows implementations to incrementally improve their design
it makes the spec simpler
it makes implementation easier (lowering the bar for more implementations)
it is less effort (better to ship something slightly sub-optimal than to never ship the perfect spec)

So, I think for any issue there is a trade-off between specification complexity and the effect of the specification. My belief is that the downsides of not spec'ing this (i.e., the variation between implementations) do not outweigh the benefits here - I think that any variation in this respect will not be very noticeable and that there will be far wider variations from other design decisions. (Also, practically speaking - there is only one implementation and I'd rather experiment with this question in code rather than trying to get the design perfect up front).

rust-lang / style-team

Define 'short' #47