rust-lang / rustfmt

Format Rust code
https://rust-lang.github.io/rustfmt/
Apache License 2.0

Support structured comments as an optional configuration #5162

Closed sloganking closed 2 years ago

sloganking commented 2 years ago

What is structured commenting?

I have found a commenting scheme, which I'll call structured commenting, to be very helpful and intuitive.

The rules are:

In this way, what each comment is talking about is explicit and unambiguous, and comments can be nested inside one another.

Examples

You can find further explanation of structured commenting here:

https://github.com/sloganking/Structured-Commenting

and some examples of rust code using it here:

https://github.com/sloganking/rs-text-compression/tree/fed49ecef5f5cc8a23e3a1b955d45687d14c1320/src

Problem and Proposal

rustfmt currently forces comments and code to be on the same tab depth, which makes every structured comment that talks about multiple lines broken and ambiguous.

I propose adding a boolean configuration option, read from rustfmt.toml, that when set to true would recognize comments as structures that code and other comments can be indented to the right of, just as code between curly brackets { } sits one tab depth to the right of the brackets themselves.

Effectively, enabling this configuration option would allow rustfmt to output this:

//comment talking about the next line
let a = 1;

// comment talking about the next 6 lines
    let b = 2;
    let c = 3;

    // comment talking about the next 2 lines
        let d = 4;
        let e = 5;

let f = 6;

Instead of rustfmt mangling it (as it currently does) and turning it into this:

//comment talking about the next line
let a = 1;

// comment talking about the next 6 lines
let b = 2;
let c = 3;

// comment talking about the next 2 lines
let d = 4;
let e = 5;

let f = 6;

Take note how the current rustfmt formatting makes the comment // comment talking about the next 6 lines appear like it's talking about 2 lines instead of the intended 6 that it wishes to describe.

calebcartwright commented 2 years ago

Thanks for sharing your suggestion, but to be fully transparent I don't see this ever coming to fruition.

rustfmt does not directly process your input files, but instead gets the AST representation of your input program from the compiler, and it is that AST that is used for formatting. Comments are not explicitly represented in the AST, and it's often painful enough just to recover those unrepresented comments. Trying to use comments in the manner described is almost certainly going to come with a number of technical challenges, for which any resolution would add undue complexity to the rustfmt source.

As such this is not something the rustfmt team will ever work on. However, I'll leave the issue open for a while in case you or someone else feels strongly enough to propose a PR.
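To make the comment-recovery point above concrete, here is a minimal sketch, assuming a toy model in which each AST node records only the byte range it occupies in the source. The names `Span` and `recover_comments` are hypothetical and are not rustfmt's actual API; the idea is only that comments have to be fished out of the source text that lies between nodes.

```rust
/// Byte range of a node in the original source (hypothetical, for illustration).
#[derive(Clone, Copy)]
struct Span {
    start: usize,
    end: usize,
}

/// Collect `//` line comments found in the source text between two adjacent nodes.
/// The AST itself carries no comments, so the gap between spans is the only place to look.
fn recover_comments(source: &str, prev: Span, next: Span) -> Vec<&str> {
    source[prev.end..next.start]
        .lines()
        .map(str::trim)
        .filter(|line| line.starts_with("//"))
        .collect()
}
```

A comment recovered this way is just text found in a gap. Nothing in the tree says which of the following statements it is meant to describe, which is the information the proposal would need.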

sloganking commented 2 years ago

> I'll leave the issue open for a while in case you or someone else feels strongly enough to propose a PR.

I am interested in this problem. I have begun reading Crafting Interpreters, which is giving me a basic understanding of scanners, parsers, and ASTs.

Is @calebcartwright or anyone able to recommend resources for learning to work with code formatters?

calebcartwright commented 2 years ago

> I am interested in this problem. I have begun reading Crafting Interpreters, which is giving me a basic understanding of scanners, parsers, and ASTs.
>
> Is @calebcartwright or anyone able to recommend resources for learning to work with code formatters?

First I just want to say it's great that you're interested in learning about those topics and contributing! There's more than enough work to go around, both here in rustfmt (formatting) and in rust-lang/rust (compiler, stdlib, etc.).

I will provide a very high level answer to your question below, however, I want to make sure my above feedback in https://github.com/rust-lang/rustfmt/issues/5162#issuecomment-1003465173 is absolutely clear: you're welcome to try to implement this if you really want to, but if you choose to do so you'll need to work independently; we're not going to put any time/resources into this feature request, neither coding nor assistance/mentoring.

I realize you're a proponent of this approach to comments, but I think it's fair to say that at best its usage in Rust is very fringe, and sharply divergent from our core goals and general idiomatic Rust coding. Additionally, I think that any implementation for your ask would be highly invasive on our codebase, requiring a large diff and increasing both our maintenance burden and surface for bugs, which simply wouldn't make sense for us to take on. While I'd be happy to be proven wrong, this is why we're not going to put any effort into this.

> Is @calebcartwright or anyone able to recommend resources for learning to work with code formatters?

As to this question, and with the above constraints, no, I'm not aware of any resource material, certainly not the myriad formal/academic kinds one can find with the significantly more complex subjects like compilers. There's a ton of automated code formatters for various languages, but they essentially all fall into one of two categories

There's tradeoffs between the two and different users have different preferences. In general those in the first category are inherently more "opinionated" about resultant style but can support some more advanced transformations/rules, while the latter tend to "do less" and allow devs to maintain more of the individualistic style preferences as they originally wrote the code. However, even within the first category you'll find formatter implementations with varying strengths of opinions (e.g. some defer more/less to original input style, line breaks, etc.), how configurable they are, and more.

rustfmt is absolutely a member of that first category. rustfmt doesn't really do any parsing itself, but instead utilizes the compiler's internal parser to produce the AST for the input program. rustfmt then has processing code for every type of AST node, which it applies to each node of that type it encounters.

Some of this is covered in the Design doc which provides more details about rustfmt. TBH some of the architecture/implementation specifics are highly outdated at this point, but you may find some of the choices and thought processes interesting.

All of that being said, any work you explore to implement your feature request will obviously need to be done within the current context of the rustfmt source code, regardless of generalities of code formatters. I'm not even sure where to recommend starting because it's likely just about anywhere. I suppose you could look for where/how comment recovery takes place currently within a given AST node, and then try to extrapolate how you'd back that out to checking for preceding comments to determine the shape for formatting child nodes.

I still suspect that once you start digging in you'll come to the same conclusion as me fairly quickly, but wish you luck regardless!
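The per-node processing described in this comment can be pictured with a toy sketch. This assumes a made-up two-variant AST; `Node` and `format_node` are hypothetical names, and the real compiler AST and rustfmt's rewrite code are far larger. It only shows the general shape of an AST-driven formatter.

```rust
/// A toy AST (hypothetical; stands in for the much larger compiler AST).
enum Node {
    Let { name: String, value: String },
    Block(Vec<Node>),
}

/// Emit formatted text for a node, dispatching on the node's type.
/// Indentation comes only from how deeply the node is nested in the tree.
fn format_node(node: &Node, depth: usize, out: &mut String) {
    let indent = "    ".repeat(depth);
    match node {
        Node::Let { name, value } => {
            out.push_str(&format!("{indent}let {name} = {value};\n"));
        }
        Node::Block(children) => {
            out.push_str(&format!("{indent}{{\n"));
            for child in children {
                format_node(child, depth + 1, out);
            }
            out.push_str(&format!("{indent}}}\n"));
        }
    }
}
```

In a design like this, a comment is not a node, so it has no way to contribute to `depth`. Supporting structured comments would mean threading that information through every place where child nodes are formatted.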

sloganking commented 2 years ago

Thank you for the thoughtful reply. I understand the rust-lang team's stance on not putting any time/resources into this, and concerns with added complexity.

While this is non-traditional and possibly difficult to implement, I still believe it has the potential to significantly increase code clarity. So I may take time to research implementing this on my own, even if it's potentially not merged.

===

I am currently exploring both improving the concept and making it easier to implement, by adding the equivalent of opening and closing brackets to the multi-line comment system.

// normal traditional comment
...

//{ opening multi line comment
    ...
//}{ closing the last, and opening a new multi line comment
    ...
//} closing multi line comment (which would normally not have any text)

...

This would have a few benefits. The one concerning this issue is that it would allow multi-line comments to be easily parsed and represented in ASTs. The compiler putting them in an AST could potentially be made optional, as they would not be necessary when compiling code. Though were I to have it my way, unclosed multi-line comments would be considered a compiler error, seeing as I think that code documentation should be considered just as important as the code itself.
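As a rough illustration of why the bracket markers make this easier to process, here is a minimal sketch of a line-based pass. It assumes the `//{`, `//}{` and `//}` markers shown above, a 4-space indent, and input whose indentation does not yet reflect comment nesting. The function name `indent_structured` is hypothetical, and this is not how any existing tool is known to implement it.

```rust
/// Add one indent level to every line between an opening `//{` marker and its
/// closing `//}` marker. `//}{` closes the current level and opens a new one.
/// Returns an error for unmatched markers, mirroring the "unclosed comment is
/// an error" idea from the text.
fn indent_structured(source: &str) -> Result<String, String> {
    let mut depth: usize = 0;
    let mut out = String::new();
    for (i, raw) in source.lines().enumerate() {
        let marker = raw.trim_start();
        let closes = marker.starts_with("//}");
        let opens = marker.starts_with("//{") || marker.starts_with("//}{");
        if closes {
            depth = depth
                .checked_sub(1)
                .ok_or_else(|| format!("line {}: unmatched closing comment", i + 1))?;
        }
        if !raw.trim().is_empty() {
            // Keep the line's existing indentation and add the comment depth on top.
            out.push_str(&"    ".repeat(depth));
            out.push_str(raw.trim_end());
        }
        out.push('\n');
        if opens {
            depth += 1;
        }
    }
    if depth != 0 {
        return Err(format!("{depth} structured comment(s) left unclosed"));
    }
    Ok(out)
}
```

Because the nesting is spelled out by the markers, this pass needs no parser or AST at all. It only needs a counter, which is the property the bracket syntax is meant to provide.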

calebcartwright commented 2 years ago

> This would have a few benefits. The one concerning this issue is that it would allow multi-line comments to be easily parsed and represented in ASTs. The compiler putting them in an AST could potentially be made optional, as they would not be necessary when compiling code. Though were I to have it my way, unclosed multi-line comments would be considered a compiler error, seeing as I think that code documentation should be considered just as important as the code itself.

Hate to be the bearer of bad news, but this is a non-starter. I realize that being able to design your own parser and code representation would make your goal easier, but it would not be reasonable to make such invasive changes to Rust's compiler (that would have both development and runtime costs) with unnecessary overhead simply to potentially support a non-default config option in rustfmt that would most likely be rarely utilized in practice.

If this remains something you want to pursue, you're going to have to figure out how to make it work within the existing constraints.

sloganking commented 2 years ago

That makes sense and is reasonable. I will not modify the AST.

calebcartwright commented 2 years ago

Thanks again, but I'm going to go ahead and close this. The above commentary certainly still stands, but after some additional reflection I've decided I don't want to leave this open, as I don't want to imply we're actively encouraging/soliciting this.

sloganking commented 2 years ago

For any future readers of this issue, I've decided to build a third-party tool that enables support for this by running after traditional code formatters (such as rustfmt). It implements this issue's desired formatting behavior, with the exception of preserving max_width. In order to guarantee max_width, traditional code formatters would need to add structured commenting support themselves.

https://github.com/sloganking/Structured-commenting-formatter

https://crates.io/crates/scfmt
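The max_width caveat can be seen with a small sketch. It assumes a post-processing step that adds a fixed number of columns to already-formatted lines; `overflowing_lines` is a hypothetical helper and is not part of rustfmt or scfmt. rustfmt's default max_width is 100.

```rust
/// Return the 1-based numbers of lines that would exceed `max_width` once
/// `extra_indent` columns are added in front of them.
fn overflowing_lines(formatted: &str, extra_indent: usize, max_width: usize) -> Vec<usize> {
    formatted
        .lines()
        .enumerate()
        .filter(|(_, line)| !line.is_empty() && line.chars().count() + extra_indent > max_width)
        .map(|(i, _)| i + 1)
        .collect()
}
```

A line that the first formatter wrapped to fit exactly within the limit no longer fits after being shifted right. Only a formatter that knows the final indentation while it is choosing line breaks can avoid that.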