tweag / topiary

https://topiary.tweag.io/
MIT License
565 stars 29 forks source link

Format multi-line recursive chains properly #87

Closed torhovland closed 1 year ago

torhovland commented 2 years ago

The formatter currently does this:

(?used_slot: bool ref ->
  override_flag ->
  Env.t ->
  Location.t ->
  Longident.t loc -> Path.t * Env.t)

rather than this:

(?used_slot: bool ref ->
  override_flag ->
  Env.t ->
  Location.t ->
  Longident.t loc -> 
  Path.t * Env.t)

This is because the tree is recursive like this:

[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:       {Node constructed_type (180, 2) - (182, 7)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:         {Node ( (180, 2) - (180, 3)} - Named: false
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:         {Node function_type (180, 3) - (181, 36)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:           {Node typed_label (180, 3) - (180, 22)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:             {Node ? (180, 3) - (180, 4)} - Named: false
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:             {Node label_name (180, 4) - (180, 13)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:             {Node : (180, 13) - (180, 14)} - Named: false
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:             {Node constructed_type (180, 14) - (180, 22)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:               {Node type_constructor_path (180, 14) - (180, 18)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                 {Node type_constructor (180, 14) - (180, 18)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:               {Node type_constructor_path (180, 19) - (180, 22)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                 {Node type_constructor (180, 19) - (180, 22)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:           {Node -> (180, 23) - (180, 25)} - Named: false
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:           {Node function_type (180, 26) - (181, 36)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:             {Node type_constructor_path (180, 26) - (180, 39)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:               {Node type_constructor (180, 26) - (180, 39)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:             {Node -> (180, 40) - (180, 42)} - Named: false
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:             {Node function_type (180, 43) - (181, 36)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:               {Node type_constructor_path (180, 43) - (180, 48)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                 {Node extended_module_path (180, 43) - (180, 46)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                   {Node module_name (180, 43) - (180, 46)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                 {Node . (180, 46) - (180, 47)} - Named: false
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                 {Node type_constructor (180, 47) - (180, 48)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:               {Node -> (180, 49) - (180, 51)} - Named: false
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:               {Node function_type (180, 52) - (181, 36)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                 {Node type_constructor_path (180, 52) - (180, 62)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                   {Node extended_module_path (180, 52) - (180, 60)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                     {Node module_name (180, 52) - (180, 60)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                   {Node . (180, 60) - (180, 61)} - Named: false
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                   {Node type_constructor (180, 61) - (180, 62)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                 {Node -> (180, 63) - (180, 65)} - Named: false
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                 {Node function_type (181, 3) - (181, 36)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                   {Node constructed_type (181, 3) - (181, 18)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                     {Node type_constructor_path (181, 3) - (181, 14)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                       {Node extended_module_path (181, 3) - (181, 12)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                         {Node module_name (181, 3) - (181, 12)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                       {Node . (181, 12) - (181, 13)} - Named: false
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                       {Node type_constructor (181, 13) - (181, 14)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                     {Node type_constructor_path (181, 15) - (181, 18)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                       {Node type_constructor (181, 15) - (181, 18)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                   {Node -> (181, 19) - (181, 21)} - Named: false
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                   {Node tuple_type (181, 22) - (181, 36)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                     {Node type_constructor_path (181, 22) - (181, 28)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                       {Node extended_module_path (181, 22) - (181, 26)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                         {Node module_name (181, 22) - (181, 26)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                       {Node . (181, 26) - (181, 27)} - Named: false
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                       {Node type_constructor (181, 27) - (181, 28)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                     {Node * (181, 29) - (181, 30)} - Named: false
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                     {Node type_constructor_path (181, 31) - (181, 36)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                       {Node extended_module_path (181, 31) - (181, 34)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                         {Node module_name (181, 31) - (181, 34)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                       {Node . (181, 34) - (181, 35)} - Named: false
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:                       {Node type_constructor (181, 35) - (181, 36)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:         {Node ) (181, 36) - (181, 37)} - Named: false
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:         {Node type_constructor_path (182, 4) - (182, 7)} - Named: true
[2022-10-05T12:25:52Z DEBUG tree_sitter_formatter::atom_collection] CST node:           {Node type_constructor (182, 4) - (182, 7)} - Named: true

So the innermost functions aren't multi-line, hence won't get formatted as such.

When deciding if a node is multi-line, we should check if there is a line of same-type ancestors (such as function_type), where at least one of them is multi-line.

aspiwack commented 2 years ago

@tbagrel1 you may want to comment on what you did in Ormolu.

tbagrel1 commented 2 years ago

In Ormolu, I dealt with operator chains (either term-level or type-level operators), like 1 + 2 * 3 + 4 + 5, or Int :*: Int :+: Int. I decided to build a n-ary tree in which all operations of the same priority are located on the same tree level. So I first flattened the tree, and then (recursively) splitted on the least prioritary operator at a given level, or introduced a subtree when adjacent leaves are "joined" with a most prioritary operator (see https://github.com/tweag/ormolu/blob/master/src/Ormolu/Printer/Meat/Declaration/OpTree.hs#L48). The two actions "split" or "group" gives the same result when the operator priority is a complete order, but they are complementary when we only have a partial order on priority/fixity.

.+.*.+.*.      .+...+..
| | | | |  --> |  |   |
1 2 3 4 5      1  *   *    (this can be either "split on +" or "group on *")
                 / \ / \
                 2 3 4 5

And then each level of the tree uses the upper bound of the original "line discipline" of its nodes (where multiline > singleline). So basically I dropped recursion in favor of a more global/omniscient analysis of the operator tree.

Feel free to ask if you want more details about what I implemented in Ormolu.

torhovland commented 2 years ago

Thanks for the explanation. In tree-sitter-formatter, we don't have the "luxury" of knowing anything about the rules of the language, such as operator precedence. We only know the syntax tree that tree-sitter gives us. So I think the best we can do is to choose between single-line and full multi-line formatting.

The problem reported above is that this may turn out oddly for recursive chains, because the inner terms may be single-line, while the outer terms are multi-line. So we may end up with something like this:

1 
+ 2 
* 3 + 4 + 5

But we should be able to solve this by making multi-line detection a little smarter.

aspiwack commented 2 years ago

Thanks for the explanation. In tree-sitter-formatter, we don't have the "luxury" of knowing anything about the rules of the language, such as operator precedence.

It is mostly externally configured in Ormolu, actually. Because the precedence of operators is configurable in Haskell. Though Ormolu has the advantage of knowing where to look to populate the configuration with sensible defaults.

torhovland commented 2 years ago

I see. It might be possible to do something on the configuration level in the query files (where we specify indent level). But that would have to go in a separate issue. The first step (this issue) is to at least have a consistent multi-line formatting of recursive chains.

aspiwack commented 2 years ago

Agreed.

tbagrel1 commented 2 years ago

I mean, even without knowing anything about fixity/priority, you could still decide that the function arrow has a lower priority than anything else.

For term-level operators, it is also possible to make a base of sensible defaults (at least for arithmetic and boolean operators).

tbagrel1 commented 2 years ago

I don't see how you would get something like this:

1 
+ 2 
* 3 + 4 + 5

If you don't know anything about the operator priorities, then the n-ary tree stays "flattened" (that is exactly how it works in ormolu when unknown operators are used), and then we use the upper bound of the "line discipline" of the nodes; that is, multiline in this case.

So we would get

1
+ 2
* 3
+ 4
+ 5
torhovland commented 2 years ago

I mean, even without knowing anything about fixity/priority, you could still decide that the function arrow has a lower priority than anything else.

In our formatter, "->" is just an anonymous node that happens to be provided by the OCaml grammar. In other grammars, it would not be a valid node. In the OCaml grammar file, we can say that we'd like soft line breaks after "->", but we cannot easily say that "->" has a lower priority than "+".

torhovland commented 2 years ago

I don't see how you would get something like this:

1 
+ 2 
* 3 + 4 + 5

If the grammar gave us a tree like the following, it would be fine:

But instead it gives us something like this:

tbagrel1 commented 2 years ago

But if you look at your example, you might end up with

(?used_slot: bool ref ->
  override_flag ->
  Env.t ->
  Location.t ->
  Longident.t loc -> 
  Path.t *
  Env.t)

if both * and -> are treated with the same (unknown) priority. So you might not want a completely consistent formatting at the end of the day.

torhovland commented 2 years ago

Yes, this is true. So being able to specify operator precedence in the query file, and honouring that when formatting, would be very useful.

tbagrel1 commented 2 years ago

About your last message, GHC parser first gives a binary tree, that is transformed into a (flattened) n-ary tree by Ormolu, and then this n-ary tree is processed (split & group) and then analyzed to determine the printing discipline

tbagrel1 commented 2 years ago

See https://www.tweag.io/blog/2022-02-10-ormolu-and-operators/ :)

torhovland commented 2 years ago

Yes, I actually already read that 🙂

Our challenge is that we have to make do with the tree that the specific grammar gives us. The OCaml grammar is yielding these recursive chains, while a different grammar for another language might give us something more tree-like.

tbagrel1 commented 2 years ago

Could you really get something much different than either a binary tree or a n-ary tree?

aspiwack commented 2 years ago

Our challenge is that we have to make do with the tree that the specific grammar gives us. The OCaml grammar is yielding these recursive chains, while a different grammar for another language might give us something more tree-like.

Without considering precedence for now, we may consider having a query thingy (the @property things, I don't know how they are called) telling me that I want to collapse a chain into a list somehow. When I find an a + b I mark it as @collapse_chain. And somehow, the engine treats the entire chain as a single node.

torhovland commented 2 years ago

We might have to combine that with the input line break detection, i.e. the same that we do for figuring out if a comment is supposed to be at the end of a line or on the next line. So that

something_short + something_else_short

can be kept on a single line if it is on a single line in the input, while

something_very_very_very_long 
+ something_else_very_very_very_long

can be split if it is split in the input.

nbacquey commented 1 year ago

Note that this issue also affects formatting of tuples. For instance,

(
  1, 2,
  3, 4
)

is formatted as

(
  1, 2,
  3,
  4
)

This is because tuples are parsed as left-associative nested couples by the grammar.