stil4m / elm-syntax

Elm syntax in Elm
MIT License
94 stars 28 forks source link

Update the AST and print it afterwards #206

Open jfmengels opened 1 year ago

jfmengels commented 1 year ago

I'm creating an issue for a feature that @MartinSStewart asked about (and I'm sure others have had in mind as well) in https://github.com/stil4m/elm-syntax/issues/205#issuecomment-1694714400. This is more of a question for elm-review, but it could also apply to elm-syntax-dsl, elm-codegen, etc., so let's have it here :shrug:

So my understanding is that you want element positions in order to make text edits easier for elm-review. But what if elm-review worked with the AST directly? That is to say, if you want to provide a fix, you just update the AST data structure.

Currently this isn't practical since the AST doesn't store line breaks or comments. Without that information the code would change considerably when elm-review applies a fix. But if V8 of elm-syntax did include that information, and if elm-review operated on the level of AST changes rather than text replacement, would it make sense to track element positions?

My problem with this is that I am not entirely sure how some situations should be handled. Don't consider this issue as a carefully-crafted issue, it's a bit of a brain-dump. I do hope it will lead to some good discussions and solutions though :crossed_fingers:

Let's start with the assumption that comments are stored as they are in v7, at the root of a file, and not inside the AST (expressions, declarations, etc.).

Let's say we replace fn arg arg2 by fn newArg arg2 (let's just replace the node corresponging to newArg).

I guess one question to ask first is: do we expect the writer to print the source 1) as the ranges indicate (toString (parse source) == source)? 2) à la elm-format

If we print it with 1), then we have have a problem, because we messed up the ranges: newArg is larger than arg, so all the code that comes afterward on the same line should come later, meaning we can't rely on the original positions anymore, or we have to compute the offset by the different replacements as we go along and print the AST. Otherwise we will end up with fn newArgg2.

Not impossible, but it does add some complexity, especially if we want to inject comments at the right position.

With 2), the complexity will probably be around knowing when to break a line or not. I don't know if there's a problem here yet or not in practice.


If we have AST replacements, I'm guessing that we fix authors will likely (want to) use empty ranges (Node.empty / Range.empty) if possible. That will allow doing something like

case node of
  Node oldRange (Expression.FunctionOrValue [] _) ->
    Fix.replaceAst oldRange (Node.empty (Expression.FunctionOrValue [] newName)

instead of the more complex

case node of
  Node oldRange (Expression.FunctionOrValue [] _) ->
    Fix.replaceAst
      oldRange
      (Node { start = oldRange.start, end = { row = oldRange.end.row, column = oldRange.end.column + 3 } }
        (Expression.FunctionOrValue [] newName)
      )

which is a bit more complicated (and this is only for a function reference, if we need to compute the ranges correctly for large changes, I'd much rather use string edits).

So it's likely that if we do AST replacements and we use a elm-format like re-writing (would make sense for elm-review, at least to some extent), we will end up with inconsistent AST ranges before we print things out, meaning we will need to do a pass over the AST to recompute ranges, before we apply the elm-format algorithm. I could be wrong about this though.

One problem I can see happening is for instance this one. Say we have

foo =
  fn
    arg
    arg2

and we replace arg by newArg with Node.empty as the range. elm-format uses the row information to know when to do a line break or not.

In the absence of a row information (or rather, when the range is the empty range), the only sensible solution I can come up would be to assume that the position of an element is the same as the previous element. And that would lead the code to be reformatted to

foo =
  fn newArg
    arg2

I don't see a way around this problem except not using an empty range, or storing the whitespace information in the AST (which I think is a lot more data, and which again could become out of sync). It's not a deal-breaker, but I'm sure it will lead to some bug reports.


I'd love to have some solutions to these problems. AST replacements are done in non-Elm linters (I don't know which ones though), so maybe we can take some inspiration from them.