Open jfmengels opened 10 months ago
So my understanding is that you want element positions in order to make text edits easier for elm-review. But what if elm-review worked with the AST directly? That is to say, if you want to provide a fix, you just update the AST data structure.
Currently this isn't practical since the AST doesn't store line breaks or comments. Without that information the code would change considerably when elm-review applies a fix. But if V8 of elm-syntax did include that information, and if elm-review operated on the level of AST changes rather than text replacement, would it make sense to track element positions?
The ranges are also used for the displayed error range.
E.g. it's a lot nicer to mark only the ++
in [ 1 * a ] ++ bs
instead of the whole expression because the error range on 1 * a
won't be inside and hidden by the outer.
I love your suggestion anyway, though, even if it doesn't remove the need for these token ranges
As @lue-bird said, it's mostly to get more information out of the AST, to put squiggly lines in the ideal position (like on the ++
), but also to have better string-based fixes.
I've created a separate issue to discuss AST replacements: https://github.com/stil4m/elm-syntax/issues/206
One thing that has felt missing in
elm-syntax
's AST for a long time and has given me quite a bit of trouble in some places is the location of keywords. For instance, in anif
expression, it is pretty easy to know where theif
keyword is, but it is pretty hard (impossible without access to the raw source code) to know where thethen
orelse
keywords are located (without making assumptions like the file has beenelm-format
ted.My proposal is to store this information in the AST node as an additional element:
would become
With this approach, these keywords can easily be ignored (
IfBlock _ condition thenBranch elseBranch
) and they can also be extracted through pattern matching (IfBlock { thenRange } condition thenBranch elseBranch
).I don't think we should store the whitespace, as it can be computed from the rest.
Non-exhaustive list of things I would like to have here too:
module
keyword (can be computed though)exposing
keyword..
symbol without the parens?,
symbolsimport
keyword (can be computed though)as
keywordexposing
keywordtype
keyword (can be computed though)alias
keyword (they don't have to be together)=
symboltype
keyword (can be computed though)=
and|
symbols:
in=
symbol,
symbols,
symbols. If we haveTuple2
andTuple3
variants, we can store them ascommaRange
(for Tuple2) andfirstCommaRange
andsecondCommaRange
(for Tuple3).,
symbols:
symbol (( Node Pattern, {- range for : -} Range, Node Expression }
?)|
symbol.
range and range for the field name separately?case
keyword (can be computed though)of
keyword->
symbolslet
keyword (can be computed though)in
keyword`-
symbol (can be computed though)\
symbol (can be computed though)->
symbol:
symbol->
symbol in function type,
,:
and|
just like for records and tuplesI'm sure we can find more.
I don't know how to model the
,
in lists yet, without making them really combersome to use. Maybe as acommaList : List Range
and people will have to useList.map2
with or something? This is not super practical since the list of items and the list of commas is off by one, unless we show good examples or have helper functions for this.Actually, if we have non-empty lists for these, we #192
A few questions on this:
IfKeywords
, or should we inline everythingIfBlock { ifRange : Range, ... } (Node Expression) (Node Expression) (Node Expression)
? I think the former feels nicer, especially to read.IfKeywords
,IfRanges
,IfKeywordRanges
? For this specific element "keywords" makes sense but for other things, it will mostly be about symbols (,
) rather than keywords.We should do a good job of identifying what is an element, and what isn't. For instance, I recently realized that for
exposing (..)
, we store the position for(..)
, but you can writeexposing ( .. )
if you'd like (the 2 dots need to be next to each other). So in this case, you might want to store( .. )
I don't know whether we should store the position of elements whose position can easily be computed from the element's range. Example, the
if
keyword will always be the first 2 characters of theIf
expression's range ({start = ifRange.start, end = { row = ifRange.end.row, column = ifRange.end.column + 2 } }
. For performance reasons mostly (having a smaller data structure), I think we could skip storing that one, and potentially have a function to do the computation. Similarly forList
andRecord
, we don't store the position for[
and{
. I agree it will a bit inconsistent though.If we want to go for the least amount of memory, we could even store the
Location
(just the row and column) instead of theRange
, since the size is statically known (only do this for those that can be computed like this).@MartinSStewart You mentioned somewhere that you would like to contain more information in order to help with some fixes migrations (Or to be able with using AST for fixes). Would this suffice for you or do you feel like you'd be missing some information still? My assumption is that if you have the position of every keyword and token, you can then compare ranges to count the whitespace. The only thing you wouldn't be able to tell is the whitespace at the end of a line, but I'm not sure how important that is, since it will usually be removed by
elm-format
anyway.I don't know if having this information would make writing/codegening ASTs easier or harder in practice.