jfmengels / elm-review

Analyzes Elm projects, to help find mistakes before your users find them.
https://package.elm-lang.org/packages/jfmengels/elm-review/latest/
Other
252 stars 13 forks source link

Add utility to get the parent node of an expression in the AST #162

Open jfmengels opened 1 year ago

jfmengels commented 1 year ago

Sometimes rules need to know the parent expression of the node that they're looking at. This proposal tries to make it easy to get that.

An example where this can be needed is for Simplify: Simplify corrects [ a ] ++ rest to a :: rest, but want to avoid doing the same change when this expression is on the right side of ++ like in left ++ [ a ] ++ rest. To avoid this, we store the location of nodes that are on the right side of ++ in the context

With an easy access to the parent, we could skip storing things in the context, and make easier checks.

Idea: Finding the node

Unfortunately, the AST we have does not make that easy at all, so the idea is to find the node again by going through the AST eagerly.

This could be provided as part of the API for elm-syntax and/or as part of elm-review, though since I think it makes most sense for elm-review, it should be in elm-review. If we ever choose otherwise, we could move it to elm-syntax and use that version internally in elm-review.

{-| Get the parent expression for the given range -}
parentFor : Node Expression -> Maybe (Node Expression)
parentAt : Range -> Maybe (Node Expression)

{-| Get the declaration in which this expression resides -}
declarationFor : Node Expression -> Maybe (Node Declaration)
declarationAt : Range -> Maybe (Node Declaration)

{-| Get the declaration in which this expression resides and the list of all parent expressions.
The first item in the `parents` list would be the direct parent, while the last one would be the body of the declaration.
 -}
parentsFor : Node Expression -> Maybe { declaration : Node Declaration, parents : Node Expression }
parentsAt : Range -> Maybe { declaration : Node Declaration, parents : Node Expression }

Internally, parentAt would be defined something like this (and the others would be very similar):

-- A bit hand-wavy but hopefully it gets the gist
parentAt : Elm.Syntax.File.File -> Range -> Maybe (Node Expression)
parentAt ast =
    \range ->
      findDeclarationMatchingRange range ast.declarations
        |> Maybe.andThen (\declaration -> findExpression range declaration.body)

findExpression range ((Node nodeRange expr) as node) =
  if range == nodeRange then
    Just node
  else
    case expr of
      Parenthesized a -> findExpression range a
      -- ...
      _ -> Nothing

Availability

There are multiple options to get these functions.

1 - Request the functions through the context creator

This would be similar to withSourceCodeExtractor

contextCreator : Rule.ContextCreator () Context
contextCreator =
    Rule.initContextCreator
        (\parentFor parentAt () ->
            { parentFor = parentFor
            , parentAt = parentAt

            -- ...other fields
            }
        )
        |> Rule.withParentFor
        |> Rule.withParentAt

expressionVisitor context node =
  let
    parent = context.parentFor node
  in
  -- ...

A downside is that you need to add a field for every one of these function you want. Alternatively, we provide a record with these fields as functions

contextCreator : Rule.ContextCreator () Context
contextCreator =
    Rule.initContextCreator
        (\parentInfo () ->
            { parentInfo = parentInfo

            -- ...other fields
            }
        )
        |> Rule.withParentInfo

expressionVisitor context node =
  let
    parent = context.parentInfo.parentFor node
  in
  -- ...

2 - Provide a new module

I don't know the name yet, let's say it's Review.Parent for now. This would provide the functions defined above but with the explicit Elm.Syntax.File.File argument, which you'd get from withFullAst using the context creator.

contextCreator : Rule.ContextCreator () Context
contextCreator =
    Rule.initContextCreator
        (\ast () ->
            { ast = ast

            -- ...other fields
            }
        )
        |> Rule.withFullAst

expressionVisitor context node =
  let
    parent = Review.Parent.parentFor context.ast node
  in
  -- ...

Return value

There are some slightly tricky return values when you want the parent to something.

For instance, what should be the return value if you ask for the parent to a let declaration value? In the following example, what should we get if we ask the parent of 1? Should we get the let expression? What if I wanted the a declaration?

value = 
    let
       a =
          1
    in
    a

Potentially we could return a combination of Declaration/Expression:

-- Name TBD
type Segment
    = TopLevelDeclaration_ (Node Declaration)
    | LetDeclaration_ (Node LetDeclaration)
    | Expression_ (Node Expression)

parentAt : Range -> Maybe Segment
parentsAt : Range -> List Segment

For elm-review, it probably only makes sense to have getParent* functions that takes an AST. For other purposes, it might make sense (especially performance-wise) to have functions that take a declaration or an expression instead. I'm not sure here.

For other kinds of nodes

Does this make a lot of sense for other kinds of nodes than Expression, such as TypeAnnotation? I think it makes less sense, because it is pretty easy to "keep" the surrounding information when you're looking at a type signature or type definition. This is mostly a problem for Expression because when you get since you don't visit

Alternative solutions

The current solution is to store the parent node in the context (or the derived information you want from it), so that's obviously a solution we can continue using.

Aother solution would be to have different visitors that pass the current node and its parent node, like withExpressionWithParentVisitor. I think the naming can get pretty wild with all the variations we'd like to support: simple visitor, enter visitor, exit visitor, let declaration visitor, case expression visitor. This would 6 additional visitors just with this variation. It is also a bit more limi

Maybe we can come up with an interesting solution like with the ContextCreator where you can request more information: the parent node, the current declaration, all of the segments, etc.

These solutions would be more performant than the "finding solution" at least as they require a lot less computations.


What do you think of these proposals? I'm undecided as to which solutions to go for. I like the idea of the Segment API, though I don't know if I'll like the ergonomics of working with it.

gampleman commented 1 year ago

I think in general elm-syntax could do with some companion modules that help with and simplify common AST manipulation tasks. This would be a great fit for that (not that it matters to anyone, but that would be my vote for option 2 but for it to go to elm-syntax or perhaps elm-syntax-extra).

The reason this is tractable at all is that the Node stores the Range which is unique for each expression, right? So on the performance side, this could be a very cheap operation, as it's a classic tree search O(log n) I think...