stil4m / elm-syntax

Elm syntax in Elm
MIT License
94 stars 28 forks source link

Remove Node and add Range directly to AST types #100

Open MartinSStewart opened 3 years ago

MartinSStewart commented 3 years ago

One problem I have when working with elm-syntax is that I spend a lot of time deconstructing types or fixing compiler errors related to forgetting to use Node.value on what I thought was a Expression but in fact was Node Expression. I spend a lot of time jumping between my code and the documentation because I often get lost in heavily nested data structures.

I think these problems could be greatly reduced if Node a was removed and instead Range was placed directly in all the AST types that use it.

As an example of how this makes things more simple, here is some code I wrote that uses elm-syntax

getTodo : ModuleContext -> Function -> List Todo
getTodo context  function =
    case ( function.signature, function.declaration ) of
        ( Just (Node _ signature), Node _ declaration ) ->
            case signature.typeAnnotation of
                Node _ (TypeAnnotation.Typed (Node _ ( [], "Codec" )) [ Node _ (TypeAnnotation.Typed codecType []) ]) ->
                    case declaration.expression of
                        Node _ (Expression.Application ((Node _ (Expression.FunctionOrValue [ "Debug" ] "todo")) :: _)) ->
                            case ModuleNameLookupTable.moduleNameFor context.lookupTable codecType of
                                Just actualCodecType ->
                                    ...

                                Nothing ->
                                    []

                        _ ->
                            []

                _ ->
                    []

        _ ->
            []

and here is the same code but I've removed Node and placed extra _ where a range value would be instead

getTodo2 : ModuleContext -> Function -> List Todo
getTodo2 context function =
    case ( function.signature, function.declaration ) of
        ( Just signature, declaration ) ->
            case signature.typeAnnotation of
                (TypeAnnotation.Typed _ (_, [], "Codec" ) [  (TypeAnnotation.Typed _ codecType []) ]) ->
                    case declaration.expression of
                        Expression.Application _ ((Expression.FunctionOrValue _ [ "Debug" ] "todo") :: _) ->
                            case ModuleNameLookupTable.moduleNameFor context.lookupTable codecType of
                                Just actualCodecType ->
                                    ...

                                Nothing ->
                                    []

                        _ ->
                            []

                _ ->
                    []

        _ ->
            []
jfmengels commented 3 years ago

That's an interesting suggestion.

type alias FunctionImplementation =
    { name : Node String
    , arguments : List (Node Pattern)
    , expression : Node Expression
    }

How would you transform this type above with that idea? Something like the following?

type alias FunctionImplementation =
    { name : ( Range, String )
    , arguments : List Pattern -- The pattern would contain the range information?
    , expression : Expression
    }

It may become a bit tricky to get the range of something then though. Imagine you want to know the range of the condition in an if expression. Currently you'd do

case Node.value node of
  Expression.IfBlock condition left right ->
    Node.range condition

but with your proposal, you'd have to pattern match all the possible variants to get it

case node of
  Expression.IfBlock ifRange condition left right ->
    case condition of
      Integer range _ -> range
      Hex range _ -> range
      -- ...

I guess we'd provide a helper function for that, but you'd need to have multiple functions expressionRange, declarationRange, ... to cover all of those. I'm thinking that could be annoying somewhere too.

MartinSStewart commented 3 years ago

In some cases such as the name field or the condition parameter, it might make more sense to place the range in the parent type. So

type alias FunctionImplementation =
    { name : String 
    , nameRange : Range
    , arguments : List Pattern -- The pattern would contain the range information? <- that's correct
    , expression : Expression
    }

and

case node of
  Expression.IfBlock ifRange conditionRange condition left right ->

This might still be more difficult to get the range from. But my experience has been that I seldom need to get the range compared to other data in the AST so the ease of navigating the data structure would make this worthwhile.

Another thing I didn't think of in my first post is that removing nodes also makes it easier to create AST's for code generation as there is much less nesting. I believe this difficulty is why https://package.elm-lang.org/packages/the-sett/elm-syntax-dsl/latest/ exists (essentially it's Elm syntax's code generation but with all the nodes stripped out)

Edit: I take back what I said about the condition parameter in IfBlock, I didn't realize that was an expression. In that case I think it makes most sense for all the expressions variants to have a range value as the first parameter.

MartinSStewart commented 3 years ago

Another possible improvement would be to make the Range value a type variable instead. So something like this

type Expression a
    = UnitExpr a
    | Literal a String
    | ...

When parsing, a could be Range, and when you're using Elm.Writer, it can be () since you don't care about Ranges at that point.

I guess there's a risk that the user needing to add a type variable to a bunch of type signatures out weighs the effort spared when doing code generation but I think it's worth trying out.

Viir commented 3 years ago

As far as I understand, the type variable also would help with equality checks. After reading expressions from text, applying a mapping of the range type to () could enable checking for equality. Thinking about it now, that would require a custom mapping function for each of the types.

I encountered equality checks when writing tests for emission scenarios. Tests for the emit stage contain expectations about emitted syntax. I described the expectations using strings so far, but I thought it might be nicer to use the syntax types here. I haven't tried it so far, though, so not sure.