stadelmanma / tree-sitter-fortran

Fortran grammar for tree-sitter

`variable_modification` clashes with `assignment_statement` if left-hand side name is one of `type_qualifier` #103

Open ZedThree opened 1 week ago

ZedThree commented 1 week ago

Example snippet:

program test
  integer :: len
  len = 5
end program test

gets parsed as:

(translation_unit [0, 0] - [4, 0]
  (program [0, 0] - [4, 0]
    (program_statement [0, 0] - [0, 12]
      (name [0, 8] - [0, 12]))
    (variable_declaration [1, 2] - [1, 16]
      type: (intrinsic_type [1, 2] - [1, 9])
      declarator: (identifier [1, 13] - [1, 16]))
    (variable_modification [2, 2] - [2, 9]
      (type_qualifier [2, 2] - [2, 5])
      declarator: (init_declarator [2, 5] - [2, 9]
        left: (identifier [2, 5] - [2, 5])
        right: (number_literal [2, 8] - [2, 9])))
    (end_program_statement [3, 0] - [4, 0]
      (name [3, 12] - [3, 16]))))
test.f90      0.33 ms         174 bytes/ms (MISSING "end" [2, 5] - [2, 5])

Rename `len` to `foo` and we get the correct parse:

(translation_unit [0, 0] - [4, 0]
  (program [0, 0] - [4, 0]
    (program_statement [0, 0] - [0, 12]
      (name [0, 8] - [0, 12]))
    (variable_declaration [1, 2] - [1, 16]
      type: (intrinsic_type [1, 2] - [1, 9])
      declarator: (identifier [1, 13] - [1, 16]))
    (assignment_statement [2, 2] - [2, 9]
      left: (identifier [2, 2] - [2, 5])
      right: (number_literal [2, 8] - [2, 9]))
    (end_program_statement [3, 0] - [4, 0]
      (name [3, 12] - [3, 16]))))

Two potential fixes:

ZedThree commented 4 days ago

A couple more details:

  1. This happens when a variable has the same name as a type_qualifier
  2. The assignment has to happen immediately after the specification part. If anything else comes in between, such as a comment, this isn't triggered

#104 works around this by treating `kind` and `len` slightly differently, but that means it only fixes the case where the variable is called either `kind` or `len`. Luckily, with tree-sitter, we can use a query to find out how many projects actually use one of these attributes as an identifier!

((variable_declaration
  (identifier) @name
    (#any-of? @name
      "abstract"
      ...
      )))
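For anyone wanting to run the same search with their own list of names, here is a minimal sketch that writes out a query of the same shape. The helper `build_keyword_query` and the three sample names are hypothetical; the full list elided by `...` above is not reproduced here.

```python
def build_keyword_query(names):
    """Return tree-sitter query text matching declared identifiers in `names`."""
    # One quoted string per line, indented like the hand-written query.
    quoted = "\n".join(f'      "{name}"' for name in names)
    return (
        "((variable_declaration\n"
        "  (identifier) @name\n"
        "    (#any-of? @name\n"
        f"{quoted}\n"
        "      )))\n"
    )


# Illustrative subset only (assumption): NOT the complete list from the issue.
sample_names = ["kind", "len", "value"]
query_text = build_keyword_query(sample_names)
print(query_text)
```

One caveat: as far as I know `#any-of?` compares the captured text exactly, so a list of lowercase names would miss spellings like `LEN` or `Value` even though Fortran itself is case-insensitive.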

Searching everything:

tree-sitter query keyword_identifiers.query \
  $(fd -e f90 .) \
  | grep "pattern:" -A1 \
  | grep -oE "text:.*" \
  | sort | uniq -c | sort -h
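The counting half of that pipeline (`grep -oE "text:.*" | sort | uniq -c | sort -h`) can also be sketched in Python, which is handy if you want to post-process the numbers. The function name `tally_captures` and the sample lines are made up for illustration; the only thing assumed about the real output is that each capture line contains a `text: ...` part, and the `grep "pattern:" -A1` filtering step is left out.

```python
import re
from collections import Counter


def tally_captures(lines):
    """Count each distinct `text: ...` fragment, sorted by ascending count."""
    counts = Counter()
    for line in lines:
        match = re.search(r"text:.*", line)  # same pattern as grep -oE
        if match:
            counts[match.group(0)] += 1
    # Sort by count first, then by text, so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Hypothetical sample lines (assumption): only the `text:` part matters here.
sample_lines = [
    "capture: name, text: `len`",
    "capture: name, text: `value`",
    "capture: name, text: `value`",
    "a line without any capture",
]
tally = tally_captures(sample_lines)
for text, count in tally:
    print(f"{count:7d} {text}")
```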

And from our corpus of 60 projects, 2.5M lines, these are the total numbers of matches:

      1 text: `external`
      1 text: `sequence`
      1 text: `static`
      3 text: `parameter`
      3 text: `shared`
      4 text: `constant`
      5 text: `public`
      8 text: `save`
     10 text: `pointer`
     11 text: `texture`
     12 text: `contiguous`
     25 text: `allocatable`
     30 text: `optional`
     35 text: `kind`
     42 text: `target`
     69 text: `device`
    218 text: `len`
   1066 text: `value`

(some of these are false positives from files with preprocessor macros confusing tree-sitter)

So it looks like this really needs fixing for `value` in particular. I think if you name your variable `optional` or `allocatable`, you should not be surprised at weirdness!
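To put rough numbers on that, here is a quick sketch that turns the counts listed above into shares of all matches. The counts are copied from the table as-is, so they still include the false positives mentioned above; the `share` helper is just for illustration.

```python
# Match counts copied from the table above (false positives included).
counts = {
    "external": 1, "sequence": 1, "static": 1, "parameter": 3, "shared": 3,
    "constant": 4, "public": 5, "save": 8, "pointer": 10, "texture": 11,
    "contiguous": 12, "allocatable": 25, "optional": 30, "kind": 35,
    "target": 42, "device": 69, "len": 218, "value": 1066,
}

total = sum(counts.values())


def share(names):
    """Percentage of all matches accounted for by the given names."""
    return 100.0 * sum(counts[name] for name in names) / total


print(f"total matches: {total}")
print(f"value:         {share(['value']):.1f}%")
print(f"kind + len:    {share(['kind', 'len']):.1f}%")
```

On these figures `value` alone is roughly two thirds of the matches, while `kind` and `len` together (the two names handled by the workaround) are about a sixth.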