stadelmanma / tree-sitter-fortran

Fortran grammar for tree-sitter
MIT License

Parse error when whitespace or comment follows line continuation #73

Closed ZedThree closed 1 year ago

ZedThree commented 1 year ago

If any whitespace or a comment follows a line continuation, the parser reports an error:

program test
    write(*, "('Testing line continuation')", &
       &  advance='no', &   ! comment
          iostat=istat)
end program

Result:

- Actual
+ Expected

    (translation_unit
      (program
        (program_statement
          (name))
        (write_statement
          (unit_identifier)
          (format_identifier
            (string_literal))
          (keyword_argument
            (identifier)
            (string_literal))
-          (ERROR
-            (UNEXPECTED '&')
-            (comment)))
+          (keyword_argument
+           (identifier)
+           (identifier)))
        (end_program_statement)))

I guess this needs to be fixed in scanner.cc, but I don't quite understand how to fix it. I thought the following would be sufficient, at least when it's just whitespace, but it gives (UNEXPECTED '&'):

@@ -173,6 +173,10 @@ struct Scanner {
         }
         advance(lexer);

+        while (iswspace(lexer->lookahead)) {
+          advance(lexer);
+        }
+
         // Consume end of line characters, we allow '\n', '\r\n' and
         // '\r' to cover unix, MSDOS and old style Macintosh
         if (lexer->lookahead == '\r') {
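
One thing that may be worth checking in the attempt above: `iswspace` also returns true for '\n' and '\r', so that loop can swallow the line ending before the end-of-line check runs. Below is a minimal standalone sketch of the intended skipping order, under stated assumptions. `Cursor`, `is_blank` and `skip_continuation` are hypothetical names used only for illustration; this is not the tree-sitter lexer API and not the fix that was applied to scanner.cc. It also discards the trailing comment instead of emitting a comment node, which a real scanner would probably want to handle differently.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the lexer: a cursor over an in-memory buffer.
// It only mimics the "lookahead"/"advance" idea; it is NOT TSLexer.
struct Cursor {
  const std::string &src;
  std::size_t pos;

  explicit Cursor(const std::string &s) : src(s), pos(0) {}

  // Current character, or '\0' at end of input.
  char lookahead() const { return pos < src.size() ? src[pos] : '\0'; }
  void advance() { if (pos < src.size()) ++pos; }
};

// Blanks allowed between '&' and the end of the line.
// Deliberately excludes '\n' and '\r', unlike iswspace().
static bool is_blank(char c) { return c == ' ' || c == '\t'; }

// Try to consume a free-form continuation starting at '&'.
// On success the cursor sits just past the optional leading '&' of the
// continued line. On failure the cursor is restored and false is returned.
bool skip_continuation(Cursor &c) {
  const std::size_t start = c.pos;
  if (c.lookahead() != '&') return false;
  c.advance();

  // Trailing blanks after the '&'.
  while (is_blank(c.lookahead())) c.advance();

  // Optional trailing comment: everything up to the line ending.
  if (c.lookahead() == '!') {
    while (c.lookahead() != '\n' && c.lookahead() != '\r' &&
           c.lookahead() != '\0') {
      c.advance();
    }
  }

  // Line ending: '\n', '\r\n' or a lone '\r'.
  if (c.lookahead() == '\r') {
    c.advance();
    if (c.lookahead() == '\n') c.advance();
  } else if (c.lookahead() == '\n') {
    c.advance();
  } else {
    c.pos = start;  // '&' was not followed by a line ending
    return false;
  }

  // Leading blanks on the next line, then an optional leading '&'.
  while (is_blank(c.lookahead())) c.advance();
  if (c.lookahead() == '&') c.advance();
  return true;
}
```

Run on the text `&   ! comment` followed by a newline and `   iostat=istat)`, the sketch leaves the cursor on `iostat=istat)`. Blank or comment-only lines between the two halves of a continued statement are not handled here.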

stadelmanma commented 1 year ago

@ZedThree I'm not sure exactly how to fix it either. It's been so long since I was deep in the logic of how tree-sitter works that I don't remember how scanner.cc fits into the regular parsing flow (i.e. does it check the grammar.js rules when advance is called, or not until it returns).

UPDATE: Reading this, https://tree-sitter.github.io/tree-sitter/creating-parsers#external-scanners it sounds like advance just moves the lexer forward a character, so the other rules aren't considered yet. This will take some thought. On the bright side, we might be able to fix #61 at the same time.

ZedThree commented 1 year ago

Here's another edge-case for line continuations from my Fortran teaching course:

    print*, "You picked a number between twenty and fifty,&
            & excluding forty-two"

& can appear inside string literals, in which case I'm reasonably certain it must be the first non-blank character (not the first column) on the next line.

I've read that tree-sitter doesn't aim for "type-II correctness", which I interpret to mean: "tree-sitter should be able to parse all valid programs, but not necessarily reject all invalid programs". So that might give us some latitude to not worry so much about a missing & on the second line.
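
To make the string-literal case concrete, here is a rough standalone sketch of reading a double-quoted literal that is continued across lines. `read_continued_string` is a hypothetical helper, not anything in tree-sitter-fortran or scanner.cc. It assumes the literal starts at the first character of the input, does not handle doubled quotes, and is deliberately lenient: a missing leading & on the continued line is accepted, and the leading blanks on that line are dropped either way (an arbitrary choice for the sketch).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: collects the logical contents of a double-quoted
// literal starting at text[0], joining lines continued with a trailing '&'.
// Returns false if there is no opening quote or the literal is unterminated.
bool read_continued_string(const std::string &text, std::string &out) {
  if (text.empty() || text[0] != '"') return false;
  out.clear();

  std::size_t i = 1;
  while (i < text.size()) {
    const char ch = text[i];
    if (ch == '"') return true;  // closing quote

    if (ch == '&') {
      // Is this '&' the last non-blank character on the line?
      std::size_t j = i + 1;
      while (j < text.size() && (text[j] == ' ' || text[j] == '\t')) ++j;
      if (j < text.size() && (text[j] == '\n' || text[j] == '\r')) {
        // Skip the line ending: '\n', '\r\n' or a lone '\r'.
        if (text[j] == '\r' && j + 1 < text.size() && text[j + 1] == '\n') ++j;
        ++j;
        // Skip leading blanks on the continued line.
        while (j < text.size() && (text[j] == ' ' || text[j] == '\t')) ++j;
        // Lenient: consume the leading '&' if present, tolerate its absence.
        if (j < text.size() && text[j] == '&') ++j;
        i = j;
        continue;
      }
      // Otherwise the '&' is an ordinary character inside the literal.
    }

    if (ch == '\n' || ch == '\r') return false;  // line ended inside literal
    out.push_back(ch);
    ++i;
  }
  return false;  // ran out of input before the closing quote
}
```

For the print example above, the collected contents would be `You picked a number between twenty and fifty, excluding forty-two`, while an & in the middle of a line (as in `"a & b"`) is kept as a literal character.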