Closed KaranAhlawat closed 1 month ago
If possible, I'd like to contribute a fix (if it's technically feasible), but I really have no clue where to start looking or how the debugging process would work.
Hi! Thank you for the nicely detailed error report and that you're interested in fixing it!
The best bet at debugging the parser is to use the npm run debug $FILE
command, which will print every step of the parser.
To effectively use it, however, requires an overview of the parsing process.
Here, it is important to know there are two steps to the parsing process; the external and internal.
The external scanner is defined in src/scanner.c
and is a hand-crafted parser in c. The external scanner is alwaly run first. It read one character at a time, and decides if want to continue reading, return a token, or give up.
If the external scanner does not return a token, then the internal scanner is run. The internal scanner is the parser that is generated from the grammar.js
file.
If we run npm run debug
on the same you provided, we will at some point see the following in the output:
process version:0, version_count:1, state:1541, row:1, col:16
lex_external state:4, row:1, column:16
skip character:10
skip character:10
skip character:' '
skip character:' '
skip character:' '
skip character:' '
consume character:'t'
lexed_lookahead sym:_newline, size:0
reduce sym:_string_literal, child_count:3
reduce sym:string, child_count:1
reduce sym:const, child_count:1
reduce sym:_expression, child_count:1
reduce sym:application_expression, child_count:2
reduce sym:_expression, child_count:1
shift state:640
process version:0, version_count:1, state:640, row:1, col:16
lex_external state:2, row:1, column:16
skip character:10
skip character:10
skip character:' '
skip character:' '
skip character:' '
skip character:' '
lex_internal state:347, row:1, column:16
skip character:10
skip character:10
skip character:' '
skip character:' '
skip character:' '
skip character:' '
consume character:'t'
consume character:'y'
consume character:'p'
consume character:'e'
consume character:'t'
consume character:'y'
consume character:'p'
consume character:'e'
lexed_lookahead sym:identifier, size:10
This is the parsing steps immediately following printfn "Hi"
.
Where we can tell from:
reduce sym:_string_literal, child_count:3
reduce sym:string, child_count:1
reduce sym:const, child_count:1
reduce sym:_expression, child_count:1
reduce sym:application_expression, child_count:2
reduce sym:_expression, child_count:1
that we correctly parsed printfn "Hi"
into an application_expression
.
We can then tell by the first lines:
lex_external state:4, row:1, column:16
skip character:10
skip character:10
skip character:' '
skip character:' '
skip character:' '
skip character:' '
consume character:'t'
lexed_lookahead sym:_newline, size:0
That the external scanner consumed the newline characters and spaces up until the t
of the type
keyword and that resulted in a _newline
token.
The _newline
token is used by the sequential_expression in the grammar to check that two expressions start at the same level of indentation.
What we really wanted to happen at this point, is to parse the type
as the start of a type_definition.
Both sequential_expression
and type_definition
are valid nodes in the module_elem node, which is the top-level node containing all elements of a module.
However, since the external scanner produced a _newline
token, and neither module_elem
nor type_definition
accepts a _newline
token the parser is forced to try and parse the rest of the file as the child node of a sequential_expression
.
My idea would to handle _newline
tokens in the module definition akin to sequential_expression
. i.e. every sub-node but the first should have a _newline
token before it.
Then there will the probable precedence issues that will appear between _module_elem
and sequential_expression
that would have to be fixed to make it all correct.
Describe the bug Incorrect parsing of type definition when preceded by an application expression in a module
To Reproduce
Using this code snippet, the tree-produced is
Expected behaviour Expected the type to be parsed as normally, which would be something like this
Screenshots With the application expression
Without the application expression
Environment (please complete the following information):