lemmy closed 4 years ago
Hello! Can I take this one?
Sure, please go ahead and open a PR.
Thanks! Here it is. https://github.com/tlaplus/PlusPy/pull/8
Edit: WIP
I have pushed two PRs with two different solutions.
I am new to this and do not have the experience to choose the right solution. Please help. :-)
Discard preamble after lexing https://github.com/tlaplus/PlusPy/pull/8/files
Add optional preamble matching rule in the parser https://github.com/tlaplus/PlusPy/pull/9/files
PS: I am happy to discard them both if they are wrong. Apologies for the spam. 🙏
Please explain the difference between the two PRs in more detail.
Hello!
As usual, the lexer scans the string, produces tokens, and feeds them to the parser.
The parser expects the first four tokens to be tok("----"), tok("MODULE"), Name(), and tok("----") (the module start tokens).
When there is a preamble, the parser fails with "tok: no match with '----'" because the lexer's output has words from the preamble as its first tokens.
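To make the failure concrete, here is a minimal sketch of the check described above. It is not PlusPy's parser code: `ParseError`, `make_tokens`, and `expect_module_start` are hypothetical names, and only the token layout (lexeme, (file, line), column, first-on-line) is taken from the lexer output quoted in this thread.

```python
class ParseError(Exception):
    """Raised when the token stream does not match (hypothetical)."""


def make_tokens(lexemes):
    # Build tokens in the layout seen in the lexer output:
    # (lexeme, (file, line), column, first_on_line). Positions are dummies.
    return [(lexeme, ("example.tla", 1), 1, False) for lexeme in lexemes]


def expect_module_start(tokens):
    """Require ----, MODULE, <name>, ---- as the first four tokens."""
    if len(tokens) < 4:
        raise ParseError("unexpected end of input")
    first, keyword, name, last = (t[0] for t in tokens[:4])
    if first != "----":
        raise ParseError(f"tok: no match with '----' (found {first!r})")
    if keyword != "MODULE":
        raise ParseError(f"tok: no match with 'MODULE' (found {keyword!r})")
    if last != "----":
        raise ParseError(f"tok: no match with '----' (found {last!r})")
    return name


clean = make_tokens(["----", "MODULE", "Foo", "----", "VARIABLE", "x", "===="])
with_preamble = make_tokens(["Some", "notes"]) + clean

print(expect_module_start(clean))
try:
    expect_module_start(with_preamble)
except ParseError as err:
    print(err)
```

The second call fails on the very first token, which is the situation both PRs try to avoid.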
This gives us two options.
The first PR removes all the tokens before the module start tokens at the end of lexing. This will satisfy the parser's expectations.
The lexer skips over the comments. We can treat the preamble like comments because it is not useful later. (Q1) Why forward its tokens to the parser and complicate the code there?
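A rough sketch of the first option, assuming the lexer returns a plain list of token tuples. `discard_preamble` and `make_tokens` are hypothetical names for illustration, not the code in the PR.

```python
MODULE_START = ("----", "MODULE")


def discard_preamble(tokens):
    """Drop every token that comes before the first '----' 'MODULE' pair.

    If no module start is found, the list is returned unchanged so that the
    parser can still report its usual error.
    """
    for i in range(len(tokens) - 1):
        if (tokens[i][0], tokens[i + 1][0]) == MODULE_START:
            return tokens[i:]
    return tokens


def make_tokens(lexemes):
    # Token layout as in the lexer output; positions are dummy values.
    return [(lexeme, ("example.tla", 1), 1, False) for lexeme in lexemes]


preamble = make_tokens(["Notes", "about", "----", "dashes"])
module = make_tokens(["----", "MODULE", "Foo", "----", "VARIABLE", "x", "===="])
cleaned = discard_preamble(preamble + module)
```

Checking for `MODULE` right after the dashes keeps a dashed line inside the preamble from being mistaken for the module start.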
The second PR changes the parser's expectations. It makes the parser expect optional preamble tokens before the module start tokens. It does exactly the same thing as the other PR (discarding any tokens before the module start tokens), but in the parsing phase instead of at the end of the lexing phase.
(Q2) Why should we give the lexer any more intelligence than it needs to produce the tokens?
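The second option could look roughly like this. It is only a sketch: `skip_optional_preamble`, `parse_module_start`, and `make_tokens` are hypothetical names, and the real parser is built from rule objects such as tok(...) and Name() rather than plain functions.

```python
def skip_optional_preamble(tokens, pos=0):
    """Return the index of the first '----' 'MODULE' pair at or after pos.

    If there is no such pair, pos is returned unchanged, so the preamble
    stays optional and the header check below reports the error.
    """
    i = pos
    while i + 1 < len(tokens):
        if tokens[i][0] == "----" and tokens[i + 1][0] == "MODULE":
            return i
        i += 1
    return pos


def parse_module_start(tokens):
    """Parse [preamble] ---- MODULE <name> ----; return (name, next index)."""
    pos = skip_optional_preamble(tokens)
    lexemes = [t[0] for t in tokens[pos:pos + 4]]
    if (len(lexemes) < 4 or lexemes[0] != "----"
            or lexemes[1] != "MODULE" or lexemes[3] != "----"):
        raise ValueError("tok: no match with '----'")
    return lexemes[2], pos + 4


def make_tokens(lexemes):
    # Token layout as in the lexer output; positions are dummy values.
    return [(lexeme, ("example.tla", 1), 1, False) for lexeme in lexemes]


module = make_tokens(["----", "MODULE", "Foo", "----", "VARIABLE", "x", "===="])
tokens = make_tokens(["Some", "notes"]) + module
name, next_pos = parse_module_start(tokens)
```

Here the lexer output still contains the preamble tokens; only the parser knows to step over them.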
Because of the questions Q1 and Q2, I do not know how to choose.
Hope this helps. Thanks! 🙏
Do you know how the rest of the PlusPy lexer/parser handle comments that are inside the TLA+ module?
The lexer ignores all comments, whether inside or outside the module, so the parser doesn't have to deal with them at all.
In lexer code,
For example, the comments in foo.tla don't produce any tokens in the lexer output.
foo.tla
\* No tokens for this
---- MODULE Foo -----
\* No tokens for this
\* Not a token sir!
VARIABLE x
(* This is a box comment
and it will be ignored too.
*)
(**********************************************************************)
(* This too should be ignored by the lexer. *)
(**********************************************************************)
TypeOK ==
/\ x < 10
Init ==
/\ x = 0
\* and this too don't generate no tokens
Next ==
/\ x' = x + 1
Spec == Init /\ [][TypeOK /\ Next]_<<x>>
====
Lexer output:
[('----', ('./tests/foo.tla', 4), 1, True), ('MODULE', ('./tests/foo.tla', 4), 6, False), ('Foo', ('./tests/foo.tla', 4), 13, False), ('----', ('./tests/foo.tla', 4), 17, False), ('VARIABLE', ('./tests/foo.tla', 9), 1, True), ('x', ('./tests/foo.tla', 9), 10, False), ('TypeOK', ('./tests/foo.tla', 19), 1, True), ('==', ('./tests/foo.tla', 19), 8, False), ('/\\', ('./tests/foo.tla', 20), 5, True), ('x', ('./tests/foo.tla', 20), 8, False), ('<', ('./tests/foo.tla', 20), 10, False), ('10', ('./tests/foo.tla', 20), 12, False), ('Init', ('./tests/foo.tla', 22), 1, True), ('==', ('./tests/foo.tla', 22), 6, False), ('/\\', ('./tests/foo.tla', 23), 5, True), ('x', ('./tests/foo.tla', 23), 8, False), ('=', ('./tests/foo.tla', 23), 10, False), ('0', ('./tests/foo.tla', 23), 12, False), ('Next', ('./tests/foo.tla', 28), 1, True), ('==', ('./tests/foo.tla', 28), 6, False), ('/\\', ('./tests/foo.tla', 29), 5, True), ('x', ('./tests/foo.tla', 29), 8, False), ("'", ('./tests/foo.tla', 29), 9, False), ('=', ('./tests/foo.tla', 29), 11, False), ('x', ('./tests/foo.tla', 29), 13, False), ('+', ('./tests/foo.tla', 29), 15, False), ('1', ('./tests/foo.tla', 29), 17, False), ('Spec', ('./tests/foo.tla', 31), 1, True), ('==', ('./tests/foo.tla', 31), 6, False), ('Init', ('./tests/foo.tla', 31), 9, False), ('/\\', ('./tests/foo.tla', 31), 14, False), ('[]', ('./tests/foo.tla', 31), 17, False), ('[', ('./tests/foo.tla', 31), 19, False), ('TypeOK', ('./tests/foo.tla', 31), 20, False), ('/\\', ('./tests/foo.tla', 31), 27, False), ('Next', ('./tests/foo.tla', 31), 30, False), (']_', ('./tests/foo.tla', 31), 34, False), ('<<', ('./tests/foo.tla', 31), 36, False), ('x', ('./tests/foo.tla', 31), 38, False), ('>>', ('./tests/foo.tla', 31), 39, False), ('====', ('./tests/foo.tla', 32), 1, True)]
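The behaviour visible in this output, where comments leave no tokens but line numbers are preserved, can be sketched as follows. This is an illustration only and not PlusPy's lexer: `strip_comments` is a hypothetical helper that blanks out `\*` line comments and (possibly nested) `(* ... *)` block comments, and it ignores details such as comment markers inside string literals.

```python
def strip_comments(text):
    """Replace comment characters with spaces, keeping every newline."""
    out = []
    depth = 0  # nesting depth of (* ... *) comments
    i = 0
    n = len(text)
    while i < n:
        two = text[i:i + 2]
        if two == "(*":
            depth += 1
            out.append("  ")
            i += 2
        elif depth > 0 and two == "*)":
            depth -= 1
            out.append("  ")
            i += 2
        elif depth > 0:
            # Inside a block comment: keep newlines so line numbers survive.
            out.append("\n" if text[i] == "\n" else " ")
            i += 1
        elif two == "\\*":
            # Line comment: blank everything up to the end of the line.
            while i < n and text[i] != "\n":
                out.append(" ")
                i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


source = "\n".join([
    r"\* No tokens for this",
    "---- MODULE Foo ----",
    "VARIABLE x",
    "(* This is a box comment",
    "and it will be ignored too.",
    "*)",
    "(****************)",
    "(* boxed *)",
    "(****************)",
    "====",
])
lexemes = strip_comments(source).split()
```

Splitting on whitespace stands in for real tokenization here; the point is only that nothing from a comment reaches the token list.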
For consistency, it seems we should go with your first PR. Please capture our discussion and reasoning above in code comments so that our future selves will know why the change was made.
Thanks for your contribution!