tlaplus / PlusPy

Python interpreter for TLA+ specifications
MIT License

Parsing error on module preamble #7

lemmy closed this issue 4 years ago

lemmy commented 4 years ago

This is some prose preceding the module definition.

\* WORKAROUND: Comment prose before the module definition.

---- MODULE AsyncGameOfLifeDistributed -----
VARIABLE x

Spec == x = TRUE /\ [][x'\in BOOLEAN]_x
====
markus@avocado:~/src/TLA/_specs/models/realworld/AsyncGameOfLife(master)$ pluspy -c 1 AsyncGameOfLifeDistributed
Parsing failed ['GModule', 'Concat', ("tok: no match with '----'", '* (AsyncGameOfLifeDistributed.tla:1:1)')]
[]
At position ('*', ('AsyncGameOfLifeDistributed.tla', 1), 1, True)
Traceback (most recent call last):
  File "/home/markus/src/TLA/_community/PlusPy/pluspy.py", line 4286, in <module>
    main()
  File "/home/markus/src/TLA/_community/PlusPy/pluspy.py", line 4251, in main
    pp = PlusPy(args[0], seed=seed)
  File "/home/markus/src/TLA/_community/PlusPy/pluspy.py", line 3587, in __init__
    raise PlusPyError("can't load " + file)
__main__.PlusPyError: PlusPyError: can't load AsyncGameOfLifeDistributed.tla

PratikDeoghare commented 4 years ago

Hello! Can I do this one?

lemmy commented 4 years ago

Sure, please go ahead and open a PR.

PratikDeoghare commented 4 years ago

Thanks! Here it is. https://github.com/tlaplus/PlusPy/pull/8

Edit: WIP

PratikDeoghare commented 4 years ago

I have pushed two PRs with two different solutions.

I am new to this and don't have the experience to choose the right solution. Please help. :-)

  1. Discard preamble after lexing https://github.com/tlaplus/PlusPy/pull/8/files

  2. Add optional preamble matching rule in the parser https://github.com/tlaplus/PlusPy/pull/9/files

PS: I am happy to discard them both if they are wrong. Apologies for the spam. 🙏

lemmy commented 4 years ago

Please explain the difference between the two PRs in more detail.

PratikDeoghare commented 4 years ago

Hello!

As usual, the lexer scans the string, produces tokens, and feeds them to the parser.

The parser expects the first four tokens to be tok("----"), tok("MODULE"), Name(), and tok("----") (the module start tokens).

When there is a preamble, the parser fails with "tok: no match with '----'" because the first tokens in the lexer's output are words from the preamble.
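To make the failure concrete, here is a minimal sketch of that header check as a standalone Python function. It is not PlusPy's code: the name `expect_module_header` is hypothetical, and the token tuple shape `(lexeme, (filename, line), column, first_on_line)` is an assumption inferred from the lexer output printed later in this thread.

```python
from typing import List, Optional, Tuple

# Assumed token shape, inferred from PlusPy's printed lexer output:
# (lexeme, (filename, line), column, first_on_line)
Token = Tuple[str, Tuple[str, int], int, bool]


def expect_module_header(tokens: List[Token]) -> str:
    """Hypothetical stand-in for tok("----"), tok("MODULE"), Name(), tok("----").

    Returns the module name, or raises SyntaxError at the first mismatch.
    """
    # None marks the Name() slot, which accepts any identifier.
    expected: List[Optional[str]] = ["----", "MODULE", None, "----"]
    if len(tokens) < len(expected):
        raise SyntaxError("unexpected end of input in module header")
    for want, (lexeme, (fname, line), col, _) in zip(expected, tokens):
        if want is None:
            if not lexeme.isidentifier():
                raise SyntaxError(f"expected a module name ({fname}:{line}:{col})")
        elif lexeme != want:
            raise SyntaxError(f"tok: no match with '{want}' ({fname}:{line}:{col})")
    return tokens[2][0]
```

With a preamble, the first token is a prose word such as `This`, so the very first comparison against `'----'` fails, much like the error in the report above.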

This gives us two options.

  1. Fix the lexer's output.

The first PR removes all the tokens before the module start tokens at the end of lexing. This will satisfy the parser's expectations.

The lexer already skips comments. We can treat the preamble like a comment because it is not needed later. (Q1) Why forward its tokens to the parser and complicate the code there?
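A minimal sketch of the first PR's idea, dropping every token before the module start once lexing is done, might look like the following. The function names `is_module_start` and `discard_preamble` are hypothetical and the token shape is assumed from the printed lexer output; PR #8 has what was actually proposed.

```python
from typing import List, Tuple

# Assumed token shape: (lexeme, (filename, line), column, first_on_line)
Token = Tuple[str, Tuple[str, int], int, bool]


def is_module_start(tokens: List[Token], i: int) -> bool:
    """True if tokens[i:i+4] spell '----' 'MODULE' <Name> '----'."""
    window = [t[0] for t in tokens[i:i + 4]]
    return (len(window) == 4 and window[0] == "----" and window[1] == "MODULE"
            and window[2].isidentifier() and window[3] == "----")


def discard_preamble(tokens: List[Token]) -> List[Token]:
    """Drop every token before the first module start.

    The lexer already emits no tokens for comments, so the preamble is
    treated the same way: it is prose the parser never needs. If no module
    start is found, the tokens are returned unchanged and the parser reports
    its usual error.
    """
    for i in range(len(tokens)):
        if is_module_start(tokens, i):
            return tokens[i:]
    return tokens
```

Because the function returns the input untouched when no header is found, a file without a module still fails in the parser with the existing error message.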

  2. Fix the parser's expectations.

The second PR changes the parser's expectations instead: the parser accepts optional preamble tokens before the module start tokens. It does exactly the same thing as the first PR (discarding any tokens before the module start tokens), but during parsing instead of at the end of lexing.

(Q2) Why should we give the lexer any more intelligence than it needs to produce the tokens?
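For comparison, the second PR's approach moves the skipping into the grammar. A toy recursive-descent version could look like this; the `Parser` class and its `Preamble` rule are hypothetical illustrations, PlusPy's real `GModule` rule is not reproduced here, and the token shape is again assumed.

```python
from typing import List, Tuple

# Assumed token shape: (lexeme, (filename, line), column, first_on_line)
Token = Tuple[str, Tuple[str, int], int, bool]


class Parser:
    """Toy recursive-descent parser with an optional preamble rule.

    Hypothetical grammar, not PlusPy's actual GModule rule:
        Module   ::= Preamble? "----" "MODULE" Name "----" ...
        Preamble ::= any tokens up to the first module header
    """

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0  # index of the next unconsumed token

    def _at_module_start(self) -> bool:
        look = [t[0] for t in self.tokens[self.pos:self.pos + 4]]
        return (len(look) == 4 and look[0] == "----" and look[1] == "MODULE"
                and look[2].isidentifier() and look[3] == "----")

    def preamble(self) -> List[str]:
        """Consume and return every token before the module header."""
        skipped: List[str] = []
        while self.pos < len(self.tokens) and not self._at_module_start():
            skipped.append(self.tokens[self.pos][0])
            self.pos += 1
        return skipped

    def module_header(self) -> str:
        """Parse Preamble? followed by the header and return the module name."""
        self.preamble()  # optional: consumes nothing when there is no preamble
        if not self._at_module_start():
            raise SyntaxError("expected '---- MODULE <Name> ----'")
        name = self.tokens[self.pos + 2][0]
        self.pos += 4
        return name
```

The tokens skipped here are the same ones the first PR drops after lexing; only the phase that owns the logic differs.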

Because of questions Q1 and Q2, I do not know which one to choose.

Hope this helps. Thanks! 🙏

lemmy commented 4 years ago

Do you know how the rest of the PlusPy lexer/parser handle comments that are inside the TLA+ module?

PratikDeoghare commented 4 years ago

The lexer ignores all comments, inside or outside the module, so the parser never has to deal with them.

In lexer code,

For example, the comments in foo.tla produce no tokens in the lexer output.

foo.tla


\* No tokens for this

---- MODULE Foo -----

\* No tokens for this
\* Not a token sir!

VARIABLE x

     (* This is a box comment
     and it will be ignored too.
     *)

     (**********************************************************************)
     (* This too should be ignored by the lexer.                           *)
     (**********************************************************************)

TypeOK ==
    /\ x < 10

Init ==
    /\ x = 0

\* and this too don't generate no tokens

Next ==
    /\ x' = x + 1

Spec == Init /\ [][TypeOK /\ Next]_<<x>>
====

Lexer output:

[('----', ('./tests/foo.tla', 4), 1, True), ('MODULE', ('./tests/foo.tla', 4), 6, False), ('Foo', ('./tests/foo.tla', 4), 13, False), ('----', ('./tests/foo.tla', 4), 17, False), ('VARIABLE', ('./tests/foo.tla', 9), 1, True), ('x', ('./tests/foo.tla', 9), 10, False), ('TypeOK', ('./tests/foo.tla', 19), 1, True), ('==', ('./tests/foo.tla', 19), 8, False), ('/\\', ('./tests/foo.tla', 20), 5, True), ('x', ('./tests/foo.tla', 20), 8, False), ('<', ('./tests/foo.tla', 20), 10, False), ('10', ('./tests/foo.tla', 20), 12, False), ('Init', ('./tests/foo.tla', 22), 1, True), ('==', ('./tests/foo.tla', 22), 6, False), ('/\\', ('./tests/foo.tla', 23), 5, True), ('x', ('./tests/foo.tla', 23), 8, False), ('=', ('./tests/foo.tla', 23), 10, False), ('0', ('./tests/foo.tla', 23), 12, False), ('Next', ('./tests/foo.tla', 28), 1, True), ('==', ('./tests/foo.tla', 28), 6, False), ('/\\', ('./tests/foo.tla', 29), 5, True), ('x', ('./tests/foo.tla', 29), 8, False), ("'", ('./tests/foo.tla', 29), 9, False), ('=', ('./tests/foo.tla', 29), 11, False), ('x', ('./tests/foo.tla', 29), 13, False), ('+', ('./tests/foo.tla', 29), 15, False), ('1', ('./tests/foo.tla', 29), 17, False), ('Spec', ('./tests/foo.tla', 31), 1, True), ('==', ('./tests/foo.tla', 31), 6, False), ('Init', ('./tests/foo.tla', 31), 9, False), ('/\\', ('./tests/foo.tla', 31), 14, False), ('[]', ('./tests/foo.tla', 31), 17, False), ('[', ('./tests/foo.tla', 31), 19, False), ('TypeOK', ('./tests/foo.tla', 31), 20, False), ('/\\', ('./tests/foo.tla', 31), 27, False), ('Next', ('./tests/foo.tla', 31), 30, False), (']_', ('./tests/foo.tla', 31), 34, False), ('<<', ('./tests/foo.tla', 31), 36, False), ('x', ('./tests/foo.tla', 31), 38, False), ('>>', ('./tests/foo.tla', 31), 39, False), ('====', ('./tests/foo.tla', 32), 1, True)]
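The behaviour described above, where comments never reach the parser, can be illustrated with a toy lexer. This is a sketch under stated assumptions, not PlusPy's lexer: the regular expression covers only the operators used in foo.tla, runs of four or more `-` or `=` are normalised to `----`/`====` (as the output above suggests), and the fourth tuple field is assumed to flag the first token on a line. Block comments are tracked with a nesting depth, since TLA+ allows nested `(* *)` comments.

```python
import re
from typing import List, Tuple

# Assumed token shape, as in PlusPy's printed output:
# (lexeme, (filename, line), column, first_on_line)
Token = Tuple[str, Tuple[str, int], int, bool]

# Toy token pattern covering only the operators used in foo.tla.
# Longer alternatives come first so "==" beats "=" and "[]" beats "[".
TOKEN_RE = re.compile(r"""
      -{4,} | ={4,}              # module delimiters, normalised below
    | << | >> | == | /\\ | \[\] | \]_
    | [A-Za-z_][A-Za-z0-9_]*     # identifiers and keywords
    | \d+                        # numbers
    | \S                         # any other single character
""", re.VERBOSE)


def lex(text: str, filename: str) -> List[Token]:
    """Tokenize text, emitting nothing for \\* line comments or (* *) comments."""
    tokens: List[Token] = []
    depth = 0  # nesting depth of (* ... *) comments, carried across lines
    for lineno, line in enumerate(text.splitlines(), start=1):
        col = 0
        first = True
        while col < len(line):
            if depth > 0:
                # Inside a block comment: only track nested opens and closes.
                if line.startswith("(*", col):
                    depth += 1
                    col += 2
                elif line.startswith("*)", col):
                    depth -= 1
                    col += 2
                else:
                    col += 1
                continue
            if line[col].isspace():
                col += 1
            elif line.startswith("\\*", col):
                break  # line comment: skip the rest of the line
            elif line.startswith("(*", col):
                depth += 1
                col += 2
            else:
                m = TOKEN_RE.match(line, col)  # never None: \S matches any non-space
                lexeme = m.group()
                if lexeme.startswith("----"):
                    lexeme = "----"
                elif lexeme.startswith("===="):
                    lexeme = "===="
                tokens.append((lexeme, (filename, lineno), col + 1, first))
                first = False
                col = m.end()
    return tokens
```

Fed the first thirteen lines of foo.tla, this sketch yields the header tokens on line 4 and `VARIABLE x` on line 9 with the same columns as the PlusPy output above, and every comment line contributes nothing.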

lemmy commented 4 years ago

For consistency, it seems we should go with your first PR. Could you please capture our discussion and reasoning above in code comments so that our future selves will know why the change was made?

lemmy commented 4 years ago

Thanks for your contribution!