BRANCH smir-syntax: `Local` / `LocalToken` not parsing

dkcumming commented 1 year ago

Description

Local is not recognised in function signature when attempting to parse any function signature.

INFO 2023-09-05 14:55:43,510 pyk.ktool.kprint - Running: kast /tmp/tmpus8i2vey --definition /home/daniel/.kbuild/mir-semantics/5a0ac5c/target/v6.0.82/llvm --input program --output pretty --sort Mir
[Error] Inner Parser: Parse error: unexpected token '_1' following token '('.
        Source(/tmp/tmpus8i2vey)
        Location(3,13,3,15)
        3 |     fn sum_to_n(_1: usize) -> usize {

At first inspection this appears to be because of the definition of LocalToken as "_" Int [token], but changing for previous definition of r"_[0-9]+" [token] produces the same error. The error can be trivially avoided by hard coding "_1" instead of Local, that is to say that this parses as a valid function signature:

"fn" DefPath "(" "_1" ":" Type ")" "->" Type

Reproducer

cd kmir
make build
source set-env.sh
poetry shell
cd src/tests/integration/handwritten-rust/
kmir parse --verbose --debug --output pretty test-sum-to-n.mir

k versions 6.0.0 and 6.0.82 were tested and had same results.

Solution

Unsure at this stage, more investigation required.

dkcumming commented 1 year ago

It is also worth mentioning that if the first _1 is hardcoded in a string, the next fails. So it isn't JUST in the function signatures that the problem arises in case that was unclear.

yanliu18 commented 1 year ago

Tried around but no solution found. Will need expert's help.

Robertorosmaninho commented 1 year ago

Hey guys, could you please try to use r"_[0-9]+" [prefer, token, prec(2)] as defined in Integer Syntax?

I tried it here, and it parses _1 as expected, although I got a new error:

[Error] Inner Parser: Parse error: unexpected token 'scope' following token
';'.
    Source(/tmp/tmpiohyjr9p)
    Location(11,5,11,10)
    11 |        scope 1 {

But I wonder if it's related to the same issue as this or if it's something completely different!

radumereuta commented 1 year ago

I tried it and it passed for me:

radu@radu:~/work/mir-semantics/kmir$ . /home/radu/.cache/pypoetry/virtualenvs/kmir-meUFJs27-py3.10/bin/activate
(kmir-py3.10) radu@radu:~/work/mir-semantics/kmir$ cd src/tests/integration/
proofs/    test-data/ 
(kmir-py3.10) radu@radu:~/work/mir-semantics/kmir$ cd src/tests/integration/
proofs/    test-data/ 
(kmir-py3.10) radu@radu:~/work/mir-semantics/kmir$ cd src/tests/integration/test-data/
compiletest-rs/   handwritten-mir/  handwritten-rust/ 
(kmir-py3.10) radu@radu:~/work/mir-semantics/kmir$ cd src/tests/integration/test-data/
compiletest-rs/   handwritten-mir/  handwritten-rust/ 
(kmir-py3.10) radu@radu:~/work/mir-semantics/kmir$ cd src/tests/integration/test-data/handwritten-rust/
(kmir-py3.10) radu@radu:~/work/mir-semantics/kmir/src/tests/integration/test-data/handwritten-rust$ kmir parse --verbose --debug --output pretty test-sum-to-n.mir
INFO 2023-09-05 17:56:08,871 pyk.ktool.kprint - Running: kast /home/radu/.kbuild/mir-semantics/8bbd6b6/target/v6.0.81-0-g4ca5636989/llvm/parser_Mir_MIR-PARSER-SYNTAX --definition /home/radu/.kbuild/mir-semantics/8bbd6b6/target/v6.0.81-0-g4ca5636989/llvm --module MIR-PARSER-SYNTAX --sort Mir --gen-glr-parser
INFO 2023-09-05 17:56:10,911 pyk.ktool.kprint - Completed in 2.040s with status 0: kast /home/radu/.kbuild/mir-semantics/8bbd6b6/target/v6.0.81-0-g4ca5636989/llvm/parser_Mir_MIR-PARSER-SYNTAX --definition /home/radu/.kbuild/mir-semantics/8bbd6b6/target/v6.0.81-0-g4ca5636989/llvm --module MIR-PARSER-SYNTAX --sort Mir --gen-glr-parser
INFO 2023-09-05 17:56:10,912 pyk.ktool.kprint - Running: kast /tmp/tmpb_po2pip --definition /home/radu/.kbuild/mir-semantics/8bbd6b6/target/v6.0.81-0-g4ca5636989/llvm --input program --output pretty --sort Mir
INFO 2023-09-05 17:56:12,210 pyk.ktool.kprint - Completed in 1.297s with status 0: kast /tmp/tmpb_po2pip --definition /home/radu/.kbuild/mir-semantics/8bbd6b6/target/v6.0.81-0-g4ca5636989/llvm --input program --output pretty --sort Mir
fn sum_to_n :: .FunctionPath ( _1 : usize  , .ParameterList ) -> usize  { debug n => _1 ;  .DebugList let mut _0 : usize  ;  let mut _2 : usize  ;  let mut _4 : bool  ;  let mut _5 : usize  ;  let mut _6 : usize  ;  let mut _7 : usize  ;  .BindingList scope 1 { debug sum => _2 ;  .DebugList let mut _3 : usize  ;  .BindingList scope 2 { debug counter => _3 ;  .DebugList .BindingList .ScopeList }  .ScopeList }  .ScopeList bb0  : { _2 = const 0_usize ;  _3 = _1 ;  .Statements goto -> bb1  ; }  bb1  : { _5 = _3 ;  _4 = Gt ( move _5 , const 0_usize ) ;  .Statements switchInt ( move _4 ) -> [ 0 : bb3  , .SwitchTargets , otherwise : bb2  ] ; }  bb2  : { _6 = _3 ;  _2 = Add ( _2 , move _6 ) ;  _7 = _3 ;  _3 = Sub ( move _7 , const 1_usize ) ;  .Statements goto -> bb1  ; }  bb3  : { _0 = _2 ;  .Statements return ; }  .BasicBlockList }  fn test_sum_to_n :: .FunctionPath ( .ParameterList ) -> ( ) { .DebugList let mut _0 : ( ) ;  let  _1 : usize  ;  let mut _4 : usize  ;  let mut _5 : bool  ;  let mut _6 : ! ;  .BindingList scope 1 { debug n => _1 ;  .DebugList let  _2 : usize  ;  .BindingList scope 2 { debug golden => _2 ;  .DebugList let  _3 : bool  ;  .BindingList scope 3 { debug sucess => _3 ;  .DebugList .BindingList .ScopeList }  .ScopeList }  .ScopeList }  .ScopeList bb0  : { _1 = const 10_usize ;  _2 = const 55_usize ;  .Statements _4 = sum_to_n :: .ExpressionPathList ( _1 , .ArgumentList ) -> bb1  ; }  bb1  : { _3 = Eq ( move _4 , _2 ) ;  _5 = Not ( _3 ) ;  .Statements switchInt ( move _5 ) -> [ 0 : bb3  , .SwitchTargets , otherwise : bb2  ] ; }  bb2  : { .Statements _6 = core :: panicking :: panic :: .ExpressionPathList ( const "assertion failed: sucess" , .ArgumentList ) ; }  bb3  : { .Statements return ; }  .BasicBlockList }  fn main :: .FunctionPath ( .ParameterList ) -> ( ) { .DebugList let mut _0 : ( ) ;  let  _1 : ( ) ;  .BindingList .ScopeList bb0  : { .Statements _1 = test_sum_to_n :: .ExpressionPathList ( .ArgumentList ) -> bb1  ; }  bb1  : { .Statements return ; }  .BasicBlockList }  .Mir

(kmir-py3.10) radu@radu:~/work/mir-semantics/kmir/src/tests/integration/test-data/handwritten-rust$ git status
On branch smir-syntax
nothing to commit, working tree clean
(kmir-py3.10) radu@radu:~/work/mir-semantics/kmir/src/tests/integration/test-data/handwritten-rust$

yanliu18 commented 1 year ago

@radumereuta Yes, the same implementation passes on the master branch, but fails on the defector/smir-syntax branch (there is more structures defined).

Couldn't figure out why.

yanliu18 commented 1 year ago

Hey guys, could you please try to use r"_[0-9]+" [prefer, token, prec(2)] as defined in Integer Syntax?

I tried it here, and it parses _1 as expected, although I got a new error:
[Error] Inner Parser: Parse error: unexpected token 'scope' following token
';'.
  Source(/tmp/tmpiohyjr9p)
  Location(11,5,11,10)
  11 |        scope 1 {
But I wonder if it's related to the same issue as this or if it's something completely different!

Thank you, Robert, I couldn't think of a good reason why it is fail. Master branch has the same implementation but no problem there.

On the refactor branch, more identifiers patterns are defined, I am suspecting there is a conflict or ambiguity.

It would be helpful to know how tokens are recognised and matched to patterns. Will there be more detailed documentation? If not, pointers to the implementation in K would also be helpful. @radumereuta please look here too.

Robertorosmaninho commented 1 year ago

Ping @radumereuta

radumereuta commented 1 year ago

I'm working on this https://github.com/runtimeverification/k/issues/3647 to make it easier to debug such problems. I met with Daniel last week for a short debug session, but the definition is very large, and hard to figure out where the problem is coming from. One thing that I know for sure is that there are some productions that are too general and match too much.

virgil-serbanuta commented 1 year ago

FWIW, I tried to make this work, and I may have succeeded, see the syntax-demo branch. My guess is that someone already fixed at least some of the the issues mentioned above.

Also, FWIW, I think that having "()" as a token is questionable (e.g. in the syntax FnSig ::= ... | "fn" DefPath "()" "->" Type declaration), but without it there is a weird ambiguity related to empty list parsing (I get an error about the empty list being parsable as either .Stuff or as _,_Stuff(.Stuff, .Stuff)), so I guess that's the reason for using a token there. I'm guessing that's a K bug, most likely empty list parsing should just work, and the declaration should be syntax FnSig ::= ... | "fn" DefPath "(" Args ")" "->" Type

radumereuta commented 1 year ago

Empty lists parsing should just work in programs. In the semantics you need to specify some non whitespace characters. That's why you have .List instead of nothing. Otherwise the parser is going to fail unpredictably.

So, @virgil-serbanuta what production are you talking about?

virgil-serbanuta commented 1 year ago

@radumereuta I'm sorry, I was wrong. The error was:

[Error] Inner Parser: Parsing ambiguity.
1: syntax ArgList ::= Operand "," ArgList [userList(*)]
    `_,__MIR-SYNTAX_ArgList_Operand_ArgList`(`.List{"_::__MIR-SYNTAX_DefPath_Identifier_DefPath"}_DefPath`(.KList),`.List{"_,__MIR-SYNTAX_ArgList_Operand_ArgList"}_ArgList`(.KList))
2: syntax ArgList ::= ".ArgList" [klabel(.List{"_,__MIR-SYNTAX"}), userList(*)]
    `.List{"_,__MIR-SYNTAX_ArgList_Operand_ArgList"}_ArgList`(.KList)
        Source(/tmp/tmpawfk2g_r)
        Location(3,28,3,29)
        3 |             _1 = test_sum_to_n() -> bb1;

so the first option was actually a one-element list made of an empty DefPath list.

If I force DefPaths to be non-empty (see the syntax-demo-1 branch, also needs a few similar changes), the semantics parses the file above.

radumereuta commented 1 year ago

Which file is that? The source in your error message points to a temp file. Probably generated or copied by your build system.

virgil-serbanuta commented 1 year ago

By "The file above" I meant the file in the initial comment (test-sum-to-n.mir).

dkcumming commented 1 year ago

The production of syntax ... ::= "()" is a problem. Jost pointed out that it violates longest match principle

virgil-serbanuta commented 1 year ago

The production of syntax ... ::= "()" is a problem. Jost pointed out that it violates longest match principle

Yes, that's what I meant above by "having "()" as a token is questionable". It was preventing a few other errors from showing up, though. Anyway, in the syntax-demo-1 branch I removed that token.

runtimeverification / mir-semantics