[Bug] [kompile] - Parsing token vs parsing variable name

K Version

bd04f5fd32

Description

Tokens declared in one sort conflict with variables in another sort.

Input Files

File test.k:

module TEST
    imports INT

    configuration <k> $PGM:Contract </k>

    rule (A +Int B) +Int C:Int => A +Int (B +Int C:Int) [simplification]

    syntax Contract ::= "C"
endmodule

Reproduction Steps

Kompile the definition:

$ kompile --backend haskell test.k --output-definition test-kompiled                                                                                                                                                                                                              
[Error] Inner Parser: Parse error: unexpected token 'C' following token '+Int'.                                                                                                                                                                                                   
        Source(/home/dev/src/k/k-distribution/target/release/k/include/kframework/builtin/domains.md)                                                                                                                                                                             
        Location(1097,25,1097,26)                                                                                                                                                                                                                                                 
        1097 |    rule I1 +Int (I2 +Int C) => (I1 +Int I2) +Int C [concrete(I1, I2),                                                                                                                                                                                              
symbolic(C), simplification]                                                                                                                                                                                                                                                      
             .                          ^                                                                                                                                                                                                                                         
[Error] Inner Parser: Parse error: unexpected token 'C' following token '-Int'.                                                                                                                                                                                                   
        Source(/home/dev/src/k/k-distribution/target/release/k/include/kframework/builtin/domains.md)                                                                                                                                                                             
        Location(1098,25,1098,26)                                                                                                                                                                                                                                                 
        1098 |    rule I1 +Int (I2 -Int C) => (I1 +Int I2) -Int C [concrete(I1, I2),                                                                                                                                                                                              
symbolic(C), simplification]                                                                                                                                                                                                                                                      
             .                          ^                                                                                                                                                                                                                                         
[Error] Inner Parser: Parse error: unexpected token 'C' following token '+Int'.                                                                                                                                                                                                   
        Source(/home/dev/src/k/k-distribution/target/release/k/include/kframework/builtin/domains.md)                                                                                                                                                                             
        Location(1100,25,1100,26)                                                                                                                                                                                                                                                 
        1100 |    rule I1 -Int (I2 +Int C) => (I1 -Int I2) -Int C [concrete(I1, I2),                                                                                                                                                                                              
symbolic(C), simplification]                                                                                                                                                                                                                                                      
             .                          ^                                                                                                                                                                                                                                         
[Error] Inner Parser: Parse error: unexpected token 'C' following token '-Int'.                                                                                                                                                                                                   
        Source(/home/dev/src/k/k-distribution/target/release/k/include/kframework/builtin/domains.md)                                                                                                                                                                             
        Location(1101,25,1101,26)                                                                                                                                                                                                                                                 
        1101 |    rule I1 -Int (I2 -Int C) => (I1 -Int I2) +Int C [concrete(I1, I2),                                                                                                                                                                                              
symbolic(C), simplification]                                                                                                                                                                                                                                                      
             .                          ^                                                                                                                                                                                                                                         
[Error] Inner Parser: Parse error: unexpected token 'C' following token '('.                                                                                                                                                                                                      
        Source(/home/dev/src/k/k-distribution/target/release/k/include/kframework/builtin/domains.md)                                                                                                                                                                             
        Location(1102,9,1102,10)                                                                                                                                                                                                                                                  
        1102 |    rule (C -Int I2) -Int I3 => C -Int (I2 +Int I3) [concrete(I2, I3),                                                                                                                                                                                              
symbolic(C), simplification]                                                                                                                                                                                                                                                      
             .          ^                                                                                                                                                                                                                                                         
[Error] Inner Parser: Parse error: unexpected token 'C' following token '&Int'.                                                                                                                                                                                                   
        Source(/home/dev/src/k/k-distribution/target/release/k/include/kframework/builtin/domains.md)                                                                                                                                                                             
        Location(1104,25,1104,26)                                                                                                                                                                                                                                                 
        1104 |    rule I1 &Int (I2 &Int C) => (I1 &Int I2) &Int C [concrete(I1, I2),                                                                                                                                                                                              
symbolic(C), simplification]                                                                                                                                                                                         
             .                          ^                                                                                                                                                                            
[Error] Inner Parser: Parse error: unexpected token 'C' following token '#if'.                                                                                                                                       
        Source(/home/dev/src/k/k-distribution/target/release/k/include/kframework/builtin/domains.md)                                                                                                                
        Location(2128,12,2128,13)                                                                                                                                                                                    
        2128 |    rule #if C:Bool #then B1::K #else _ #fi => B1 requires C                                                                                                                                           
             .             ^                                                                                                                                                                                         
[Error] Inner Parser: Parse error: unexpected token 'C' following token '#if'.                                                                                                                                       
        Source(/home/dev/src/k/k-distribution/target/release/k/include/kframework/builtin/domains.md)                                                                                                                
        Location(2129,12,2129,13)                                                                         
        2129 |    rule #if C:Bool #then _ #else B2::K #fi => B2 requires notBool C                                                                                                                                   
             .             ^                                                                                                                                                                                         
[Error] Inner Parser: Parse error: unexpected token 'C' following token '+Int'.                            
        Source(/home/dev/src/k/test.k)                                                                                                                                                                               
        Location(6,26,6,27)                                                                               
        6 |         rule (A +Int B) +Int C => A +Int (B +Int C) [simplification]
          .                              ^                                                                                                                                                                           
[Error] Compiler: Had 9 parsing errors.

Expected Behavior

I would like this case to be handled. The variable C in these rules cannot be of sort Contract, so we should be able to figure this out. Annotating the variable with it's sort (as C:Int), does not help.

I'm not really sure how we could possibly fix this without nuking performance, doing something really ad hoc, or breaking existing definitions.

The issue happens at the level of the scanner, which has absolutely no knowledge of sorts and thus can't make the kind of distinction you are talking about. Maybe we could tweak something so that if the parser expects a regex token, it can accept a non regex token that has a string value that matches the regex as well somehow, but that might be pretty difficult because Java doesn't really have knowledge of the regex language used by Bison. It's perhaps worth a try, but I don't want you to get your hopes up too much.

Also, there's no principled way to distinguish between sorts that are disjoint versus sorts that overlap for something like this, so if you declared C as sort Int in the example above, you would end up with an ambiguity, something that very likely would make a number of existing definitions not parse. Theoretically, you could detect which sorts are in the same connected component and do this only for sorts that aren't, but I would prefer to avoid that because it has absolutely no basis in parsing theory; you would be generating a parser which is "mysteriously" different from the actual grammar specified in the definition.

We can look into this somewhat if you want, but I doubt this can be fixed in a way that is truly satisfying. I just want you to moderate your expectations.

Mind you, if your only concern is being able to do this in a manner such that K faithfully respects modularity in the scanner, that's definitely something we ought to in theory be able to fix; it might be a bit tricky, but it's probably possible to create a single scanner used by all modules that nevertheless receives a module id as input when initialized and doesn't return tokens not in that module, ever. This would not fix your example, but it would remove the 8 errors in domains.md and definitely would fix some examples similar to the one you describe.

Let me know if that solution would suffice for you as it's probably going to be a lot easier to implement in a clean manner.

runtimeverification / k