jzimmerman / langcc

langcc: A Next-Generation Compiler Compiler
https://langcc.io
Apache License 2.0

The example grammar from the langcc slides can't be parsed? #42

Closed by modulovalue 1 year ago

modulovalue commented 1 year ago

Consider the following grammar (the comments give it in Grammophone notation, followed by the langcc source):

// https://mdaines.github.io/grammophone/?s=UyAtPiBhIFggYSB8IGIgWCBiIHwgYSBZIGIgfCBiIFkgYS4KWCAtPiBjICJYJyIuClkgLT4gYyAiWSciLgoiWCciIC0+IGMuCiJZJyIgLT4gYy4=
// S -> a X a | b X b | a Y b | b Y a.
// X -> c "X'".
// Y -> c "Y'".
// "X'" -> c.
// "Y'" -> c.

tokens {
    ka <= `a`;
    kb <= `b`;
    kc <= `c`;
}

lexer {
    main { body }

    mode body {
        ka => { emit; }
        kb => { emit; }
        kc => { emit; }
        eof => { pop; }
    }
}

parser {
    main { S }
    S <- `a` X `a` | `b` X `b` | `a` Y `b` | `b` Y `a`;
    X <- `c` XP;
    Y <- `c` YP;
    XP <- `c`;
    YP <- `c`;
}

The grammar is from these slides.

I expected langcc to accept it, because the grammar is LR(1), but compilation fails with an LR conflict:

[000:00:00.013584] -- Performing initial validation and tabulation
[000:00:00.022069] -- Compiling lexer
[000:00:00.029373] -- Compiling parser: tabulating symbols
[000:00:00.029843] -- Compiling parser: inferring attributes
[000:00:00.032254] -- Compiling parser: symbol iteration 1 (0 triggers)
[000:00:00.037547] -- SLR(1) NFA: 49 vertices, 50 edges
[000:00:00.039303] -- SLR(1) subset NFA: 34 vertices, 35 edges
[000:00:00.039861] -- Compiling parser: final lookaheads
[000:00:00.041901] -- Compiling parser: constructing LR NFA
[000:00:00.043192] -- Compiling parser: LR NFA subset construction
[000:00:00.044896] -- Compiling parser: detected LR conflicts; searching for traces
[000:00:00.045405] -- Compiling parser: tabulating conflicts
[000:00:00.046818] langcc compile error:
[000:00:00.046818] 
[000:00:00.046818]  ===== LR conflict 1 of 1
[000:00:00.046818] 
[000:00:00.046818]      &S              &S    
[000:00:00.046818]            RecurStep(S)    
[000:00:00.046818]     `a`             `a`    
[000:00:00.046818]                            
[000:00:00.046818]                Recur(X)    Recur(Y)
[000:00:00.046818]                            
[000:00:00.046818]                     `c`    `c`
[000:00:00.046818]                     `c`    `c`
[000:00:00.046818]                     `a`    `b`
[000:00:00.046818]                            
[000:00:00.046818] 
[000:00:00.046818] 
[000:00:00.046818] 
[000:00:00.046818] 
[000:00:00.046818] 
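To double-check the claim that the grammar is LR(1), here is a minimal Python sketch of the textbook canonical LR(1) item-set construction, applied to the grammar above. It is independent of langcc and says nothing about how langcc builds its tables. The names (`GRAMMAR`, `closure`, `lr1_states`, `lalr_merge`, and so on) are my own, `XP` and `YP` stand for X' and Y', and the `first` helper assumes a grammar with no empty productions and no left recursion, which holds here. The sketch also merges states with identical cores, as an LALR(1) generator would, to show where a conflict could come from if lookaheads get merged.

```python
from collections import defaultdict

# Augmented grammar from this issue; XP and YP stand for X' and Y'.
GRAMMAR = {
    "S'": [("S",)],
    "S": [("a", "X", "a"), ("b", "X", "b"), ("a", "Y", "b"), ("b", "Y", "a")],
    "X": [("c", "XP")],
    "Y": [("c", "YP")],
    "XP": [("c",)],
    "YP": [("c",)],
}
START = "S'"

def first(sym):
    """FIRST of one symbol (assumes no empty productions, no left recursion)."""
    if sym not in GRAMMAR:
        return {sym}
    out = set()
    for rhs in GRAMMAR[sym]:
        out |= first(rhs[0])
    return out

def closure(items):
    """LR(1) closure; an item is (lhs, rhs, dot, lookahead)."""
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            rest = rhs[dot + 1:]
            lookaheads = first(rest[0]) if rest else {la}
            for prod in GRAMMAR[rhs[dot]]:
                for l in lookaheads:
                    item = (rhs[dot], prod, 0, l)
                    if item not in items:
                        items.add(item)
                        work.append(item)
    return frozenset(items)

def goto(state, sym):
    """Advance the dot over sym in every item where that is possible."""
    moved = {(l, r, d + 1, la) for (l, r, d, la) in state
             if d < len(r) and r[d] == sym}
    return closure(moved)

def lr1_states():
    """All canonical LR(1) states reachable from the start item."""
    start = closure({(START, GRAMMAR[START][0], 0, "$")})
    states, work = {start}, [start]
    while work:
        st = work.pop()
        for sym in {r[d] for (_, r, d, _) in st if d < len(r)}:
            nxt = goto(st, sym)
            if nxt not in states:
                states.add(nxt)
                work.append(nxt)
    return states

def conflict_lookaheads(items):
    """Lookaheads with more than one possible action in this item set."""
    reduces, shifts = defaultdict(set), set()
    for lhs, rhs, dot, la in items:
        if dot == len(rhs):
            reduces[la].add((lhs, rhs))
        elif rhs[dot] not in GRAMMAR:
            shifts.add(rhs[dot])
    return {la for la, prods in reduces.items()
            if len(prods) > 1 or la in shifts}

def lalr_merge(states):
    """Merge states that share a core (items with lookaheads dropped)."""
    merged = defaultdict(set)
    for st in states:
        core = frozenset((l, r, d) for (l, r, d, _) in st)
        merged[core] |= st
    return list(merged.values())

states = lr1_states()
lr1_conflicts = [s for s in states if conflict_lookaheads(s)]
lalr_conflicts = [s for s in lalr_merge(states) if conflict_lookaheads(s)]
print("LR(1) states with conflicts:", len(lr1_conflicts))
print("LALR(1) states with conflicts:", len(lalr_conflicts))
```

With the canonical construction, the state reached after `a c c` reduces to XP on lookahead `a` and to YP on lookahead `b`, and the state after `b c c` does the reverse, so neither state has a conflict. Merging those two states by core gives both reductions the lookahead set {`a`, `b`}, which is a reduce/reduce conflict. That matches the shape of the trace above (same `a c c` prefix, differing only in the final `a` versus `b`), though I can't confirm that this is what langcc is actually doing.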
modulovalue commented 1 year ago

Because langcc appears to be abandoned, I'm going to close this issue.