Open osa1 opened 2 years ago
LR1 items: https://github.com/osa1/parsegen/blob/f368fac8b352f01358dd7ad40152da14983019f9/src/lr1.rs#L10-L17
LR1 sets: https://github.com/osa1/parsegen/blob/f368fac8b352f01358dd7ad40152da14983019f9/src/lr1.rs#L80-L84
This representation generates a lot of redundant (duplicate) info in states:
1571: { [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "("] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "bool"] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "-"] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "-."] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "+"] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "+."] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "*."] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "/."] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "="] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "<>"] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "<="] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "<"] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, ">="] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, ">"] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, ","] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, ";"] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "id"] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "int"] [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "float"] }
All of the items in this state have the same production. We can combine all of these items into a single one with a lookahead set:
struct LR1Item { non_terminal_idx: NonTerminalIdx, production_idx: ProductionIdx, cursor: usize, lookahead: FxHashSet<TerminalIdx>, }
We can either add one more field for whether EOF is a valid lookahead, or we can assign EOF a TerminalIdx.
TerminalIdx
See #1 for optimizing terminal sets. (FxHashSet<TerminalIdx> in the code above)
FxHashSet<TerminalIdx>
LR1 items: https://github.com/osa1/parsegen/blob/f368fac8b352f01358dd7ad40152da14983019f9/src/lr1.rs#L10-L17
LR1 sets: https://github.com/osa1/parsegen/blob/f368fac8b352f01358dd7ad40152da14983019f9/src/lr1.rs#L80-L84
This representation generates a lot of redundant (duplicate) info in states:
All of the items in this state have the same production. We can combine all of these items into a single one with a lookahead set:
We can either add one more field for whether EOF is a valid lookahead, or we can assign EOF a
TerminalIdx
.See #1 for optimizing terminal sets. (
FxHashSet<TerminalIdx>
in the code above)