osa1 / parsegen

An LR parser generator, implemented as a proc macro
MIT License
15 stars 0 forks source link

Combine LR1 items with same productions #7

Open osa1 opened 2 years ago

osa1 commented 2 years ago

LR1 items: https://github.com/osa1/parsegen/blob/f368fac8b352f01358dd7ad40152da14983019f9/src/lr1.rs#L10-L17

LR1 sets: https://github.com/osa1/parsegen/blob/f368fac8b352f01358dd7ad40152da14983019f9/src/lr1.rs#L80-L84

This representation generates a lot of redundant (duplicate) info in states:

1571: {
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "("]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "bool"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "-"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "-."]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "+"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "+."]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "*."]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "/."]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "="]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "<>"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "<="]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "<"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, ">="]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, ">"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, ","]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, ";"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "id"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "int"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "float"]
}

All of the items in this state have the same production. We can combine all of these items into a single one with a lookahead set:

struct LR1Item {
    non_terminal_idx: NonTerminalIdx,
    production_idx: ProductionIdx,
    cursor: usize,
    lookahead: FxHashSet<TerminalIdx>,
}

We can either add one more field for whether EOF is a valid lookahead, or we can assign EOF a TerminalIdx.

See #1 for optimizing terminal sets. (FxHashSet<TerminalIdx> in the code above)