osa1 / parsegen

An LR parser generator, implemented as a proc macro
MIT License

Allow unused tokens #3

Open osa1 opened 2 years ago

osa1 commented 2 years ago

If I have this token type:

pub enum Token {
    LParen,
    RParen,
    Var,
}

and don't use the Var token in my parser:

parser! {
    enum Token {
        "(" => Token::LParen,
        ")" => Token::RParen,
    }

    pub Entry: usize = {
        <test:Test> => test,
    };

    Test: usize = {
        "(" <t:Test> ")" => t + 1,
        => 0,
    };
}

I get a compile error in generated code, with a terrible "help":

error[E0004]: non-exhaustive patterns: `&Var` not covered
  --> src/main.rs:21:10
   |
21 |     enum Token {
   |          ^^^^^ pattern `&Var` not covered
   |
note: `Token` defined here
  --> src/main.rs:8:5
   |
5  | pub enum Token {
   |          -----
...
8  |     Var,
   |     ^^^ not covered
   = note: the matched value is of type `&Token`
help: ensure that all possible cases are being handled by adding a match arm with a wildcard pattern or an explicit pattern as shown
   |
23 ~         ")" => Token,::LParen,
24 ~         &Var => todo!()::RParen,
   |

The problem is in the token_value function that we generate to extract the values (fields) of tokens and convert them to the SemanticActionResult type (which should probably be called "value", as it defines the values on the value stack):

fn token_value(token: &Token) -> SemanticActionResult {
    match token {
        Token::LParen => SemanticActionResult::Token0(),
        Token::RParen => SemanticActionResult::Token1(),
    }
}

Here unused tokens are not matched, causing the non-exhaustive pattern match error.

I think we should be able to just add a _ => unreachable!() arm at the end here. We should also probably add #[inline(always)]: at the call sites this function is only called after the next token has already matched a pattern, so the constructor is always known, and the compiler should be able to easily eliminate this match expression using the known shape (constructor) of the token.
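A minimal sketch of what the generated function could look like with both changes applied. The SemanticActionResult enum below is a hand-written stand-in for the generated type, and the derives on it are added only so the example can be checked; the real generated code may differ.

```rust
// Token type from the issue; `Var` is declared but not used in the grammar.
#[allow(dead_code)]
pub enum Token {
    LParen,
    RParen,
    Var,
}

// Hypothetical stand-in for the generated value-stack type.
#[derive(Debug, PartialEq)]
pub enum SemanticActionResult {
    Token0(),
    Token1(),
}

// Proposed shape: a wildcard arm makes the match exhaustive even when some
// variants of `Token` are not listed in the `parser!` token declaration.
#[allow(dead_code)]
#[inline(always)]
fn token_value(token: &Token) -> SemanticActionResult {
    match token {
        Token::LParen => SemanticActionResult::Token0(),
        Token::RParen => SemanticActionResult::Token1(),
        // Assumption: callers only pass tokens that already matched a
        // declared terminal, so this arm is never taken in practice.
        _ => unreachable!(),
    }
}
```

If a caller did pass an undeclared token such as Token::Var, this version would panic at run time instead of failing to compile, so it relies on the call-site guarantee described above.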

osa1 commented 2 years ago

We also generate a function that maps tokens to terminal indices:

    fn token_terminal_idx(token: &Token) -> usize {
        match token {
            Token::LParen => 0usize,
            Token::RParen => 1usize,
            Token::Backslash => 2usize,
            Token::Dot => 3usize,
            Token::Colon => 4usize,
            Token::Arrow => 5usize,
            Token::Top => 6usize,
            Token::Bot => 7usize,
            Token::Id(_) => 8usize,
        }
    }

These indices are then used to index the action table and the table that maps tokens to the strings used for them in error messages.
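This function would presumably hit the same exhaustiveness error when a token is unused. The sketch below shows one possible way to handle it; it is not something decided in this issue. It assumes the function may see tokens that are not declared in the grammar, so it returns an Option instead of using unreachable!(). The TERMINAL_STRS table and the describe helper are hypothetical names used only for illustration.

```rust
// Token type from the first comment; `Var` is not used in the grammar.
#[allow(dead_code)]
pub enum Token {
    LParen,
    RParen,
    Var,
}

// Hypothetical table mapping terminal indices to the strings shown in
// error messages.
#[allow(dead_code)]
const TERMINAL_STRS: [&str; 2] = ["(", ")"];

// Assumption: returns `None` for tokens that have no terminal index because
// they were not declared in the grammar.
#[allow(dead_code)]
fn token_terminal_idx(token: &Token) -> Option<usize> {
    match token {
        Token::LParen => Some(0),
        Token::RParen => Some(1),
        _ => None,
    }
}

// Hypothetical helper: looks up the display string for a token, falling
// back to a placeholder for tokens without a terminal index.
#[allow(dead_code)]
fn describe(token: &Token) -> &'static str {
    match token_terminal_idx(token) {
        Some(idx) => TERMINAL_STRS[idx],
        None => "<token not used in grammar>",
    }
}
```

With this shape, the parser loop would need to turn a None into a parse error before indexing the action table.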