Provide a way to match end-of-input in non-initial states

osa1 / lexgen

A fully-featured lexer generator, implemented as a proc macro

MIT License


lexer! {
    Lexer -> &'input str;

    rule Init {
        "//" => |lexer| {
            lexer.switch(LexerRule::SingleLineComment)
        },
    }

    rule SingleLineComment {
        '\n' => |lexer| {
            let comment = lexer.match_();
            lexer.switch_and_return(LexerRule::Init, comment)
        },

        _,
    }
}

#[cfg(test)]
fn ignore_pos<A, E>(ret: Option<Result<(usize, A, usize), E>>) -> Option<Result<A, E>> {
    ret.map(|res| res.map(|(_, a, _)| a))
}

#[test]
fn comment() {
    let input = "// test";
    let mut lexer = Lexer::new(input);
    assert_eq!(ignore_pos(lexer.next()), Some(Ok(input))); // fails
    assert_eq!(ignore_pos(lexer.next()), None);
}

I think we will need a special symbol, maybe eof, to match end-of-input.

The question is whether to make it a regex or an LHS.

If we make it a regex, we allow nonsensical regexes like eof+ 'a' (match one or more "end of input", then the character 'a'), so I don't like this too much.

If we make it an LHS, it will be similar to _ in how we use it and how we handle it in the implementation. The example above would then look like:

lexer! {
    Lexer -> &'input str;

    rule Init {
        "//" => |lexer| {
            lexer.switch(LexerRule::SingleLineComment)
        },
    }

    rule SingleLineComment {
        '\n' => |lexer| {
            let comment = lexer.match_();
            lexer.switch_and_return(LexerRule::Init, comment)
        },

        eof => |lexer| {
            let comment = lexer.match_();
            lexer.switch_and_return(LexerRule::Init, comment)
        },

        _,
    }
}

Since we cannot write '\n' | eof (because eof is not a regex), this has a little bit of duplication, but I think it's not too bad.

Note that we don't need to match eof in the Init rule: Init is already special-cased, and the next method returns None when it reaches end-of-input in that state.
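To make the intended behaviour concrete, here is a hand-written sketch of the two-state lexer above. It is not lexgen's generated code and does not use the lexer! macro; CommentLexer, State, and the field names are hypothetical, and the error type (a byte offset) is an assumption. It only illustrates the semantics being proposed: end-of-input in Init yields None, while end-of-input in SingleLineComment returns the current match and switches back to Init.

```rust
// Hand-written illustration of the proposed semantics.
// All names are hypothetical; this is not what lexgen generates.
#[derive(Clone, Copy)]
enum State {
    Init,
    SingleLineComment,
}

struct CommentLexer<'input> {
    input: &'input str,
    state: State,
    match_start: usize, // byte offset where the current match began
    pos: usize,         // byte offset of the next unread character
}

impl<'input> CommentLexer<'input> {
    fn new(input: &'input str) -> Self {
        CommentLexer { input, state: State::Init, match_start: 0, pos: 0 }
    }
}

impl<'input> Iterator for CommentLexer<'input> {
    // Err carries the byte offset of the unexpected character (an assumption).
    type Item = Result<&'input str, usize>;

    fn next(&mut self) -> Option<Self::Item> {
        let input = self.input;
        loop {
            let rest = &input[self.pos..];
            match self.state {
                State::Init => {
                    // Special case: end-of-input in Init ends the token stream.
                    let c = rest.chars().next()?;
                    if rest.starts_with("//") {
                        self.match_start = self.pos;
                        self.pos += 2;
                        self.state = State::SingleLineComment;
                    } else {
                        let at = self.pos;
                        self.pos += c.len_utf8(); // skip the bad character
                        return Some(Err(at));
                    }
                }
                State::SingleLineComment => match rest.chars().next() {
                    // '\n' rule: return the comment, newline included.
                    Some('\n') => {
                        self.pos += 1;
                        self.state = State::Init;
                        return Some(Ok(&input[self.match_start..self.pos]));
                    }
                    // `_` rule: consume any other character and keep going.
                    Some(c) => self.pos += c.len_utf8(),
                    // Proposed `eof` rule: return what has been matched so far.
                    None => {
                        self.state = State::Init;
                        return Some(Ok(&input[self.match_start..self.pos]));
                    }
                },
            }
        }
    }
}
```

With this sketch, lexing "// test" yields the whole input as one comment and then None, which is what the failing test above expects.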

Provide a way to match end-of-input in non-initial states #13