Open osa1 opened 3 years ago
Another example:
    let whitespace =
        ['\t' '\n' '\u{B}' '\u{C}' '\r' ' ' '\u{85}' '\u{200E}' '\u{200F}' '\u{2028}' '\u{2029}'];

    rule DecInt {
        ($dec_digit | '_')* $int_suffix?,

        $ => |lexer| {
            let match_ = lexer.match_();
            lexer.return_(Token::Lit(Lit::Int(match_)))
        },

        $whitespace => |lexer| {
            let match_ = lexer.match_();
            // TODO: Rust whitespace characters 1, 2, or 3 bytes long
            lexer.return_(Token::Lit(Lit::Int(&match_[..match_.len() - match_.chars().last().unwrap().len_utf8()])))
        },
    }
In the last rule we want to exclude the trailing whitespace. We can't just drop the last byte, as the allowed whitespace characters can be 1, 2, or 3 bytes long. If we could bind the whitespace character, we could write `match_[..match_.len() - whitespace_char.len_utf8()]`.
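To make the byte-length problem concrete, here is a minimal sketch in plain Rust, independent of lexgen. The helper name `strip_last_char` is made up for illustration; it does what the `$whitespace` rule above has to do by hand, namely decode the final character of the match and subtract its UTF-8 width.

```rust
/// Hypothetical helper (not part of lexgen): returns `s` without its final
/// character, whatever that character's UTF-8 width is.
fn strip_last_char(s: &str) -> &str {
    // `next_back` decodes only the final character of the string.
    match s.chars().next_back() {
        // `len_utf8` is 1 for ' ', 2 for U+0085 and 3 for U+2028.
        Some(c) => &s[..s.len() - c.len_utf8()],
        // An empty string has nothing to strip.
        None => s,
    }
}
```

Calling it on `"1_000u32"` followed by any of the whitespace characters listed above gives back `"1_000u32"`, whereas dropping a fixed single byte would panic on the 2- and 3-byte cases because the slice would end inside a character.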
Currently getting the matched character in a wildcard rule is quite verbose (and probably also inefficient).
One easy fix would be to add a `char` method to `lexer` that returns the last matched character. Alternatively, with #9 we could allow `<char:_> => ...` syntax.
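As a rough illustration of the first option, the sketch below uses a toy lexer that remembers the last character it consumed, so that a `char` accessor needs no re-decoding of the match. Everything here is an assumption for the sake of the example: `ToyLexer`, its fields, `advance`, and the `Option<char>` return type are made up and do not reflect lexgen's generated code or its actual API.

```rust
/// Hypothetical stand-in for a generated lexer; not lexgen's actual type.
struct ToyLexer<'input> {
    input: &'input str,
    /// Byte offset where the current match ends.
    match_end: usize,
    /// The character consumed most recently, if any.
    last_char: Option<char>,
}

impl<'input> ToyLexer<'input> {
    fn new(input: &'input str) -> Self {
        ToyLexer { input, match_end: 0, last_char: None }
    }

    /// Consumes one character, extending the current match and remembering it.
    fn advance(&mut self) -> Option<char> {
        let c = self.input[self.match_end..].chars().next()?;
        self.match_end += c.len_utf8();
        self.last_char = Some(c);
        Some(c)
    }

    /// The current match, from the start of the input up to `match_end`.
    fn match_(&self) -> &'input str {
        &self.input[..self.match_end]
    }

    /// The proposed accessor: the last matched character, already decoded.
    fn char(&self) -> Option<char> {
        self.last_char
    }
}
```

With such an accessor, the whitespace rule's slice could be written as `&m[..m.len() - c.len_utf8()]`, where `m` is the match and `c` is the value returned by `char`, which is the shape the issue asks for.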