maciejhirsz / logos

Create ridiculously fast Lexers
https://logos.maciej.codes
Apache License 2.0

Incorrect regex match of `0[oO](_?[0-7])+` #264

Open lucifer1004 opened 2 years ago

lucifer1004 commented 2 years ago
use logos::Logos;

#[derive(Logos, Debug, PartialEq)]
pub enum Token {
    #[regex("0[oO](_?[0-7])+")]
    IntegerOct,

    #[regex("[a-zA-Z_][a-zA-Z0-9_]*")]
    Identifier,

    #[error]
    #[regex(r"[ \t\n\f]+", logos::skip)]
    Error,
}

fn main() {
    let mut lex = Token::lexer("0o123_");
    while let Some(token) = lex.next() {
        println!("{:?} {}", token, lex.slice());
    }
}

yields

IntegerOct 0o123_

This is incorrect: the trailing _ is not followed by an octal digit, so it should not be part of the match.

Changing the regex from 0[oO](_?[0-7])+ to 0[oO](_?[0-7]+)+ solves this issue and gives the correct output:

IntegerOct 0o123
Identifier _

However, 0o123_ is not a full match of the regex 0[oO](_?[0-7])+, so this looks like a bug in how Logos handles the original pattern.
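For comparison, here is a minimal hand-written sketch of the longest-match behaviour expected from 0[oO](_?[0-7])+. It uses only the standard library and is not taken from Logos; the function name longest_octal_match is hypothetical and exists only for illustration.

```rust
/// Hand-written reference for `0[oO](_?[0-7])+` (hypothetical helper, not Logos code).
/// Returns the byte length of the longest matching prefix, or `None` if there is none.
fn longest_octal_match(input: &str) -> Option<usize> {
    let b = input.as_bytes();
    // Fixed prefix: `0` followed by `o` or `O`.
    if b.len() < 2 || b[0] != b'0' || (b[1] != b'o' && b[1] != b'O') {
        return None;
    }
    let is_oct = |c: u8| (b'0'..=b'7').contains(&c);
    let mut pos = 2;
    let mut last_accept = None;
    loop {
        // One iteration of `(_?[0-7])`: an optional `_`, then exactly one octal digit.
        let mut next = pos;
        if next < b.len() && b[next] == b'_' {
            next += 1;
        }
        if next < b.len() && is_oct(b[next]) {
            pos = next + 1;
            last_accept = Some(pos);
        } else {
            // A dangling `_` is never committed: fall back to the last accepted position.
            return last_accept;
        }
    }
}

fn main() {
    let input = "0o123_";
    let len = longest_octal_match(input).expect("a prefix should match");
    println!("{}", &input[..len]);
}
```

With the input 0o123_ the sketch stops after 0o123 and leaves the trailing _ for the next token, which is the split shown in the corrected output above.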

glenda64 commented 1 year ago

I'm having a similar issue with the regex [1-9][0-9_]*[0-9]. It seems that with any variation of the middle term using * or +, the regex no longer matches inputs like 103.

nekodjin commented 1 year ago

Additionally, the pattern 0|[1-9]([0-9_]*[0-9])? incorrectly matches strings such as 1_.
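As a point of reference, the following standard-library sketch spells out what a longest match for 0|[1-9]([0-9_]*[0-9])? should return. It is an illustration written by hand, not Logos output, and the name longest_int_match is hypothetical.

```rust
/// Hand-written reference for `0|[1-9]([0-9_]*[0-9])?` (hypothetical helper).
/// Returns the byte length of the longest matching prefix, or `None`.
fn longest_int_match(input: &str) -> Option<usize> {
    let b = input.as_bytes();
    match b.first().copied() {
        // First alternative: a lone `0`.
        Some(b'0') => Some(1),
        Some(b'1'..=b'9') => {
            // The leading digit alone already matches, because the group is optional.
            let mut end = 1;
            // Walk the run of `[0-9_]`, but only extend the match when a digit is seen,
            // since the group must end in `[0-9]`.
            for (i, &c) in b.iter().enumerate().skip(1) {
                match c {
                    b'0'..=b'9' => end = i + 1,
                    b'_' => {}
                    _ => break,
                }
            }
            Some(end)
        }
        _ => None,
    }
}

fn main() {
    for input in ["1_", "1_000", "0"] {
        let len = longest_int_match(input).expect("a prefix should match");
        println!("{:?} -> {:?}", input, &input[..len]);
    }
}
```

For 1_ the expected match is just 1, with the underscore left over for whatever token follows.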

ccurricane commented 1 year ago

I'm having a similar issue with this regex:

use logos::Logos;

#[derive(Logos, Debug, PartialEq)]
// #[logos(subpattern)]
enum Token {
    #[error]
    Error,

    #[regex(r"[ \t\n\f]+", logos::skip)]
    Whitespace,

    // #[regex("file://[-A-Za-z0-9+&@#/%?=~_|!:.,;]+")]
    #[regex("file://[-A-Za-z0-9+&@#/%?=~_|!:.;]+[-A-Za-z0-9+&@#/%=~_|]")]
    FileUri,

    #[regex("d[a-z,:]+[a-z]")]
    Dome,

    // #[regex("https?://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]")]
    // #[regex("(https|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]")]
    // HttpUri,
}

fn main() {
    let mut lex = Token::lexer("demo file://demo.csv");
    assert_eq!(lex.next(), Some(Token::Dome));
    // assert_eq!(lex.next(), Some(Token::Dome));
    assert_eq!(lex.next(), Some(Token::FileUri));
    assert_eq!(lex.slice(), "file://demo.csv");
}

Running cargo run shows:

thread 'main' panicked at src/main.rs:29:5:
assertion failed: `(left == right)`
  left: `Some(Error)`,
 right: `Some(Dome)`
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

quendimax commented 8 months ago

I have the same issue with version 0.14.0. I was able to reduce it to the regex [ab]*a, which fails to match the text ba.
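To make the expected result concrete, here is a small sketch of maximal-munch matching for [ab]*a, written as a two-state automaton that remembers the last accepting position. It uses only the standard library and makes no claim about how Logos is implemented internally; the function name longest_match is hypothetical.

```rust
/// Hand-written automaton for `[ab]*a` (hypothetical helper, not Logos code).
/// The automaton is in an accepting state exactly when the last byte read was `a`.
/// Returns the byte length of the longest matching prefix, or `None`.
fn longest_match(input: &str) -> Option<usize> {
    let mut last_accept = None;
    for (i, byte) in input.bytes().enumerate() {
        // Both states share the same edges: `a` leads to the accepting state,
        // `b` to the non-accepting one, anything else stops the scan.
        let accepting = match byte {
            b'a' => true,
            b'b' => false,
            _ => break,
        };
        if accepting {
            // Remember the end of the longest match seen so far.
            last_accept = Some(i + 1);
        }
    }
    last_accept
}

fn main() {
    let input = "ba";
    let len = longest_match(input).expect("a prefix should match");
    println!("{}", &input[..len]);
}
```

Under this reading, ba is matched in full, and an input such as bab would yield ba with the final b left unconsumed.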