maciejhirsz / logos

Create ridiculously fast Lexers
https://logos.maciej.codes
Apache License 2.0

Stackoverflow when deriving token with block comment regex #400

Open KarelPeeters opened 1 month ago

KarelPeeters commented 1 month ago

The following derive setup causes the build to fail:

#[derive(Logos)]
enum TokenType {
    #[regex(r"/\*([^\*]*\*+[^\*/])*([^\*]*\*+|[^\*])*\*/")]
    BlockComment,
}

The error printed is:

error: rustc interrupted by SIGSEGV, printing backtrace

/home/karel/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-1ccb730c51a3970e.so(+0x2ea5963)[0x7f95caea5963]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f95c7c42520]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0x9b7cf)[0x7f95b7c9b7cf]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0x959d7)[0x7f95b7c959d7]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0x97e78)[0x7f95b7c97e78]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0x95b78)[0x7f95b7c95b78]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcad1a)[0x7f95b7ccad1a]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0x7e906)[0x7f95b7c7e906]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcfa89)[0x7f95b7ccfa89]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcfc95)[0x7f95b7ccfc95]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0x62a64)[0x7f95b7c62a64]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd01c8)[0x7f95b7cd01c8]

### cycle encountered after 12 frames with period 14
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
### recursed 17 times

/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]

note: rustc unexpectedly overflowed its stack! this is a bug
note: maximum backtrace depth reached, frames may have been lost
note: we would appreciate a report at https://github.com/rust-lang/rust
help: you can increase rustc's stack size by setting RUST_MIN_STACK=16777216
note: backtrace dumped due to SIGSEGV! resuming signal
error: could not compile `demo` (bin "demo")

Other tokens (normal literal strings, other regular expressions) work fine. I assume this specific regex causes infinite recursion somewhere in the derive machinery.

Note: I got this regex from the LALRPOP book here.

KarelPeeters commented 1 month ago

After a bit of debugging:

I'll continue to investigate, but any advice would be appreciated!

jeertmans commented 1 month ago

Hello @KarelPeeters! Thanks for reporting this bug (though I am not sure if this is a bug or a limitation of Logos).

Unfortunately, I don't have time to investigate this at the moment. However, your regex seems very complex, and it might be worth trying to simplify it, at least by breaking it down into multiple tokens or using callbacks (this is usually the simplest thing to do when trying to match block comments).
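As an illustration of the callback approach, here is a minimal sketch, assuming the opening `/*` is matched with a plain `#[token("/*", ...)]` and the callback only needs to know how far to extend the token. The helper name `block_comment_len` is hypothetical. Inside a logos callback you would call it on `lex.remainder()` and pass a `Some(n)` result to `lex.bump(n)`. Like the original regex, it does not handle nested comments.

```rust
/// Given the text that follows an opening `/*`, returns how many bytes must be
/// consumed so the token also covers the closing `*/`, or `None` if the
/// comment is never closed.
fn block_comment_len(remainder: &str) -> Option<usize> {
    // `str::find` gives the byte offset of the first `*/`; add the delimiter's
    // length so the closing `*/` is included in the consumed bytes.
    remainder.find("*/").map(|start| start + "*/".len())
}
```

A `None` result corresponds to an unterminated comment; one option is to consume the rest of the input and report an error, as the callback below does.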

facefaceless commented 1 month ago

For now, I think handling multi-line comments manually is the better option. Here is a code snippet from my project:

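use logos::{FilterResult, Lexer};

...
// Match only the opening delimiter; the callback consumes the rest of the comment.
#[token("/*", multiline_comment)]
BlockComment,
...
// `LogosLexError` and the `line` / `line_beg` fields of `lex.extras` are
// defined elsewhere in the project.
fn multiline_comment(lex: &mut Lexer<TokenType>) -> FilterResult<(), LogosLexError> {
    // `ExpectSlash` means the previous character was `*`, so a `/` closes the comment.
    enum State {
        ExpectStar,
        ExpectSlash,
    }
    let remainder = lex.remainder();
    let (mut state, mut iter) = (State::ExpectStar, remainder.chars());
    while let Some(next_char) = iter.next() {
        match next_char {
            '\n' => {
                // `remainder.len() - iter.as_str().len()` is the byte offset
                // just past the current character, i.e. the start of the next line.
                lex.extras.line += 1;
                lex.extras.line_beg = lex.span().end + (remainder.len() - iter.as_str().len());
                state = State::ExpectStar;
            }
            '*' => state = State::ExpectSlash,
            '/' if matches!(state, State::ExpectSlash) => {
                // Consume everything up to and including `*/`, then drop the token.
                lex.bump(remainder.len() - iter.as_str().len());
                return FilterResult::Skip;
            }
            _ => state = State::ExpectStar,
        }
    }
    // End of input reached without `*/`: consume the rest and report an error.
    lex.bump(remainder.len());
    FilterResult::Error(LogosLexError::IncompleteComment)
}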
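To unit-test this logic without building a `Lexer`, the same state machine could be factored into a pure function over the remainder. The sketch below assumes the same non-nesting semantics; the names `scan_block_comment` and `CommentScan` are hypothetical, and it counts newlines but omits the `line_beg` bookkeeping. In the callback you would `lex.bump(len)` on `Closed` and add `newlines` to the line counter.

```rust
/// Result of scanning the text that follows an opening `/*`.
#[derive(Debug, PartialEq, Eq)]
enum CommentScan {
    /// The comment was closed; `len` bytes (including `*/`) belong to it.
    Closed { len: usize, newlines: usize },
    /// Input ended before `*/`; the whole remainder belongs to the comment.
    Unterminated { newlines: usize },
}

fn scan_block_comment(remainder: &str) -> CommentScan {
    // `true` when the previous character was `*`, so a `/` would close the comment.
    let mut after_star = false;
    let mut newlines = 0;
    // `char_indices` yields byte offsets, so `len` is directly usable with `bump`.
    for (offset, ch) in remainder.char_indices() {
        match ch {
            '/' if after_star => {
                return CommentScan::Closed { len: offset + ch.len_utf8(), newlines };
            }
            '*' => after_star = true,
            '\n' => {
                newlines += 1;
                after_star = false;
            }
            _ => after_star = false,
        }
    }
    CommentScan::Unterminated { newlines }
}
```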