maciejhirsz / logos

Create ridiculously fast Lexers
https://logos.maciej.codes
Apache License 2.0
2.71k stars 105 forks

Unreachable branches in LUTs are still linked #385

Closed RustyYato closed 1 month ago

RustyYato commented 2 months ago

I have some code like this:

use logos::Logos;

#[derive(Logos)]
#[logos(source = [u8])]
enum Token {
    // NOTE: This is needed because logos has dot_matches_newline(false) set for regex_syntax (which is the default)
    #[token("\n")]
    Newline,
    #[regex(b".", priority = 0)]
    UnknownByte,
}

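Every input byte matches one of these two variants, so this lexer should never produce an error. I therefore use an empty error type, enum LexerError {}, whose Default impl deliberately causes a linker error in release mode if the error path is ever compiled in: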

impl Default for LexerError {
    #[cfg(not(debug_assertions))]
    fn default() -> Self {
        extern "C" {
            fn __lexer_error_unreachable_default() -> !;
        }

        // force a linker error
        unsafe { __lexer_error_unreachable_default() }
    }

    #[cfg(debug_assertions)]
    fn default() -> Self {
        panic!("It is impossible for the lexer to error")
    }
}

This would work if the generated lookup table (LUT) code did not include the error branch. For some reason, LLVM is unable to optimize this branch out. I suspect that is because the LUT is stored in a static, which tends to act as an optimization barrier.
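To make the shape of the problem concrete, here is a minimal, self-contained sketch of a table-driven dispatcher. It is not the code logos generates: Action, build_lut, LUT, and classify are hypothetical names chosen for illustration. The table never holds Action::Error, yet the match still has an error arm, and the compiler can only drop that arm if it proves a property about the contents of the static. Whether LLVM manages that in a given build is not guaranteed.

```rust
// Hypothetical stand-in for generated lexer code; none of these names are logos APIs.
#[allow(dead_code)] // `Error` is intentionally never constructed
#[derive(Clone, Copy)]
enum Action {
    Newline,
    UnknownByte,
    Error,
}

// Build the 256-entry table at compile time:
// b'\n' maps to Newline, every other byte maps to UnknownByte.
const fn build_lut() -> [Action; 256] {
    let mut table = [Action::UnknownByte; 256];
    table[b'\n' as usize] = Action::Newline;
    table
}

// The table lives in a static, as suspected in the report above.
static LUT: [Action; 256] = build_lut();

#[derive(Debug, PartialEq)]
enum Token {
    Newline,
    UnknownByte,
}

fn classify(byte: u8) -> Result<Token, ()> {
    match LUT[byte as usize] {
        Action::Newline => Ok(Token::Newline),
        Action::UnknownByte => Ok(Token::UnknownByte),
        // Dead in practice, but this arm (and anything it calls) stays in
        // the binary unless the optimizer proves no table entry is `Error`.
        Action::Error => Err(()),
    }
}

fn main() {
    let tokens: Vec<_> = b"a\n".iter().map(|&b| classify(b)).collect();
    println!("{:?}", tokens);
}
```

If the Err arm called LexerError::default(), the reference to the undefined extern symbol would survive whenever the arm does, which matches the linker error described above.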

To fix this, the error branch simply shouldn't be generated if it is unreachable.
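A rough sketch of what that could look like, again with hypothetical names rather than actual logos output: when no table entry can lead to an error, the entry type has no error variant at all, so the match is exhaustive without an error arm and nothing ever refers to the error type's Default impl. std::convert::Infallible stands in for the empty LexerError here.

```rust
use std::convert::Infallible;

// Hypothetical table entry type with no error variant at all.
#[derive(Clone, Copy)]
enum Action {
    Newline,
    UnknownByte,
}

// b'\n' maps to Newline, every other byte maps to UnknownByte.
const fn build_lut() -> [Action; 256] {
    let mut table = [Action::UnknownByte; 256];
    table[b'\n' as usize] = Action::Newline;
    table
}

static LUT: [Action; 256] = build_lut();

#[derive(Debug, PartialEq)]
enum Token {
    Newline,
    UnknownByte,
}

// No error arm exists, so no code path can construct the error type.
fn classify(byte: u8) -> Result<Token, Infallible> {
    Ok(match LUT[byte as usize] {
        Action::Newline => Token::Newline,
        Action::UnknownByte => Token::UnknownByte,
    })
}

fn main() {
    let tokens: Vec<Token> = b"a\n"
        .iter()
        .map(|&b| match classify(b) {
            Ok(token) => token,
            // An empty match is accepted because `Infallible` has no values.
            Err(never) => match never {},
        })
        .collect();
    println!("{:?}", tokens); // prints [UnknownByte, Newline]
}
```

With this shape the guarantee comes from the type system instead of from the optimizer, so it holds in debug and release builds alike.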