maciejhirsz / logos

Create ridiculously fast Lexers
https://logos.maciej.codes
Apache License 2.0

Attempting to lex kotlin style interpolated strings #423

Open CreggEgg opened 1 month ago

CreggEgg commented 1 month ago

I'm looking to lex a string like "${..}", where the .. stands for other valid language tokens, and I'm wondering what the best way to approach this is. I tried making a start token and an end token for this type of string, but the end token takes priority over the plain } token. My current attempt to work around this is:

#[token("}")]
RBrace,
#[regex(r#""(?:[^(\$\{)]|\\")*(\$\{)"#)]
StartInterpolatedString(&'a str),
#[regex(r#"\}(?:[^"(\$\{)]|\\")*""#, interpolated_string_callback)]
EndInterpolatedString(&'a str),

and then a callback that looks like this

fn interpolated_string_callback<'a>(lex: &mut Lexer<'a, Token<'a>>) -> Option<Skip> {
    if lex.extras.string_state == StringState::Started {
        None
    } else {
        Some(Skip)
    }
}

But this is closer to pseudocode than anything actually useful, and it won't even compile. It seems to me that this is not the right approach for this issue. I'm new to logos and this seems atypically cumbersome based on my previous logos experiences, so it's very possible I'm just doing something really silly, but I would really appreciate any help at all!

jeertmans commented 1 month ago

Hello, I think the best approach is to have a token InterpolationString(...) that matches the start of an interpolation string, and let a callback handle everything else:

fn callback(lex: ...) -> Vec<Token> {
    /* accumulate tokens until you match a right brace */
    /* but make sure to handle specific cases like escaped right brace, nested interpolation string, etc. */
}

#[derive(Logos)]
enum Token {
    #[token("${", callback)]
    InterpolationString(Vec<Token>),
    #[token("}")]
    RightBrace,
}

CreggEgg commented 1 month ago

Thanks for the response! That makes sense to me. One question though: with this approach, how would I get the content at the start of the string, for example in "User's name: ${name}"?

jeertmans commented 1 month ago

Apply the same logic to the string itself: match the start of the string and use a callback to parse its content.
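
For later readers, here is a minimal sketch of the scanning work such a string callback might do. It is plain Rust with no logos dependency, so it only illustrates the idea rather than showing logos' actual API. The names Part and scan_string are hypothetical, and it assumes that \ escapes the next character and that braces inside ${...} are balanced (a string literal containing } inside the interpolation is not handled).

```rust
/// One piece of a string literal (hypothetical type, not part of logos).
#[derive(Debug, PartialEq)]
enum Part {
    /// Literal text, with escapes already resolved.
    Text(String),
    /// Raw source found between `${` and its matching `}`.
    Interp(String),
}

/// Scans a string literal that starts at the opening quote of `src`.
/// Returns the parts plus the number of bytes consumed, or `None` if the
/// literal is malformed (no opening quote, or input ends too early).
fn scan_string(src: &str) -> Option<(Vec<Part>, usize)> {
    let mut chars = src.char_indices().peekable();
    match chars.next() {
        Some((_, '"')) => {}
        _ => return None,
    }

    let mut parts = Vec::new();
    let mut text = String::new();

    while let Some((i, c)) = chars.next() {
        match c {
            // Closing quote: flush pending text and report the length.
            '"' => {
                if !text.is_empty() {
                    parts.push(Part::Text(text));
                }
                return Some((parts, i + 1));
            }
            // Assumed escape rule: keep the next character verbatim.
            '\\' => {
                let (_, escaped) = chars.next()?;
                text.push(escaped);
            }
            // Start of an interpolation: `$` followed by `{`.
            '$' if chars.peek().map(|&(_, next)| next) == Some('{') => {
                chars.next(); // consume the `{`
                if !text.is_empty() {
                    parts.push(Part::Text(std::mem::take(&mut text)));
                }
                // Track brace depth so nested `{ }` pairs stay inside.
                let mut depth = 1;
                let mut inner = String::new();
                loop {
                    let (_, c) = chars.next()?;
                    match c {
                        '{' => depth += 1,
                        '}' => {
                            depth -= 1;
                            if depth == 0 {
                                break;
                            }
                        }
                        _ => {}
                    }
                    inner.push(c);
                }
                parts.push(Part::Interp(inner));
            }
            _ => text.push(c),
        }
    }
    None // input ended before the closing quote
}
```

Calling scan_string on "User's name: ${name}" yields a Text part for the leading content followed by an Interp part holding name. In a real lexer, the callback attached to the opening-quote token would do this scanning over the remaining input, advance the lexer by the consumed length, and run the ordinary token lexer over each Interp body.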