osa1 / lexgen

A fully-featured lexer generator, implemented as a proc macro
MIT License

Allow let bindings interleaved with rules #28

Closed (osa1 closed 1 year ago)

osa1 commented 2 years ago

Currently let bindings need to come before rules. This is inconvenient in large lexers. For example, I'm working on a Rust lexer, and I currently have these let bindings:

    // https://doc.rust-lang.org/reference/whitespace.html
    let whitespace =
        ['\t' '\n' '\u{B}' '\u{C}' '\r' ' ' '\u{85}' '\u{200E}' '\u{200F}' '\u{2028}' '\u{2029}'];

    let oct_digit = ['0'-'7'];
    let dec_digit = ['0'-'9'];
    let hex_digit = ['0'-'9' 'a'-'f' 'A'-'F'];
    let bin_digit = '0' | '1';
    let int_suffix = ('u' | 'i') '8' | ('u' | 'i') "16" | ('u' | 'i') "32" |
            ('u' | 'i') "64" | ('u' | 'i') "128" | ('u' | 'i') "size";

There will be more in the final version. Ideally I shouldn't have to declare regexes specific to parsing numbers before everything else. I should be able to declare common regexes for numbers right before the rules for lexing numbers.

osa1 commented 2 years ago

It turns out this is already supported. We should update the README to reflect that. Currently it reads as if let bindings must come before rules.

However, we don't allow let bindings inside rule { } blocks, or in lexer definitions without named rules. Maybe we could allow these too.