osa1 / lexgen

A fully-featured lexer generator, implemented as a proc macro
MIT License
63 stars 7 forks source link

Avoid redunant backtrack state updates #43

Open osa1 opened 2 years ago

osa1 commented 2 years ago

In the Lua lexer I see code like

'>' => {
    self.0.set_accepting_state(Lexer_ACTION_13);          // 2
    match self.0.next() {
        None => {
            self.0.__done = true;
            match self.0.backtrack() {                    // 6
                ...
            }
        }
        Some(char) => match char {
            '>' => {
                self.0.reset_accepting_state();           // 12
                match Lexer_ACTION_31(self) {
                    ...
                }
            }
            '=' => {
                self.0.reset_accepting_state();           // 18
                match Lexer_ACTION_11(self) {
                    ...
                }
            }
            _ => match self.0.backtrack() {               // 23
                ...
            },
        },
    }
}

Here we don't need to set accepting state in line 2. Lines 6 and 23 call the semantic action set in line 2, and could call the function directly instead of going through backtrack. Lines 12 and 18 ignore accepting state entirely and call other semantic action functions.

I think if we move set_accepting_state calls to the branches it may become easier in the code generator to handle these cases. In that case the code above would look like:

'>' => {
    match self.0.next() {
        None => {
            self.0.set_accepting_state(Lexer_ACTION_13);       
            self.0.__done = true;
            match self.0.backtrack() {                
                ...
            }
        }
        Some(char) => match char {
            '>' => {
                self.0.set_accepting_state(Lexer_ACTION_13);       
                self.0.reset_accepting_state();      
                match Lexer_ACTION_31(self) {
                    ...
                }
            }
            '=' => {
                self.0.set_accepting_state(Lexer_ACTION_13);       
                self.0.reset_accepting_state();     
                match Lexer_ACTION_11(self) {
                    ...
                }
            }
            _ => {
                self.0.set_accepting_state(Lexer_ACTION_13);       
                match self.0.backtrack() {        
                    ...
                }
            }
        },
    }
}

Now in generate_state_arm, we can avoid generating set_accepting_state if we backtrack or call another semantic action function. We also need to use peek instead of next.

osa1 commented 2 years ago

With new_from_iter it's more important to avoid clones in set_accepting_state as not all iterators will be cheap to clone. For example, ropey's char iterator is quite a bit larger than std's Chars: https://github.com/cessen/ropey/blob/36b21712ce9cdc125fa4580e573966a8bd3fd08c/src/iter.rs#L288-L296