osa1 / lexgen

A fully-featured lexer generator, implemented as a proc macro
MIT License
63 stars 7 forks source link

Generate functions for semantic actions #19

Closed osa1 closed 2 years ago

osa1 commented 2 years ago
osa1 commented 2 years ago

This currently makes a benchmark 7% slower (runtime).

osa1 commented 2 years ago

In lexers with a lot of "simple" rules (i.e. <regex>,) this generates a lot of extra code (in macro expansion, didn't check the final assembly). Macro expanded code for Lua 5.1 lexer grows 10% (805 lines).

For example, rule for character '(' is originally this:

'(' => {
    self.current_match_end += char.len_utf8();
    let _ = self.iter.next();
    let rhs: Token<'input> = Token::LParen;
    self.state = self.initial_state;
    let match_start = self.current_match_start;
    self.current_match_start = self.current_match_end;
    return Some(Ok((match_start, rhs, self.current_match_end)));
}

With this PR it becomes:

'(' => {
    self.current_match_end += char.len_utf8();
    let _ = self.iter.next();
    let str =
        &self.input[self.current_match_start..self.current_match_end];
    let handle = LexerHandle {
        iter: &mut self.iter,
        match_: str,
        user_state: &mut self.user_state,
    };
    match SEMANTIC_ACTION_7(handle) {
        LexerAction::Continue => {
            self.state = self.initial_state;
        }
        LexerAction::Return(res) => {
            self.state = self.initial_state;
            let match_start = self.current_match_start;
            self.current_match_start = self.current_match_end;
            return Some(Ok((
                match_start,
                res,
                self.current_match_end,
            )));
        }
        LexerAction::Switch(rule_set) => {
            self.switch(rule_set);
        }
        LexerAction::SwitchAndReturn(res, rule_set) => {
            self.switch(rule_set);
            let match_start = self.current_match_start;
            self.current_match_start = self.current_match_end;
            return Some(Ok((
                match_start,
                res,
                self.current_match_end,
            )));
        }
    }
}

The problem is all semantic action functions need to have the most general type possible (the type for fallible actions for lexer with an error type, infallible actions for others), so handling LexerAction value gets more complicated for simple rules.

I think we could generate the old code for simple rules, just store the expression in semantic action table for simple rules.

In general we will still need to generate semantic action functions for simple rules, but in a lot of cases they won't be used.

osa1 commented 2 years ago

We could also maintain counters for how many times semantic action fns are used, and inline ones that are used once.

Or even better, generate functions for semantic actions used more than once, instead of generating for all and then eliminating some of them.

osa1 commented 2 years ago

Or even better, generate functions for semantic actions used more than once, instead of generating for all and then eliminating some of them.

I implemented this 223628f, but I'm not sure it's worth the complexity in the code generator. I think I will revert it and rely on rustc/LLVM on inlining single-use semantic action functions and eliminating top-level definitions.