osa1 / lexgen

A fully-featured lexer generator, implemented as a proc macro
MIT License
63 stars 7 forks source link

Consider merging char and range transitions in NFA and DFA #40

Open osa1 opened 2 years ago

osa1 commented 2 years ago

Currently we have these transitions in NFAs:

struct State<A> {
    char_transitions: Map<char, Set<StateIdx>>,
    range_transitions: RangeMap<Set<StateIdx>>,
    empty_transitions: Set<StateIdx>,
    any_transitions: Set<StateIdx>,
    end_of_input_transitions: Set<StateIdx>,
    ...
}

(I was confused for a few seconds here on why we have both empty_transitions and end_of_input_transitions, we should document that "empty" is epsilon)

and these in DFAs:

pub struct State<T, A> {
    char_transitions: Map<char, T>,
    range_transitions: RangeMap<T>,
    any_transition: Option<T>,
    end_of_input_transition: Option<T>,
    ...
}

In both of these, we could merge char and range transitions in a single RangeMap field. When the transition happens on a character the range will include a single character. This has a few advantages:

Disadvantage is that for char transitions we will need heap allocation for the Vec<Range>. Two ways to avoid that: