katef / libfsm

DFA regular expression library & friends
BSD 2-Clause "Simplified" License
931 stars 52 forks source link

Convert libfsm's internal representation to an adjacency list #83

Open katef opened 6 years ago

katef commented 6 years ago

Currently libfsm stores its FSMs as a graph of structs pointing to each other. This ticket proposes an adjacency list instead, where the whole FSM would be stores as an array of state-to-state pairs, labelled with edges. Possibly we'd also want a skip list down the side, to quickly find the start of each state's list of edges.

This representation has a few advantages; it's simpler to deal with in a few ways, but also allows for data to be attached to edge nodes, which currently only exist as an index into an array of states. An adjacency list also allows some operations to be parallelised (perhaps by threads, SIMD, etc).

One concern is that some fsm_*() operations are currently O(1), and I'd want to take especial care not to damage performance for those on larger FSM.

katef commented 6 years ago

This will make /fuzz/gengraph.rb easy to implement in C, and could be provided as part of the fsm_*() API; then fsm(1) would expose that. And the benchmarking tool ought to be able to read .fsm files anyway.