sebfisch / haskell-regexp

Regular Expression Matching in Haskell
http://sebfisch.github.com/haskell-regexp/
Other
34 stars 5 forks source link

balanced sequences #10

Closed sebfisch closed 14 years ago

sebfisch commented 14 years ago

Use a balanced sequence data type such as finger trees to represent sequences of regular expressions. As the Status type can be made a Monoid, finger trees may be good fit for our algorithm.

sebfisch commented 14 years ago

Using explicit associative operations may lead to new insights.

Maybe it is even helpful to make associativity explicit in the datatype such that concatenation and alternative are both associative. Like

data RegExp a = ... | Seq [RegExp a] | Alt [RegExp a]

Porting the implementation to such a representation may clarify where associativity is important and simplify a later change going from lists to finger trees.

As an aside, with the above representation Seq [] matches the empty word and Alt [] represents the empty language.

sebfisch commented 14 years ago

Instead of using a predefined balanced sequence type, we could use problem specific information to balance in a specific way. In order to skip inactive parts, we want that active leafs are not nested deep in otherwise inactive expressions. We want them to be at a high level. As active leafs travel from left to right it is good to start with a right associative tree. As active leafs jump from the right to the left end of repeated sequences and new activation starts at the left end, we probably want the left leaf to stay at a high level.

We regenerate the whole expression in each step and can thus change how associative operations are nested during the algorithm. Can we achieve a good custom balancing by rotating around leafs when activating them?

sebfisch commented 14 years ago

it's probably better to store the weights next to the regexp rather than inside it, to reduce garbage.