sebfisch / haskell-regexp

Regular Expression Matching in Haskell
http://sebfisch.github.com/haskell-regexp/
Other
34 stars 5 forks source link

arbitrary monoids in Status type #16

Closed sebfisch closed 14 years ago

sebfisch commented 14 years ago

In the first version of the matcher Status was defined as

data Status = Inactive | Active Bool

and Bool signified that a regular expression is a final state of the underlying DFA. In the next version Status is defined as

data Status = Inactive | Active [Int]

and [Int] is a list of start indices of words that are accepted in the current state of the underlying DFA. Bool and [Int] were/are merged using the associative operations (||) and (++) respectively. The corresponding zeros (that is, False and []) were/are used when combining regular expressions in sequence to drop the status of the first component if the second does not accept the empty word.

This calls for generalising the Status type to

data Status m = Inactive | Active m

for arbitrary Monoids m. The type Monoid.Any can be used to solve the word problem and the type Data.Set Int can be used to track indices (which replaces the manual implementation of mergeIndices by Set.union). Apart from re-implementing the old, different monoids can be used to implement new behaviour. For example

newtype Min = Min { getMin :: Int }

instance Monoid Min
 where
  mempty = Min 0
  a `mappend` b = min (getMin a) (getMin b)

can be used to implement leftmost (longest or shortest) match.

The type RegExp a needs to be changed to RegExp m a too.

sebfisch commented 14 years ago

done (16981745084f9df04cf506c5340f29844181bc52)

The Min monoid cannot use mempty = Min 0 which is no zero for the min function. I have added a zero using the Maybe type instead.