ollef / Earley

Parsing all context-free grammars using Earley's algorithm in Haskell.
BSD 3-Clause "New" or "Revised" License
361 stars, 24 forks

Capture result as well as matched tokens #42

Open expipiplus1 opened 5 years ago

expipiplus1 commented 5 years ago

It would be nice to be able to capture the result of a production as well as the text matched by it, for example:

match :: Prod r e t a -> Prod r e t ([t], a)

expipiplus1 commented 5 years ago

Ignoring NonTerminals, it could look like this:

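{-# LANGUAGE LambdaCase, TupleSections, ViewPatterns #-}
import Data.Biapplicative (biliftA2)    -- from the bifunctors package
import Text.Earley.Grammar (Prod (..))

match :: Prod r e t a -> Prod r e t ([t], a)
match = \case
  Pure a -> Pure ([], a)
  -- Prepend the matched token to the tokens matched by the continuation.
  Terminal p c ->
    Terminal (\t -> (t, ) <$> p t) (biliftA2 (flip (:)) id <$> match c)
  -- NonTerminal :: !(r e t a) -> !(Prod r e t (a -> b)) -> Prod r e t b
  NonTerminal _ _ -> error "match: NonTerminal"  -- not handled in this sketch
  Alts as c ->
    -- Same as: Alts (match <$> as) ((\(ts, f) (t, x) -> (t ++ ts, f x)) <$> match c)
    Alts (match <$> as) (biliftA2 (flip (++)) id <$> match c)
  -- Tokens of all repetitions come first, then those of the continuation.
  Many a c -> Many
    (match a)
    ((\(t, f) (unzip -> (ts, xs)) -> (concat (ts ++ [t]), f xs)) <$> match c)
  Named p n -> Named (match p) n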

Edit: corrected implementation

ollef commented 5 years ago

That sounds like a useful feature. A PR would be welcome.

An alternative approach might be to add a primitive

position :: Prod r e t Int

which always succeeds, giving you the current position in the input string.

You could then get the position before and after a production and use that to get the parsed slice of the input string.
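
To make the position idea concrete, here is a minimal sketch on a toy list-of-successes parser. It is not Earley's Prod type, and position is not an existing Earley primitive. The names P, satisfy, manyP, spanned and sliceOf are all made up for illustration. The sketch only shows how a start and end position can be turned back into the matched slice of the input.

```haskell
import Data.Char (isAlpha, isDigit)

-- Toy parser that threads the current input position (an assumption for
-- illustration; Earley's Prod works differently internally).
newtype P t a = P { runP :: Int -> [t] -> [(a, Int, [t])] }

instance Functor (P t) where
  fmap f (P p) = P $ \i ts -> [ (f a, j, rest) | (a, j, rest) <- p i ts ]

instance Applicative (P t) where
  pure a = P $ \i ts -> [(a, i, ts)]
  P pf <*> P pa = P $ \i ts ->
    [ (f a, k, rest') | (f, j, rest) <- pf i ts, (a, k, rest') <- pa j rest ]

-- Hypothetical primitive: always succeeds, consumes nothing,
-- and returns the current position.
position :: P t Int
position = P $ \i ts -> [(i, i, ts)]

-- Match one token satisfying the predicate and advance the position by one.
satisfy :: (t -> Bool) -> P t t
satisfy ok = P $ \i ts -> case ts of
  (t : rest) | ok t -> [(t, i + 1, rest)]
  _                 -> []

-- Zero or more repetitions, longest match first.
-- Only safe when the argument consumes input.
manyP :: P t a -> P t [a]
manyP p = P $ \i ts -> runP ((:) <$> p <*> manyP p) i ts ++ [([], i, ts)]

-- Record the position before and after a production.
spanned :: P t a -> P t (Int, a, Int)
spanned p = (,,) <$> position <*> p <*> position

-- Recover the matched tokens from the original input and the two positions.
sliceOf :: [t] -> (Int, a, Int) -> ([t], a)
sliceOf input (start, a, end) = (take (end - start) (drop start input), a)

-- Skip leading letters, then capture a number together with its source text.
-- Only parses that consume the whole input are kept.
example :: [(String, Int)]
example =
  [ sliceOf input s
  | (s, _, rest) <- runP (manyP (satisfy isAlpha) *> spanned number) 0 input
  , null rest
  ]
  where
    input = "ab123"
    number :: P Char Int
    number = read <$> ((:) <$> satisfy isDigit <*> manyP (satisfy isDigit))
```

Here example evaluates to [("123", 123)]. One trade-off compared to match is that the caller has to keep the original input around to take the slice, but no constructor of the production type needs special handling.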