ollef / Earley

Parsing all context-free grammars using Earley's algorithm in Haskell.
BSD 3-Clause "New" or "Revised" License
361 stars 24 forks source link

Thread monad parameter to Constraint productions #60

Closed expipiplus1 closed 1 year ago

expipiplus1 commented 1 year ago

Also add parser' to allow running parsers within a Reader-like monad

expipiplus1 commented 1 year ago

I can't claim this is neat or finished, but if it's not an immediate "wtf were you thinking" then I'd be happy to neaten it up etc..

expipiplus1 commented 1 year ago

The motivation here is that if you're running the same parser many times but with different constraints then this avoids having to rebuild the parser from the grammar every time.

ollef commented 1 year ago

Hello!

The implementation looks fine to me.

One thing I'm a bit worried about is complexity: this library is already quite complex with a bunch of type parameters etc. Is the benefit that this brings worth the additional complexity? At the very least, we should probably provide type synonyms for the old style of types and give new names to the new, more general ones. This way this change can be backward-compatible.

What's allowed as m is quite restricted. Is there anything other than reader that will work there? It feels like we may be able to achieve the same functionality with some kind of token mapping function, but I'd need to think more about it.

If you have a concrete example of a use case that might help the thinking. :)

expipiplus1 commented 1 year ago

I didn't want to have to shift all of parse into m, so yeah, it's restricted to whatever can be mapped to ST. This does mean that one could have ST operations in there too.

Tbh I only wanted some immutable (reader) context in there, but it was no more complicated to thread it through as m.

The use case is using Earley for parsing just the ambiguous parts of a larger grammar, so using something like Megaparsec for the large scale stuff, with nice backtracking, errors and recovery etc.. but then parsing to just tokens and deferring to Earley, and being able to pass in some larger state from the higher level parser without rebuilding the grammar.

No problem adding type synonyms as part of any cleanup of course :)

ollef commented 1 year ago

I think I get the point of this, but It'd still be nice to see a boiled-down example of using this feature, maybe as part of the PR.

Do you have any examples of what you might use ST for as the m? It seems like it might be possible to make that violate some internal assumptions that the caching mechanism in this library makes, but I'm not entirely sure. Without ST, I guess we're left with m = Identity or Reader...? Then it seems more straightforward to make it an env parameter that's passed as an argument instead.

I'm guessing the Terminal action could also have an m/env.

expipiplus1 commented 1 year ago

I think conceptually this doesn't work, because finding the null productions/rules (as part of generating the grammar) requires this information anyway, so you don't win by being able to pass it in earlier.

Adding this information to tokens is I think the next best approach.

This isn't to say that it's totally useless, for example one could still support actions typed something like constraint :: a -> (Bool, m ()) to be used, for example, recording a diagnostic whenever a constraint filtered something. But this is also served by disambiguate being able to annotate a node with this information also also.