ollef / Earley

Parsing all context-free grammars using Earley's algorithm in Haskell.
BSD 3-Clause "New" or "Revised" License
360 stars 24 forks source link

feature: constraint validation of production parse result #50

Open user318 opened 4 years ago

user318 commented 4 years ago

Hello.

I did not find how to implement some sort of constraints on production. For example if I wrote a rule to parse a series of digits and wanted to validate that if falls into a specific range, what would I need to do? Of course I can check the values after the parser ends, but the context is lost in such case like position where error occurs. I see no functions in current interface I can use to do that and also I have not find something similar in the examples. It probably could be something function like:

validate :: (a -> Bool) -> Prod r e t a -> Prod r e t a

or

validate :: (a -> Maybe b) -> Prod r e t a -> Prod r e t b

so when the predicate fails or returns Nothing, the parser fails for that production. And we can use it like:

num <- rule $ validate (<= 1234) $ read <$> some digit

Or may be some sort of extended rule function with validation:

ruleValidate :: (a -> Bool) -> Prod r e t a -> Grammar r (Prod r e t a)
...
num <- ruleValidate (<= 1234) $ read <$> some digit
ollef commented 4 years ago

Hey!

Yeah, I think you're right that this isn't possible with the current interface. Would you like to contribute this feature?

user318 commented 4 years ago

Heh... I might, but I am a beginner in Haskell and your code looks like Chinese to me now. :) And it would take a time to me. If you gave some hints what parts need to be modified, it might be easier to dive in.

ocharles commented 3 years ago

Is that what you need to add "keyword parsing" to a parser? For example, I want to write a little expression parser, but my AST has let..in, along with alphanumeric variable names. I don't want Var "let" to be a valid parse, but I can't see anyway to express that in Earley at the moment.

Edit: I guess the alternative is an explicit tokenisation phase.

user318 commented 3 years ago

No, I need it for another purpose, but in your case it can be useful too if you have a list of keywords. Examples of what I want to use it for: limiting the range of an integer if I want to read a part of IP-address or a port number. Or to verify that the "IP/mask" has the correct IP with host bits set to zero. Or the range - is correct (number1 <= number2).

ocharles commented 3 years ago

Oh sorry, I was being a bit selfish - I meant to say would I need this requested feature for my purpose! I am trying to use Earley on a stream of Chars and don't want to parse let as Var "let". If I were in a monadic parser setting, I could use guard, but that's not an option as we're limited to Applicative. Your suggestion is like guard :: Bool -> m a, but would be more like filterProd :: (a -> Bool) -> Prod r e t a -> Prod r e t a, and I would be able to define identifier <- rule $ filterProd (notElem[ "let", "in" ]) (many (satisfy isAlphaNum)).

Edit: I thought this was done in https://github.com/ollef/Earley/pull/47, but that's something different. I should stop posting on GitHub near midnight...

ocharles commented 3 years ago

Just an update on this: I moved to an explicit tokeniser instead and no longer need this.

expipiplus1 commented 1 year ago

I wonder if this could be generalized to Filter :: ([a] -> [b]) -> Prod r e t a -> Prod r e t b, where this disambiguation function could filter the list. This would also enable generating a parse tree, rather than a list of all flat parses, i.e. pushing disambiguation towards the leaves. For example disambiguate d = Filter (pure . d)