Open nikita-volkov opened 8 years ago
This is basically by design. You could emit some errors though - with a function mapWithFailure
. So you could theoretically do something like (didn't test):
arrayOf (element <|> mapWithFailure (const (Left "error")) (pure ()))
Which should fail if a document without "id" and "document" keys is found.
Okay. Then what to do when the input is not array at all?
I guess something like this:
let reportError = mapWithFailure (const (Left "error")) (pure ()))
(arrayOf (element <|> reportError)) <|> reportError
Are you sure? As I understand the problem is that arrayOf
never fails and hence the following branch will never be executed:
(arrayOf (element <|> reportError)) <|> reportError
-- ^ this will never be executed
It will be executed, but it will be executed if no item is found as well; which will happen with empty array. You can force generation of some items with arrayFound
though; it would be somewhat cumbersome.
Do you think I should add some more functions to allow such behaviour? Any idea about the signatures?
expectArray :: String -> Parser a -> Parser a
expectObject :: String -> Parser a -> Parser a
This would allow something like:
ensureArray "List of items not found" . arrayOf value
Would this work for you?
IMO arrayOf
and similar functions should be updated instead to include that check. After all, by typing arrayOf
the user explicitly specifies, which data structure he expects. If he wants an empty list, when it's not an array he can simply do <|> pure []
instead. IMO that would be a way more natural approach.
Currently that wouldn't work. The <|>
operator works like 'use right side if left side didn't produce anything', not 'if it produced an error'. Actually, currently the <|>
is implemented in a way that doesn't compose well with errors (i.e. any error, both on left and right may cause the parser to fail even when the other branch would produce correct result).
I think the <|>
can be fixed so it would compose correctly with errors. Given that we have the mapWithFailure
, it should behave better anyway. Changing the arrayOf
and friends - I'm still not quite sure. It seems to me the idea that if it doesn't match it is ignored isn't bad, but in practice the reverse might be actually better.
It seems to me the idea that if it doesn't match it is ignored isn't bad, but in practice the reverse might be actually better.
Please keep in mind that what I'm suggesting here is to be consistent with the way other parsers like "parsec" and "attoparsec" work. That implies being intuitive to the users of those packages, which is the majority.
There are other things that work differently in other packages. E.g. when you write:
arrayOf ("name" .: value)
What should the semantics be? Ok, let's say -
array
object
If the user wanted the 'otherwise ignore' behaviour, <|> mempty
would work.
Ok, I'll have a look at it, will see how it turns out.
The rule is very simple actually: fail if any node is not as specified.
In this specific example it means that it should fail if any of the following applies:
value
parser fails1c419173396fb66b245b61039e4a85aec69d3474 should change the behaviour. It should generally behave the way you want it - can you test it?
I'm not quite happy with the new behaviour. The problem is that the <|>
operator is used both for "if the left side failed" and for "if the left side produced no results". It worked quite intuitively with the 'ignore-on-nonmatch' behaviour, but with the error handling it seems to me somewhat wrong:
> parseLazyByteString ((arrayOf bool) <|> pure True) "[1,true,false,1]"
[True]
> parseLazyByteString ((arrayOf bool) <|> pure True) "[true,false,1]"
[True,False*** Exception: Failed parsing: 1
I'm not sure how to handle this correctly - first what is a reasonable behaviour (I might have some idea), but it seems to me too complex and not easily implementable.
I'm thinking about changing the behaviour, so that there would be 2 operators
<|>
operator?)I will have to think this through a little more but it seems to me that could result in quite consistent behaviour?
The general problem here seems to be stemming from the fact that all parsers produce a multiplicity of results. Even the clearly singleton parsers like bool
. This seems to be the main culprit of the ambiguity in design decisions that we experience and lets nonsense parser combination pass the typechecker.
I think it'll be very much worth researching introducing parsers with different contexts, like what I have in my experimental package. There's a Value parser, which parses a singleton literal and there are Object and Array parsers, which parse the multiplicities. Array and Object can be lifted into Literal and vice-versa. Thus you can clearly state what is possible where and have their individual typeclass instances make clear sense.
Could you describe what nonsense parser combination one could create? I don't quite understand the advantages of such approach - and I don't quite understand what would be the benefit regarding the described problem. It's and incremental parser - so ultimately the whole thing is supposed to generate multiple results.
I have a simple parser specification like this:
Whenever I feed it with any unexpected JSON, it doesn't fail. Instead it just results in an empty list. But I need to know when the input is incorrect.
Is this by design? Is there a way to work around this?