Doesn't fail on parsing failure

nikita-volkov commented 8 years ago

I have a simple parser specification like this:

postDocuments :: Parser (Text, Aeson.Value)
postDocuments =
  arrayOf element
  where
    element =
      (,) <$>
      objectWithKey "id" string <*>
      objectWithKey "document" value

Whenever I feed it with any unexpected JSON, it doesn't fail. Instead it just results in an empty list. But I need to know when the input is incorrect.

Is this by design? Is there a way to work around this?

ondrap commented 8 years ago

This is basically by design. You could emit some errors though - with a function mapWithFailure. So you could theoretically do something like (didn't test):

arrayOf (element <|> mapWithFailure (const (Left "error")) (pure ()))

Which should fail if a document without "id" and "document" keys is found.

nikita-volkov commented 8 years ago

Okay. Then what to do when the input is not array at all?

ondrap commented 8 years ago

I guess something like this:

let reportError = mapWithFailure (const (Left "error")) (pure ()))
(arrayOf (element <|> reportError)) <|> reportError

nikita-volkov commented 8 years ago

Are you sure? As I understand the problem is that arrayOf never fails and hence the following branch will never be executed:

(arrayOf (element <|> reportError)) <|> reportError
                                     -- ^ this will never be executed

ondrap commented 8 years ago

It will be executed, but it will be executed if no item is found as well; which will happen with empty array. You can force generation of some items with arrayFound though; it would be somewhat cumbersome.

Do you think I should add some more functions to allow such behaviour? Any idea about the signatures?

expectArray :: String -> Parser a -> Parser a
expectObject :: String -> Parser a -> Parser a

This would allow something like:

ensureArray "List of items not found" . arrayOf value

Would this work for you?

nikita-volkov commented 8 years ago

IMO arrayOf and similar functions should be updated instead to include that check. After all, by typing arrayOf the user explicitly specifies, which data structure he expects. If he wants an empty list, when it's not an array he can simply do <|> pure [] instead. IMO that would be a way more natural approach.

ondrap commented 7 years ago

Currently that wouldn't work. The <|> operator works like 'use right side if left side didn't produce anything', not 'if it produced an error'. Actually, currently the <|> is implemented in a way that doesn't compose well with errors (i.e. any error, both on left and right may cause the parser to fail even when the other branch would produce correct result).

I think the <|> can be fixed so it would compose correctly with errors. Given that we have the mapWithFailure, it should behave better anyway. Changing the arrayOf and friends - I'm still not quite sure. It seems to me the idea that if it doesn't match it is ignored isn't bad, but in practice the reverse might be actually better.

nikita-volkov commented 7 years ago

It seems to me the idea that if it doesn't match it is ignored isn't bad, but in practice the reverse might be actually better.

Please keep in mind that what I'm suggesting here is to be consistent with the way other parsers like "parsec" and "attoparsec" work. That implies being intuitive to the users of those packages, which is the majority.

ondrap commented 7 years ago

There are other things that work differently in other packages. E.g. when you write:

arrayOf ("name" .: value)

What should the semantics be? Ok, let's say -

fail if the top-level object is not array
fail if the content of the array is not an object
??? fail if the object doesn't contain attribute "name"

If the user wanted the 'otherwise ignore' behaviour, <|> mempty would work.

Ok, I'll have a look at it, will see how it turns out.

nikita-volkov commented 7 years ago

The rule is very simple actually: fail if any node is not as specified.

In this specific example it means that it should fail if any of the following applies:

the top-level node is not Array
it's element is not Object
that object does not contain a key "name"
the value parser fails

ondrap commented 7 years ago

1c419173396fb66b245b61039e4a85aec69d3474 should change the behaviour. It should generally behave the way you want it - can you test it?

ondrap commented 7 years ago

I'm not quite happy with the new behaviour. The problem is that the <|> operator is used both for "if the left side failed" and for "if the left side produced no results". It worked quite intuitively with the 'ignore-on-nonmatch' behaviour, but with the error handling it seems to me somewhat wrong:

> parseLazyByteString ((arrayOf bool) <|> pure True) "[1,true,false,1]"
[True]

> parseLazyByteString ((arrayOf bool) <|> pure True) "[true,false,1]"
[True,False*** Exception: Failed parsing: 1

I'm not sure how to handle this correctly - first what is a reasonable behaviour (I might have some idea), but it seems to me too complex and not easily implementable.

ondrap commented 7 years ago

I'm thinking about changing the behaviour, so that there would be 2 operators

execute right branch if the left produced no results
execute right branch if the left failed (this would be the <|> operator?)

I will have to think this through a little more but it seems to me that could result in quite consistent behaviour?

nikita-volkov commented 7 years ago

The general problem here seems to be stemming from the fact that all parsers produce a multiplicity of results. Even the clearly singleton parsers like bool. This seems to be the main culprit of the ambiguity in design decisions that we experience and lets nonsense parser combination pass the typechecker.

I think it'll be very much worth researching introducing parsers with different contexts, like what I have in my experimental package. There's a Value parser, which parses a singleton literal and there are Object and Array parsers, which parse the multiplicities. Array and Object can be lifted into Literal and vice-versa. Thus you can clearly state what is possible where and have their individual typeclass instances make clear sense.

ondrap commented 7 years ago

Could you describe what nonsense parser combination one could create? I don't quite understand the advantages of such approach - and I don't quite understand what would be the benefit regarding the described problem. It's and incremental parser - so ultimately the whole thing is supposed to generate multiple results.

ondrap / json-stream

Doesn't fail on parsing failure #12