purescript-contrib / purescript-parsing

A parser combinator library based on Parsec
https://pursuit.purescript.org/packages/purescript-parsing
BSD 2-Clause "Simplified" License
150 stars 52 forks source link

`skipSpaces` always commits to branch even if no spaces are consumed #200

Closed JordanMartinez closed 2 years ago

JordanMartinez commented 2 years ago

Describe the bug

skipSpaces always commits to its path, even if it doesn't consume any spaces. This occurs because it uses the consumeWith function.

To Reproduce

See https://try.purescript.org/?gist=a37247868f0f7aaf3244f8c07fc304f6

In the line 4, no spaces are consumed. So, the (singleLineComment <|> newline) parse shouldn't occur and we should instead move to the next part of the parser string "@" *> string "value".

Expected behavior

skipSpaces should only commit to its path if it actually consumes content.

Additional context Add any other context about the problem here.

natefaubion commented 2 years ago

I'm not sure I agree with this. There are cases where it makes sense to commit to a branch even when the input stream slice is unchanged, such as with eof or string "". Is this an unintended regression, or a question of semantics?

natefaubion commented 2 years ago

However, if many (string " ") behaves differently, then I think one could argue that skipSpaces is just an optimization of that parser, and should not count as consumed.

JordanMartinez commented 2 years ago

Probably one of semantics. I'm not sure if the current behavior is different than the previous one.

jamesdbrock commented 2 years ago

Yeah, both of these parsers succeed for both of these inputs, but have different consumed states afterwards if the input has no spaces.

Input skipSpaces many (string " ")
" " consumed consumed
"x" consumed not consumed

I think the prior version of skipSpaces had the same semantics as many (string " ").

https://github.com/purescript-contrib/purescript-parsing/blob/24d6693a08ed498a533875495fe2e076b2e1f872/src/Text/Parsing/Parser/String.purs#L149-L151

JordanMartinez commented 2 years ago

So, which should be the correction version? If the current one (where "x" results in a consumed state), then I think its docs should be updated to state that.

natefaubion commented 2 years ago

I think that it should have the semantics similar to many (string " "). The issue is that consumeWith's type is too restrictive. Currently the callback returns an Either, but really what you would want is something like:

data Commit a
  = Fail String
  | Skip a
  | Commit a String String

Which is Either String (Tuple a (Maybe { consumed :: String, remainder String })). This reflects the possibility that you may want to continue with the current parser but not consume anything.

If we don't want to change the type, which I think makes sense for now, then I would make consumeWith only set the consume flag when the consumed string is non empty as suggested. eof does not use consumeWith, so I think that's ok.