j-mie6 / gigaparsec

Refreshed parsec-style library for compatibility with Scala parsley
https://j-mie6.github.io/gigaparsec/
BSD 3-Clause "New" or "Revised" License

[BUG] `hide` combinator revives dead hints #30

Closed. HEIGE-PCloud closed this issue 7 months ago.

HEIGE-PCloud commented 7 months ago

To reproduce, run `cabal repl`, then enter:

:m +Text.Gigaparsec
:m +Text.Gigaparsec.Char
:m +Text.Gigaparsec.Token.Descriptions
:m +Text.Gigaparsec.Token.Lexer
lexer' = mkLexer plain
lexeme' = lexeme lexer'
sym' = sym lexeme'
fully' = fully $ mkLexer plain
p2 = many (char '1')
p1 = sym' "[" *> p2 <* sym' "]"
parseRepl (fully' p1) "[11]1"

Observed output:

(line 1, column 5):
  unexpected "1"
  expected "1" or end of input
  >[11]1

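The expected output should not contain expected "1": once `]` has been consumed, the hint left behind by `many (char '1')` is stale and should no longer be reported.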

This seems to be related to how whiteSpace is handled: after replacing

https://github.com/j-mie6/gigaparsec/blob/79736ee2bf83a921aefb98b836bd657b99004b6d/src/Text/Gigaparsec/Internal/Token/Lexer.hs#L68

with apply = id, the issue disappears.

j-mie6 commented 7 months ago

Actually, this isn't to do with the lexer at all. It feels reminiscent of https://github.com/j-mie6/parsley/issues/167, which also materialised with whitespace, and also with square brackets. But with an entirely different parsing architecture, that's rather surprising!

I'll take a look shortly and minimise it to something that doesn't involve the lexer.

j-mie6 commented 7 months ago

OK, here is a minimised version:

ghci> parseRepl (optional digit <* char ']' <* hide (optional letter) <* eof) "]1"
(line 1, column 2):
  unexpected "1"
  expected digit or end of input
  >]1
    ^

It turns out the `hide` combinator is the ingredient that makes this break; thanks!

j-mie6 commented 7 months ago
good' x st' = good x st' { Internal.hints = Internal.hints st } -- TODO: should this change valid offset?

I believe the answer is categorically: yes, lol.
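To illustrate why the valid offset matters, here is a toy model in Haskell. It is a sketch under stated assumptions, not gigaparsec's internal code. It assumes hints are stored together with the offset at which they were recorded (the "valid offset" the TODO above refers to), and every name in it (`State`, `validOffset`, `optionalP`, `hideBuggy`, `hideFixed`, `repro`) is hypothetical. `repro` mirrors the minimised reproduction `optional digit <* char ']' <* hide (optional letter) <* eof`.

```haskell
import Data.Char (isAlpha, isDigit)
import Data.List (nub)

-- Hypothetical toy model of offset-tracked hints; NOT gigaparsec's internals.
data State = State
  { input       :: String
  , consumed    :: Int       -- offset: characters consumed so far
  , hints       :: [String]  -- "expected" items from recovered failures
  , validOffset :: Int       -- offset at which 'hints' were recorded
  }

data Result a = Ok a State | Err Int [String]  -- error offset, expected items

newtype Parser a = Parser { runParser :: State -> Result a }

-- Fail at the current offset; hints are merged in only if still valid here.
failWith :: State -> [String] -> Result a
failWith st es
  | consumed st == validOffset st = Err (consumed st) (nub (hints st ++ es))
  | otherwise                     = Err (consumed st) es

-- Record hints from a failure at offset o, replacing hints from other offsets.
addHints :: Int -> [String] -> State -> State
addHints o es st
  | o == validOffset st = st { hints = nub (hints st ++ es) }
  | otherwise           = st { hints = es, validOffset = o }

satisfy :: String -> (Char -> Bool) -> Parser Char
satisfy label f = Parser $ \st -> case input st of
  c : cs | f c -> Ok c (st { input = cs, consumed = consumed st + 1 })
  _            -> failWith st [label]

-- Like 'optional': a failure that consumed nothing is turned into a hint.
optionalP :: Parser a -> Parser (Maybe a)
optionalP p = Parser $ \st -> case runParser p st of
  Ok x st' -> Ok (Just x) st'
  Err o es
    | o == consumed st -> Ok Nothing (addHints o es st)
    | otherwise        -> Err o es

-- Like '<*': run both parsers in sequence, keep the left result.
infixl 4 <<*
(<<*) :: Parser a -> Parser b -> Parser a
p <<* q = Parser $ \st -> case runParser p st of
  Err o es -> Err o es
  Ok x st' -> case runParser q st' of
    Err o es  -> Err o es
    Ok _ st'' -> Ok x st''

eof :: Parser ()
eof = Parser $ \st -> case input st of
  [] -> Ok () st
  _  -> failWith st ["end of input"]

-- Buggy: restores the outer hints but keeps the valid offset set inside p,
-- so hints recorded before input was consumed look valid again.
hideBuggy :: Parser a -> Parser a
hideBuggy p = Parser $ \st -> case runParser p st of
  Ok x st' -> Ok x (st' { hints = hints st })
  Err o es -> if o == consumed st then Err o [] else Err o es

-- Fixed: restores the valid offset together with the hints.
hideFixed :: Parser a -> Parser a
hideFixed p = Parser $ \st -> case runParser p st of
  Ok x st' -> Ok x (st' { hints = hints st, validOffset = validOffset st })
  Err o es -> if o == consumed st then Err o [] else Err o es

-- Mirrors: optional digit <* char ']' <* hide (optional letter) <* eof
repro :: (Parser (Maybe Char) -> Parser (Maybe Char)) -> Parser (Maybe Char)
repro hide' =
  optionalP (satisfy "digit" isDigit)
    <<* satisfy "\"]\"" (== ']')
    <<* hide' (optionalP (satisfy "letter" isAlpha))
    <<* eof

parse :: Parser a -> String -> Either (Int, [String]) a
parse p s = case runParser p (State s 0 [] 0) of
  Ok x _   -> Right x
  Err o es -> Left (o, es)

main :: IO ()
main = do
  print (parse (repro hideBuggy) "]1")  -- Left (1,["digit","end of input"])
  print (parse (repro hideFixed) "]1")  -- Left (1,["end of input"])
```

In this model, `optionalP` for the letter fails at offset 1, so inside `hide` the hints become `letter` and the valid offset moves to 1. `hideBuggy` then puts back the `digit` hint recorded at offset 0 but leaves the offset at 1, so `eof` treats that dead hint as current. Restoring the offset as well, as `hideFixed` does, keeps the `digit` hint stale and the error mentions only end of input.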