haskell / attoparsec

A fast Haskell library for parsing ByteStrings
http://hackage.haskell.org/package/attoparsec
Other
512 stars 93 forks source link

decimal does not range-check #211

Open jchia opened 2 years ago

jchia commented 2 years ago

I just realized that decimal does not check for out-of-range values. https://hackage.haskell.org/package/attoparsec-0.14.4/docs/src/Data.Attoparsec.ByteString.Char8.html#decimal

ghci> parseOnly (decimal @Word16) "65536"
Right 0

This was surprising to me, but only before considering the type constraint of Integral without Bounded, implying that range check is impossible.

Still, from the user's perspective, doing the appropriate range checks somewhere is necessary, if not by the library then by the user. If done by the user, he can only parse a Natural and the Word16 specialization becomes useless/unused.

Currently, this behavior carries over to cassava, which also fails to do the necessary range-check.

dminuoso commented 2 years ago

Ugh!

I'm torn whether this is a bug.

While at second glance it's obvious this type signature does not admit bounds checking, the average user might trivially assume w8 <- decimal to do the obvious thing and fail on decimal strings larger than 255.

In my opinion decimal should be monomorphized to Parser Integer, then it's more clear this parser will parse a number of any size, and its up to the user to use fromIntegral (where it's much clearer that wraparound occurs) or do bounds checking.

jchia commented 2 years ago

Having a range-checking parser for bounded integral types means that excessively-long input can be rejected early before running out of memory from trying to parse an Integer or Natural. This can be a good reason to have a range-checking parser for bounded integral types that do not need to parse an Integer or Natural first.