haskell / attoparsec

A fast Haskell library for parsing ByteStrings
http://hackage.haskell.org/package/attoparsec

Add takeWhileN, takeWhileN1, decimalN #191

Closed mulderr closed 2 years ago

mulderr commented 2 years ago

This adds alternative versions of takeWhile, takeWhile1 and decimal that match and consume no more than n bytes of input.

With a type like ByteString we can write B.take n . B.takeWhile p, and that should be a sensible operation (?). The same approach does not work as expected for Parser ByteString, however: B.take n <$> AB.takeWhile p only truncates the result, while AB.takeWhile p itself still consumes every matching byte, so the parser state may advance by more than n bytes. These variants are thus proposed for situations where we want to match and consume no more than n bytes.
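To illustrate the difference, here is a minimal sketch built only on the existing scan combinator from Data.Attoparsec.ByteString.Char8. The takeWhileN shown here is a hypothetical stand-in and not necessarily how this PR implements it; truncated and withRest are helper names made up for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Attoparsec.ByteString.Char8 as A
import           Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B
import           Data.Char (isDigit)

-- Truncating the *result*: takeWhile still consumes every matching byte,
-- so the parser position moves past all digits, not just the first n.
truncated :: Int -> A.Parser ByteString
truncated n = B.take n <$> A.takeWhile isDigit

-- Hypothetical takeWhileN (an assumption, not the PR's code): the scan
-- state counts consumed bytes and stops once n is reached.
takeWhileN :: Int -> (Char -> Bool) -> A.Parser ByteString
takeWhileN n p = A.scan 0 step
  where
    step :: Int -> Char -> Maybe Int
    step i c
      | i < n && p c = Just (i + 1)
      | otherwise    = Nothing

-- Helper for the example: run a parser and also return the unconsumed input.
withRest :: A.Parser ByteString -> ByteString
         -> Either String (ByteString, ByteString)
withRest p = A.parseOnly ((,) <$> p <*> A.takeByteString)
```

In GHCi, withRest (truncated 2) "12345abc" gives Right ("12","abc"), so the digits 345 are silently lost, whereas withRest (takeWhileN 2 isDigit) "12345abc" gives Right ("12","345abc").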

As for decimal, it is rather ill-defined and very easy to use unsafely, as it does no bounds checking whatsoever. I've added a warning, copied from an analogous function in text, which states that it can easily overflow for bounded types. As obvious as that may seem, people are still using it without sanity checks in the wild: I've found two packages on Hackage (which I will not name here; I have not conducted a thorough search, so there could be more!) that will successfully parse invalid input due to overflows caused by careless use of decimal. I propose a safer alternative, decimalN :: Integral a => Int -> Parser a, that matches at most the given number of digits, building on takeWhile1N.
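The overflow is easy to reproduce with the existing decimal. The sketch below also shows one possible shape for decimalN, written against scan so that it compiles with a released attoparsec; this is an assumed implementation for illustration and may differ from the one in this PR.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString.Char8 as B
import           Data.Char (isDigit, ord)
import           Data.Word (Word8)

-- The existing decimal wraps around silently at a bounded type:
-- 300 does not fit in a Word8, and 300 mod 256 = 44.
overflowed :: Either String Word8
overflowed = A.parseOnly (A.decimal <* A.endOfInput) "300"

-- Hypothetical decimalN (an assumption, not the PR's code): consume at
-- least one and at most n digits, then fold them into a number.
decimalN :: Integral a => Int -> A.Parser a
decimalN n = do
  digits <- A.scan 0 step
  if B.null digits
    then fail "decimalN: expected at least one digit"
    else pure (B.foldl' acc 0 digits)
  where
    step :: Int -> Char -> Maybe Int
    step i c
      | i < n && isDigit c = Just (i + 1)
      | otherwise          = Nothing
    acc a c = a * 10 + fromIntegral (ord c - ord '0')
```

Here overflowed evaluates to Right 44, while A.parseOnly (decimalN 2 <* A.endOfInput) "300" :: Either String Word8 fails because the third digit is left unconsumed. The caller still has to pick n so that every n-digit number fits the target type: for Word8 that means n = 2, since three digits can still overflow (e.g. 999).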

I've thought about decimalBounded :: (Integral a, Bounded a) => Parser a, but I am not yet sure how to write an efficient implementation, so I'll leave that for later.
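For reference, a straightforward but not particularly efficient version can be sketched by parsing into Integer first and range-checking afterwards. This is only an assumption about what decimalBounded could look like: it allocates an arbitrary-precision number for every parse, and it still reads an unbounded run of digits before rejecting the input.

```haskell
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ScopedTypeVariables #-}
import qualified Data.Attoparsec.ByteString.Char8 as A
import           Data.Int (Int8)
import           Data.Word (Word8)

-- Hypothetical decimalBounded (an assumption, not part of this PR):
-- parse without overflow as Integer, then reject values above maxBound.
-- decimal is unsigned, so only the upper bound needs checking.
decimalBounded :: forall a. (Integral a, Bounded a) => A.Parser a
decimalBounded = do
  n <- A.decimal :: A.Parser Integer
  if n <= toInteger (maxBound :: a)
    then pure (fromInteger n)
    else fail "decimalBounded: value out of range"
```

With this sketch, A.parseOnly decimalBounded "255" :: Either String Word8 succeeds with 255, while the same parser applied to "256" fails instead of wrapping around to 0.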

decimal is fast but only safe to use on "trusted inputs" so I would additionally suggest to:

but I leave that decision to others, as it could also cause breakage in scenarios where it is currently used safely.

I'm happy to entertain any suggestions including name bikeshedding.