haskell / attoparsec

A fast Haskell library for parsing ByteStrings
http://hackage.haskell.org/package/attoparsec

notInClass seems to handle UTF-8 needlessly #152

Open flip111 opened 5 years ago

flip111 commented 5 years ago

https://hackage.haskell.org/package/attoparsec-0.13.2.2/docs/Data-Attoparsec-ByteString.html#v:notInClass

When looking at the generated Core, I saw the following:

        (notInClass
           (ghc-prim-0.5.3:GHC.CString.unpackCStringUtf8#
              "some string here"#)
           w_a3cX)

The unpackCStringUtf8# call is not inside notInClass itself; it seems to come from notInClass taking a String as input, so the literal has to be unpacked at the call site. I also doubt whether a list type like String is the best choice here.
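To make the concern about String concrete, here is a minimal sketch of a non-list representation: the class is converted once into a 256-bit bitmap, and each byte lookup is a single bit test. The names ByteSet, mkByteSet, memberByte and notInSet are hypothetical and are not part of attoparsec's API; unlike attoparsec's classes, this sketch does not interpret ranges such as "a-z".

```haskell
import Data.Bits (setBit, testBit)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word8)

-- Hypothetical byte set: bit n is set when byte n belongs to the class.
newtype ByteSet = ByteSet Integer

-- Build the set once from a String. Characters above 255 cannot match
-- any byte, so they are skipped. Ranges like "a-z" are NOT interpreted.
mkByteSet :: String -> ByteSet
mkByteSet = ByteSet . foldl' add 0
  where
    add acc c
      | ord c < 256 = setBit acc (ord c)
      | otherwise   = acc

-- Membership is a single bit test, independent of the class size.
memberByte :: Word8 -> ByteSet -> Bool
memberByte w (ByteSet bits) = testBit bits (fromIntegral w)

-- Hypothetical counterpart of notInClass that takes the prebuilt set.
notInSet :: ByteSet -> Word8 -> Bool
notInSet set w = not (memberByte w set)

main :: IO ()
main = do
  let set = mkByteSet "abc"   -- built once, then shared by every lookup
  print (notInSet set 97)     -- 97 is 'a', which is in the class: False
  print (notInSet set 120)    -- 120 is 'x', which is not in the class: True
```

The point of the sketch is only that the String is traversed once, when the set is built, rather than being the representation consulted for every byte.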

bgamari commented 5 years ago

Indeed, this sounds like some nice low-hanging fruit. It would be good to introduce a RULE to rewrite notInClass (unpackCStringUtf8# "..."#) into something a bit more sensible.
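A rough sketch of what such a rule could look like follows. It uses local stand-ins rather than attoparsec's real definitions: notInClassS, notInClassList and notInClassLit are hypothetical names, and the stand-in does plain membership only (no "a-z" ranges). The literal-specialised function is a placeholder for wherever a smarter implementation would go, for example one that builds a byte set directly from the literal. Rules only fire with optimisation enabled, and GHC emits unpackCString# rather than unpackCStringUtf8# for plain ASCII literals, so a real patch would probably need a rule for each.

```haskell
{-# LANGUAGE MagicHash #-}

import Data.Char (ord)
import Data.Word (Word8)
import GHC.CString (unpackCStringUtf8#)
import GHC.Exts (Addr#)

-- Worker shared by both entry points. No rule mentions it, so the
-- rewrite below cannot loop back onto itself.
notInClassList :: String -> Word8 -> Bool
notInClassList s w = all (\c -> ord c /= fromIntegral w) s

-- Hypothetical stand-in for notInClass. Inlining is delayed until
-- phase 1 so the rule has a chance to match in the earlier phase.
notInClassS :: String -> Word8 -> Bool
notInClassS = notInClassList
{-# NOINLINE [1] notInClassS #-}

-- Hypothetical literal-specialised variant (placeholder body). A real
-- version could avoid materialising the String altogether.
notInClassLit :: Addr# -> Word8 -> Bool
notInClassLit addr = notInClassList (unpackCStringUtf8# addr)
{-# NOINLINE notInClassLit #-}

{-# RULES
"notInClassS/utf8-literal" forall addr.
    notInClassS (unpackCStringUtf8# addr) = notInClassLit addr
  #-}

main :: IO ()
main = do
  print (notInClassS "abc" 97)    -- 'a' is in the class: False
  print (notInClassS "abc" 120)   -- 'x' is not in the class: True
```

The results are the same whether or not the rule fires; only the code path taken differs. Compiling with -O -ddump-rule-firings is one way to check whether the rule was applied.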