lines and bytestring? - Githubissues

snoyberg / classy-prelude

A typeclass-based Prelude.

lines and bytestring? #86

Closed by gibiansky 9 years ago

gibiansky commented 9 years ago

I would like to be able to use the lines function on ByteStrings. Is there a fundamental reason this shouldn't be possible, besides the current organization of the code?

It is a bad idea to have ByteString implement Textual, since things like toLower and toUpper don't make sense. However, splitting on spaces (words) and newlines (lines) seems like it might still make sense.

Alternatively, you can use split from the Data.ByteString module, so this is more of a question than a feature request or suggestion, I guess.

snoyberg commented 9 years ago

The reason is that it's semantically incorrect: we have no information on the character encoding of a ByteString, and therefore may do something completely incorrect (e.g., if the ByteString is UTF-16 encoded).
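
To make the encoding concern concrete, here is a minimal sketch using only the bytestring and text packages. The names splitOnLF, utf8Lines, utf16Lines and falseSplit are made up for this illustration. It splits on the byte 0x0A, which works for UTF-8 input but misaligns or falsely splits UTF-16LE input.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- Naive line splitting: break on every byte equal to 0x0A,
-- returning the pieces as lists of bytes for easy inspection.
splitOnLF :: B.ByteString -> [[Word8]]
splitOnLF = map B.unpack . B.split 10

-- "a\nb" in UTF-8 is 61 0A 62, so the split is what we want.
utf8Lines :: [[Word8]]
utf8Lines = splitOnLF (TE.encodeUtf8 (T.pack "a\nb"))
-- [[0x61],[0x62]]

-- The same text in UTF-16LE is 61 00 0A 00 62 00. Splitting on 0A
-- leaves the high byte of the newline glued to the second piece,
-- so that piece is no longer valid UTF-16.
utf16Lines :: [[Word8]]
utf16Lines = splitOnLF (TE.encodeUtf16LE (T.pack "a\nb"))
-- [[0x61,0x00],[0x00,0x62,0x00]]

-- U+010A contains no newline, but its UTF-16LE encoding is 0A 01,
-- so the naive split cuts the character in half.
falseSplit :: [[Word8]]
falseSplit = splitOnLF (TE.encodeUtf16LE (T.pack "\x010A"))
-- [[],[0x01]]
```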

What I did in conduit-combinators was provide two separate functions: line and lineAscii. I think something like that would make sense here as well.
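
As a rough sketch of what an explicitly ASCII-flavoured variant could look like for strict ByteStrings: the names linesAscii and wordsAscii are hypothetical (they are not the conduit-combinators or classy-prelude API), and the implementation simply defers to Data.ByteString.Char8. The point is only that the encoding assumption shows up in the name.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- Hypothetical helper: split on the byte '\n', assuming an
-- ASCII-compatible encoding. A trailing '\r' is not stripped.
linesAscii :: B.ByteString -> [B.ByteString]
linesAscii = BC.lines

-- Hypothetical helper: split on runs of whitespace bytes,
-- under the same assumption about the encoding.
wordsAscii :: B.ByteString -> [B.ByteString]
wordsAscii = BC.words
```

For example, linesAscii (BC.pack "one\ntwo\n") yields the two chunks "one" and "two".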

gibiansky commented 9 years ago

Ah, fair enough. That makes sense.

I am working on some fairly hacky parsers (not really meant to be production-ready, so I don't care too much about details like encodings) with classy-prelude and attoparsec, and I'm finding it very convenient that many classy-prelude functions are overloaded to work on ByteStrings (such as readFile). However, I've seen a lot of the following patterns forming:

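import qualified Data.Char as Char

-- Byte value of an ASCII space, used to drop spaces
space = fromIntegral $ Char.ord ' '
noSpaces = filter (/= space) myBytestring

-- Byte value of an ASCII newline, used to split into lines
newline = fromIntegral $ Char.ord '\n'
theLines = split newline myBytestring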

Do you think there is a good way to make classy-prelude a little bit nicer for ByteString work? For example, it could provide ord and chr from Data.Char by default (though perhaps with a type signature like ord :: Integral a => Char -> a?). Do you think that would be worth the added exports?
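
A polymorphic ord along the lines suggested above could be sketched as follows, written here with a Char argument. The name ordAs is hypothetical and is not something classy-prelude exports. One caveat under this design: for a Word8 result, any character above '\255' wraps around silently, so it is only sensible for ASCII or Latin-1 literals.

```haskell
import qualified Data.ByteString as B
import qualified Data.Char as Char
import Data.Word (Word8)

-- Hypothetical helper: the code point of a character as any
-- Integral type. Narrow types such as Word8 wrap on overflow.
ordAs :: Integral a => Char -> a
ordAs = fromIntegral . Char.ord

-- The byte for an ASCII space, with the result type chosen by annotation.
spaceByte :: Word8
spaceByte = ordAs ' '

-- Drop every space byte.
noSpaces :: B.ByteString -> B.ByteString
noSpaces = B.filter (/= spaceByte)

-- Split on newline bytes; the Word8 type is inferred from B.split.
splitLines :: B.ByteString -> [B.ByteString]
splitLines = B.split (ordAs '\n')
```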

snoyberg commented 9 years ago

You may want to look at the word8 package, which provides helper values like this.
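
For illustration, the patterns above might look like this with word8, assuming the package is installed and using its Data.Word8 constants _space and _lf. The function names noSpaces and splitLines are just placeholders for this sketch.

```haskell
import qualified Data.ByteString as B
import Data.Word8 (_lf, _space)

-- Drop every space byte; _space is the Word8 value 0x20.
noSpaces :: B.ByteString -> B.ByteString
noSpaces = B.filter (/= _space)

-- Split on line-feed bytes; _lf is the Word8 value 0x0A.
splitLines :: B.ByteString -> [B.ByteString]
splitLines = B.split _lf
```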

I'm not opposed to exposing ord and chr, but what about toEnum and fromEnum?
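
As a small illustration of the overlap: for Char, the Prelude's fromEnum and toEnum agree with ord and chr, so they already cover the same ground when the types are pinned down. The names ordViaEnum, chrViaEnum and agreeOnAscii are made up for this sketch.

```haskell
import Data.Char (chr, ord)

-- fromEnum specialised to Char behaves like ord.
ordViaEnum :: Char -> Int
ordViaEnum = fromEnum

-- toEnum specialised to Char behaves like chr.
chrViaEnum :: Int -> Char
chrViaEnum = toEnum

-- Check the agreement over the ASCII range.
agreeOnAscii :: Bool
agreeOnAscii =
  all (\c -> ordViaEnum c == ord c && chrViaEnum (ord c) == chr (ord c))
      ['\0' .. '\127']
```

The trade-off is that toEnum and fromEnum are polymorphic over Enum, so call sites sometimes need a type annotation that ord and chr would not.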

gibiansky commented 9 years ago

I was not aware of the word8 package; that seems to be pretty useful and would help.

Thanks for the responses. ord and chr seem like they might be useful functions to export, but my opinions on those aren't too strong; ultimately, import Data.Char isn't a big hassle. A point in their favor, though, is that they're useful and fairly common; Python also provides them as builtins, and I definitely use them there occasionally.

Anyway, closing this issue since I'm not sure there's anything actionable here.