haskell-streaming / streaming-bytestring

effectful sequences of bytes; an alternative no-lazy-io implementation of Data.ByteString.Lazy
BSD 3-Clause "New" or "Revised" License
16 stars 11 forks source link

Word8 and Char8 `uncons` isssues #40

Closed vdukhovni closed 3 years ago

vdukhovni commented 4 years ago

The uncons implementation for Streaming.ByteString has a somewhat unfortunate choice of return type that loses the stream's terminal value when it is empty. It is also unsafe, if the first chunk of the stream is an empty ByteString, it will crash (this defect is addressed in the nextByte function below):

-- | /O(1)/ Extract the head and tail of a 'ByteStream', or 'Nothing' if it is
-- empty.
uncons :: Monad m => ByteStream m r -> m (Maybe (Word8, ByteStream m r))
uncons (Empty _) = return Nothing
uncons (Chunk c cs)
    = return $ Just (B.unsafeHead c
                     , if B.length c == 1
                         then cs
                         else Chunk (B.unsafeTail c) cs )
uncons (Go m) = m >>= uncons

to make up for this deficit, there a more natural variant named nextByte

-- | /O(1)/ Extract the head and tail of a 'ByteStream', or its return value if
-- it is empty. This is the \'natural\' uncons for an effectful byte stream.
nextByte :: Monad m => ByteStream m r -> m (Either r (Word8, ByteStream m r))
nextByte (Empty r) = return (Left r)
nextByte (Chunk c cs)
    = if B.null c
        then nextByte cs
        else return $ Right (B.unsafeHead c
                     , if B.length c == 1
                         then cs
                         else Chunk (B.unsafeTail c) cs )
nextByte (Go m) = m >>= nextByte

These have variants in Streaming.ByteString.Char8 named uncons and nextChar, defined and documented as shown below, but note that the haddock documentation of this uncons is wrong, it does not return a Maybe, rather it is in fact almost functionally identical to nextChar, except that it reproduces the potential crash in the Word8 uncons when the first chunk is empty:

-- | /O(1)/ Extract the head and tail of a ByteStream, returning Nothing
-- if it is empty.
uncons :: Monad m => ByteStream m r -> m (Either r (Char, ByteStream m r))
uncons (Empty r) = return (Left r)
uncons (Chunk c cs)
    = return $ Right (w2c (B.unsafeHead c)
                     , if B.length c == 1
                         then cs
                         else Chunk (B.unsafeTail c) cs )
uncons (Go m) = m >>= uncons

-- | /O(1)/ Extract the head and tail of a 'ByteStream', or its return value if
-- it is empty. This is the \'natural\' uncons for an effectful byte stream.
nextChar :: Monad m => ByteStream m r -> m (Either r (Char, ByteStream m r))
nextChar b = do
  e <- Q.nextByte b
  case e of
    Left r       -> return $! Left r
    Right (w,bs) -> return $! Right (w2c w, bs)

What should we do? At the very least the crashes should be fixed, but which is the correct uncons type? Should they remain unexpectedly different? At the very least the documentation should reflect reality...

fosskers commented 3 years ago

uncons having a Maybe in its return type mirrors the standard library, but in this case I think the two functions should be merged into a single one named uncons who returns an Either and avoids the crashes. That would be another breaking change.

vdukhovni commented 3 years ago

So I think you're suggesting to rename nextByte to uncons (deleting the original), and either rename nextChar to uncons, or clone the Word8 uncons adding a w2c conversion to the return value. The indirection via such a small helper is not especially useful, and the functions are recursive, so don't inline. (We could move the recursion into a where go = ... and just have the exposed function call go, which would make inlining possible, and is perhaps the right thing to do...).

vdukhovni commented 3 years ago

Should addressing this go into #42 or a new separate PR?

fosskers commented 3 years ago

New PR, I'd say.

vdukhovni commented 3 years ago

Speaking of uncons there's another related case:

unconsChunk :: Monad m => ByteStream m r -> m (Maybe (B.ByteString, ByteStream m r))
nextChunk :: Monad m => ByteStream m r -> m (Either r (B.ByteString, ByteStream m r))

Again the first is redundant, the second is the more appropriate interface. I don't know what your thoughts are about keeping both, or likewise keeping just nextChunk renamed to uncons, or ...

fosskers commented 3 years ago

Let's rename those as well.

vdukhovni commented 3 years ago

It is simpler for me to add these to #42, or else wait until #42 is merged and then open a new PR. I prefer the former, but if a new PR is needed, I'll wait for #42 to land.

fosskers commented 3 years ago

Adding to #42 should be fine.