uncons :: ByteString -> Maybe (Word8, ByteString)
uncons Empty = Nothing
uncons (Chunk c cs)
= Just (S.unsafeHead c,
if S.length c == 1 then cs else Chunk (S.unsafeTail c) cs)
Both components of the returned tuple internally perform a small amount of work before constructing a result:
The first component has to read from the underlying buffer before it can construct its I# result. Since the compiler doesn't know that this read should not fail, it cannot perform the read eagerly; it must create a thunk. (The StrictByteString version of uncons has the same issue here.)
The second component has to actually perform the if-then-else before either evaluating cs or allocating a new Chunk object. This is just an comparison between evaluated Ints, and certainly succeeds in little time, so a compiler would be well within its rights to eagerly compare S.length c with 1, but GHC today does not do so for what-ever reason.
What should we do about this?
We could make uncons strict in S.unsafeHead c, but then the resulting readWord8OffAddr# read-effect operations will stick around even if the result is un-used, which is a bit unfortunate. (Maybe Cmm assignment sinking or the register allocator can clean it up very late in the pipeline. I haven't checked.) And users pattern-matching on the result of uncons have an easy work-around: Use a bang-pattern themselves. So I'm not convinced this behavior should be changed before GHC starts to discard unused read-effects in its simplifier.
I think we definitely should perform the S.length check eagerly: Since the thunk that would be allocated is one word larger than the Chunk cell allocated in the else branch, I expect eagerly performing this comparison to be cheaper on average even if the thunk would never be evaluated. And users pattern-matching on the result of uncons have no comfortable workaround, since a bang-pattern may force the lazy tail prematurely. I will put up a merge request implementing this momentarily.
The problem with the second component of Lazy.uncons was fixed in #559. But the first component of the result of uncons is still a thunk, so I will re-open.
Consider
Lazy.uncons
. Today, we have inData.ByteString.Lazy
:Both components of the returned tuple internally perform a small amount of work before constructing a result:
I#
result. Since the compiler doesn't know that this read should not fail, it cannot perform the read eagerly; it must create a thunk. (TheStrictByteString
version ofuncons
has the same issue here.)cs
or allocating a newChunk
object. This is just an comparison between evaluatedInt
s, and certainly succeeds in little time, so a compiler would be well within its rights to eagerly compareS.length c
with1
, but GHC today does not do so for what-ever reason.What should we do about this?
uncons
strict inS.unsafeHead c
, but then the resultingreadWord8OffAddr#
read-effect operations will stick around even if the result is un-used, which is a bit unfortunate. (Maybe Cmm assignment sinking or the register allocator can clean it up very late in the pipeline. I haven't checked.) And users pattern-matching on the result ofuncons
have an easy work-around: Use a bang-pattern themselves. So I'm not convinced this behavior should be changed before GHC starts to discard unused read-effects in its simplifier.S.length
check eagerly: Since the thunk that would be allocated is one word larger than theChunk
cell allocated in the else branch, I expect eagerly performing this comparison to be cheaper on average even if the thunk would never be evaluated. And users pattern-matching on the result ofuncons
have no comfortable workaround, since a bang-pattern may force the lazy tail prematurely. I will put up a merge request implementing this momentarily.