Open oberblastmeister opened 2 years ago
I'm reluctant to extend bytestring
API even further, it's already difficult to maintain in sync and there is no particular reason for SlicedShortByteString
to be here, since it is not used by other boot libraries.
I think pushing forward https://github.com/haskell/primitive/pull/354 and #410 is a more sustainable approach: when merged, byteslice
will operate over slices of ShortByteString
.
In my opinion implementing functions directly on ShortByteString
is not optimal. Functions on ShortByteString
can be implemented trivially by using slices and should have no overhead because of inlining and simplification. Also, the functions currently copy by default which I think should be more explicit, such as by copying the sliced variant. In theory we could move all the functions from ShortByteString
to the sliced variant and have conversion functions for people that want to use ShortByteString
, and it would be okay. For example the any
function can be easily converted for slices
any :: (Word8 -> Bool) -> Slice -> Bool
any k = \Slice {ba, off, len} ->
let
w = indexWord8Array ba
go !n | n >= len = False
| otherwise = k (w n) || go (n + 1)
in go off
and now any
for ShortByteString
is easy
any :: (Word8 -> Bool) -> ShortByteString -> Bool
any f = Slice.any f . asSlice
So most of the code is just moved instead of duplicated, and I don't think it would be that much harder to maintain, but of course that is your decision. I think adding this type would help unify vector
, bytestring
, and text
because ByteArray
can be cheaply converted between them. This would also encourage people to use ByteArray
based types instead of ByteString
which always has to be pinned.
@oberblastmeister that's understandable goal, but as long as ByteArray
from base
is shared by all core packages, someone can implement Slice
outside of bytestring
. I agree that this is likely to cause certain code duplication, but at the moment I do not have bandwidth to include Slice
into bytestring
itself, at least not without seeing a prior art elsewhere.
@sjakobi what do you think?
I went looking through filepath
as that is the primary consumer of ShortByteString
. A large amount of functions are ineficient from copying. The most egregious example is splitDirectories
. It would be pretty funny if the new ByteArray
based functions were slower than the String
based ones. On Windows, it is, by $13 \mu s$! The String
version is almost 2 times faster than the ByteArray
version! On posix, the situation isn't much better, with splitDirectories
taking $7 \mu s$. Windows functions seem to suffer much more from copying because of the larger path size. A quick analysis of splitDirectores
gives 112 total copies. The benchmark uses a path with 28 segments, splitPath
does 3 copies for each segment and dropTrailingPathSeparator
does about 1 also. And it is even worse, splitPath
copies the accumulator. This makes the effect of copies even larger, as we are basically getting $n^2$ copying behavior. With slices, there would be zero copies. Clearly, we need slices to improve performance.
Well, Slice
adds 16 byte overhead for each chunk, which in a typical use case of filepath
is worse than copying content. Your benchmark with 28 segments looks a bit contrived to me, and FWIW splitDirectories
can use (Int, Int, ShortByteString)
internally to prevent quadratic behaviour, and copy only resulting chunks, not accumulator.
More generally, filepath
can define data Slice = Slice Int Int ShortByteString
and use it internally to optimize intermediate operations. I think this is the best balance of cost/benefit right now. If it works well, in future we can discuss this issue further.
If the cost of copying some bytearray is significant, it usually makes sense to tell the GC not to copy it, by pinning that bytearray. So I don't find the idea of sliced-unpinned-bytestring all that persuasive on the face of it. (Of course, ByteString
is typically still a bit more memory-expensive than simple sliced-pinned-bytearrays would be because its representation involves an extra two-word PlainPtr
heap object. But I hope to eventually improve GHC so that PlainPtr
does not actually allocate a new heap object.)
But in many cases I expect the observed "copying cost" to actually be dominated by the allocation cost of the buffers in which the copies are placed. And unlike copying in itself this is unaffected by pinnedness. The allocation overhead for String
is (with the obvious exceptions) truly horrible: So horrible that for typical or even somewhat-large filepath inputs it doesn't surprise me that $O(n)$ string vs $O(n^2)$ shortbytestring allocations the shortbytestrings can still perform comparably or better.
I've been thinking for a while about removing most of the near-duplication between our ByteString
and ShortByteString
code, by writing both with identical combinators at different types. I expect that doing so would make it trivial to add an unpinned-slices variant.
Your benchmark with 28 segments looks a bit contrived to me
It's from filepath
.
More generally,
filepath
can definedata Slice = Slice Int Int ShortByteString
and use it internally to optimize intermediate operations. I think this is the best balance of cost/benefit right now. If it works well, in future we can discuss this issue further.
Sounds good.
I've been thinking for a while about removing most of the near-duplication between our
ByteString
andShortByteString
code, by writing both with identical combinators at different types. I expect that doing so would make it trivial to add an unpinned-slices variant.
How would you do this? Something like type classes?
I've been thinking for a while about removing most of the near-duplication between our
ByteString
andShortByteString
code, by writing both with identical combinators at different types. I expect that doing so would make it trivial to add an unpinned-slices variant.How would you do this? Something like type classes?
Type classes can do this, but then un-specialized versions of every function will be generated even though they should never be used. Backpack is perhaps the "correct" tool for the job, but it's not available with the older versions of GHC and Cabal still supported by bytestring, and would also probably require import cycles to use in this way. So I had in mind to use CPP to share the implementations, if/when I get around to this.
More generally, filepath can define data Slice = Slice Int Int ShortByteString and use it internally to optimize intermediate operations. I think this is the best balance of cost/benefit right now. If it works well, in future we can discuss this issue further.
Sounds good.
Yes, I've already thought about that. PRs welcome. But I don't really want to re-implement byteslice package.
I think we could take some inspiration from byteslice. This would also address https://github.com/haskell/bytestring/issues/193, by expanding the api for
ByteArray
based types instead of changingByteString
to useByteArray
.The docs for
ShortByteString
sayHaving a sliced
ShortByteString
would allow using aByteArray
based type in public APIs, similar toByteString
.