haskell-streaming / streaming-bytestring

effectful sequences of bytes; an alternative no-lazy-io implementation of Data.ByteString.Lazy
BSD 3-Clause "New" or "Revised" License
16 stars 11 forks source link

ByteString -> ByteStream #30

Closed fosskers closed 3 years ago

fosskers commented 3 years ago

This PR implements a name change to the core ByteString type. This is to avoid the obvious conflict with the upstream type of the bytestring library, and hopefully avoid the confusion that many people feel when trying to read function signatures that mix the two types.

The modules have also had their names changed. Normally this would cause a major version bump, but I've also provided a pre-deprecated type alias for ByteString and module reexports under the old module names. This should help users migrate.

Suggestion: Six months after we release the upcoming version with the recent fixes/improvements, let's release a 0.2 that removes the deprecations introduced here.

I'll merge this PR on Monday, October 5th if there are no major objections. Perhaps #29 should go in first as well. Following the merging of these two PRs, I will make a new release of this library.

vdukhovni commented 3 years ago

I'll merge this PR on Monday, October 5th if there are no major objections. Perhaps #29 should go in first as well. Following the merging of these two PRs, I will make a new release of this library.

It would be nice to see a better readInt merged. But now you'll have to choose between #29 and #31 :-)

fosskers commented 3 years ago

Heh. You'd prefer #31, right? Either way, we can put yours in first, and then I can deal with any merge conflicts here.

vdukhovni commented 3 years ago

Suggestion: Six months after we release the upcoming version with the recent fixes/improvements, let's release a 0.2 that removes the deprecations introduced here.

As for deprecation within six months, that seems rather quick to me, the ecosystem generally moves at a slower pace, measured in years, but if streaming is not as widely used as bytestring and the like, or is used by more sophisticated users better able to keep, then so be it, I don't know what the users of streaming expect in terms of stability

vdukhovni commented 3 years ago

Heh. You'd prefer #31, right? Either way, we can put yours in first, and then I can deal with any merge conflicts here.

Well, I'm not sure which I really prefer. They have their pluses and minuses. The whitespace skip logic is cleaner in #31, and should be backported if the decision is to go with #29. The semantics of #29 are closer to bytestring (no overflow detection), and there is a modest performance cost to that.

Also getting good performance from either readInt seems to require inlining, and the code in #31 is larger, I am not sure whether that's a problem for anyone.

Mind you, I should redo the measurements without inlining. Now that readIntSkip is defined entirely within the module, rather than the caller providing a hook as in #29, perhaps performance will be fine regardless (assuming that monadic steps are sufficiently infrequent and most of the work happens on the strict chunk already in hand...).

vdukhovni commented 3 years ago

Mind you, I should redo the measurements without inlining. Now that readIntSkip is defined entirely within the module, rather than the caller providing a hook as in #29, perhaps performance will be fine regardless (assuming that monadic steps are sufficiently infrequent and most of the work happens on the strict chunk already in hand...).

I can make the readInt in #31 be INLINABLE rather than INLINE with largely the same performance with -O2. Without either it gets ~2x slower when reading a stream of space separated ints (bulk loading integer data).

vdukhovni commented 3 years ago

On the whole, to be honest, yes I do prefer #31.

vdukhovni commented 3 years ago

After a bit more benchmarking (now with the code integrated into streaming-bytestring, rather than as a separate module on the side), the winner is #31 with readInt INLINABLE rather than INLINE. Either outperform #29, especially on total memory allocations. So I think we have a winner. I'll just update the PRAGMA, and call it done for now, pending review...