Add SSE41 path for streamvbyte_compressedbytes.

ishitatsuyuki commented 1 year ago

I'm trying to use compressedbytes to get the allocation size beforehand, but it turned out to be awfully slow, taking more time than just doing the (SIMD-accelerated) encoding itself.

This PR fixes that and does a few things:

- It makes scalar compressedbytes branchless so that it's less awful.
- It adds an SSE41 path for compressedbytes, reusing the mask calculation from the encode path.

Sorry, but there is no implementation for:

- ARM NEON (I don't have a machine to test on)
- the 0124 path (no unit tests, and the code structure is somewhat different).
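
For readers unfamiliar with the idea, here is a minimal sketch of what a branchless scalar byte count can look like for the standard 1/2/3/4-byte format. It is an illustration under stated assumptions, not the code in this PR: the function name `sketch_compressedbytes` is hypothetical, and the size is assumed to be one control byte per four integers plus 1 to 4 data bytes per integer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the streamvbyte API.
 * Assumes the standard format: 2 control bits per integer
 * (one control byte per 4 integers) and 1..4 data bytes per integer. */
static size_t sketch_compressedbytes(const uint32_t *in, uint32_t length) {
    /* Control bytes: ceil(length / 4), computed in size_t to avoid overflow. */
    size_t control_bytes = ((size_t)length + 3) / 4;
    size_t data_bytes = 0;
    for (uint32_t i = 0; i < length; i++) {
        uint32_t val = in[i];
        /* Each comparison evaluates to 0 or 1, so the byte width is
         * obtained by adding them up, with no if/else on the value. */
        data_bytes += 1 + (val > 0x000000FFu)
                        + (val > 0x0000FFFFu)
                        + (val > 0x00FFFFFFu);
    }
    return control_bytes + data_bytes;
}
```

Because the loop body has no data-dependent branches, the compiler is free to auto-vectorize it, and there are no mispredictions on inputs with mixed magnitudes.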

lemire commented 1 year ago

This will be part of the next release.

lemire commented 1 year ago

I'll merge as soon as the tests are green.

ishitatsuyuki commented 1 year ago

Fixed the pragma errors from CI.

lemire commented 1 year ago

Let us re-run the tests.

lemire commented 1 year ago

Merging.

lemire / streamvbyte

Add SSE41 path for streamvbyte_compressedbytes. #57