lemire / streamvbyte

Fast integer compression in C using the StreamVByte codec
Apache License 2.0

Reduce peak memory consumption during encoding #32

Closed daniel-j-h closed 3 years ago

daniel-j-h commented 3 years ago

We don't know the required output memory upfront, so we use a function that returns the worst-case memory required

https://github.com/lemire/streamvbyte/blob/635d1c5ea63a1304762bba3c3e2e1154e9c83348/include/streamvbyte.h#L28-L35

but when we are encoding small integers (or small deltas), most if not all values often fit into a single byte each.

In these cases, we still need to allocate

`control bytes + n * 4`

bytes upfront, whereas

`control bytes + n * 1`

bytes would suffice.

There are use cases where I'd like to allocate only e.g. 1 GB instead of 4 GB, rather than throwing out 3 GB immediately after encoding.

Should this library provide a two-pass approach, where a first pass computes the exact number of compressed bytes and a second pass then encodes into a buffer of exactly that size?

This two-pass approach might be slower in terms of runtime, but we can reduce the allocations required for data bytes by a factor of four in the best case.

Users can write their own version (summing up the bytes required per input item), but having such a function in the library would be great for convenience and would allow efficient implementations in the future. Thoughts?
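
A minimal sketch of what such a first pass could look like, assuming the StreamVByte layout of one control byte per group of up to four values and one to four data bytes per value; the helper name `count_compressed_bytes` is hypothetical and not part of the library's API:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical first pass: compute the exact number of bytes that
 * encoding in[0..length) with StreamVByte would produce.
 * Each value takes 1..4 data bytes depending on its magnitude,
 * plus one control byte per group of up to four values. */
static size_t count_compressed_bytes(const uint32_t *in, uint32_t length) {
  size_t data_bytes = 0;
  for (uint32_t i = 0; i < length; i++) {
    uint32_t v = in[i];
    data_bytes += (v < (1u << 8)) ? 1 : (v < (1u << 16)) ? 2
                : (v < (1u << 24)) ? 3 : 4;
  }
  size_t control_bytes = ((size_t)length + 3) / 4; /* one 2-bit code per value */
  return control_bytes + data_bytes;
}
```

A caller could then allocate exactly that many bytes for the second pass instead of `streamvbyte_max_compressedbytes(length)` bytes, at the cost of iterating over the input twice.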

lemire commented 3 years ago

@daniel-j-h

Pull requests invited!

  1. I would discourage you from processing data in blocks of gigabytes. Consider breaking down the data so that you can work in cache.
  2. A large (4 GB) virtual memory allocation is very fast. Try to benchmark `malloc(large value)`; you should find that the time elapsed is independent of the allocation size. Most systems will only allocate real memory when you access a page (a rough timing sketch follows after this list).
  3. Importantly, you do not want each data structure to be independently allocated: I recommend an arena approach where you allocate a chunk for most of your needs (containing many data structures). Allocating physical memory is really slow and you don’t want to keep doing it (a minimal arena sketch follows after this list).
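
A rough sketch of the benchmark suggested in point 2, timing only the `malloc` call itself and not the first touch of the pages; the 1 MiB to 4 GiB size range is an arbitrary choice for illustration:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Time malloc() for increasingly large requests. On most 64-bit systems the
 * elapsed time should be roughly flat, because a large allocation reserves
 * virtual address space and physical pages are only mapped on first access. */
int main(void) {
  for (size_t size = (size_t)1 << 20; size <= (size_t)4 << 30; size <<= 4) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    void *p = malloc(size);
    clock_gettime(CLOCK_MONOTONIC, &end);
    double micros = (end.tv_sec - start.tv_sec) * 1e6 +
                    (end.tv_nsec - start.tv_nsec) * 1e-3;
    printf("malloc(%zu bytes): %s in %.2f us\n", size,
           p ? "ok" : "failed", micros);
    free(p); /* free(NULL) is a no-op */
  }
  return 0;
}
```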
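And a minimal sketch of the arena idea from point 3: one upfront allocation, cheap bump-pointer sub-allocations carved out of it, and a single `free` at the end (the names are made up and alignment handling is omitted for brevity):

```c
#include <stdint.h>
#include <stdlib.h>

/* Minimal bump-pointer arena: one large malloc up front, sub-allocations are
 * just pointer arithmetic, and everything is released with a single free(). */
typedef struct {
  uint8_t *base;
  size_t capacity;
  size_t used;
} arena_t;

static int arena_init(arena_t *a, size_t capacity) {
  a->base = malloc(capacity);
  a->capacity = capacity;
  a->used = 0;
  return a->base != NULL;
}

static void *arena_alloc(arena_t *a, size_t n) {
  if (a->capacity - a->used < n) return NULL; /* out of arena space */
  void *p = a->base + a->used;
  a->used += n;
  return p;
}

static void arena_release(arena_t *a) {
  free(a->base);
  a->base = NULL;
  a->used = a->capacity = 0;
}
```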
daniel-j-h commented 3 years ago

Small pull request at https://github.com/lemire/streamvbyte/pull/33

I appreciate your detailed response :raised_hands: I was looking into https://github.com/iiSeymour/pystreamvbyte to play around with, and in there we create an np.empty array based on the estimated maximum compressed size. The numpy array always gets allocated (just not initialized); that's why I thought computing the exact number of bytes required would be great to have in the C version here to begin with.