invertedtomato / packing

Library for encoding integers in the minimal number of bits. Includes VLQ and Elias Omega encoding.
MIT License

Must not allocate on heap. Guess a bitmask over ulong could be used #10

Closed dzmitry-lahoda closed 5 years ago

dzmitry-lahoda commented 5 years ago

https://github.com/invertedtomato/integer-compression/blob/8b0b80f1e74340b90cc9726c1bf944dbca93b24d/Library/Compression/Integers/Wave3/FibonacciCodec.cs#L75
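
The problem is the heap allocation made in the encode path at the linked line. Roughly this pattern, as a hypothetical illustration (not the actual codec code; names and sizes are made up):

```csharp
// Hypothetical illustration only; not the actual FibonacciCodec code.
static class PerValueAllocation
{
    static void EncodeAll(ulong[] values)
    {
        foreach (var value in values)
        {
            // One fresh GC allocation per encoded number: the pattern
            // this issue says must be avoided.
            var codewordBits = new bool[92]; // illustrative worst-case length
            // ... fill codewordBits with the Fibonacci codeword,
            //     then flush it to the output ...
        }
    }
}
```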

dzmitry-lahoda commented 5 years ago

I have made this https://github.com/dzmitry-lahoda/NetStack/blob/4b750853edb6e4fa28d2d23b7fc22fe55bc2f702/Assets/Scripts/src/NetStack.Serialization/Coder.cs#L71 and will keep improving it. It seems the overall performance of your lib will be better for large arrays of numbers; mine is more targeted at heterogeneous structs (i.e. encoding per number). I will also improve it to stop reading bit by bit into the byte reader.

invertedtomato commented 5 years ago

I'd love to keep an eye on how you improve it. I have never felt happy with that Fib implementation - it's always felt like there must be a better way.

I do like how you're using BitOperations. I guessed that it would be a little slower - is it?

dzmitry-lahoda commented 5 years ago

For my case, allocating an array (as I encode per number, not per array) would definitely be the worst possible option. In your case an array may well be faster.

I guess I may try stackalloc, but there is a price for allocation and cleanup there. So even if bit ops are slower than byte ops, because of the initialization and cleanup they may end up on par.

In your case you could move the allocation out of the loop, share it across all numbers, and clean it between uses. But producing a GC object for each number seems like a non-starter.

Or you could do a [ThreadStatic] allocation of the array.

I will test and provide numbers to choose the best option (I had not thought about stackalloc earlier).
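
A minimal sketch of the two options, assuming an illustrative buffer size and hypothetical encode methods:

```csharp
using System;

static class BufferOptions
{
    // Option A: stackalloc. No GC pressure, but the buffer is
    // zero-initialized on every call (the init/cleanup cost above).
    static void EncodeWithStackalloc(ulong value)
    {
        Span<byte> buffer = stackalloc byte[16]; // illustrative size
        // ... write the Fibonacci codeword into `buffer` ...
    }

    // Option B: one reusable array per thread, allocated once.
    [ThreadStatic]
    static byte[] _buffer;

    static void EncodeWithThreadStatic(ulong value)
    {
        var buffer = _buffer ?? (_buffer = new byte[16]);
        Array.Clear(buffer, 0, buffer.Length); // must still be cleaned between uses
        // ... write the Fibonacci codeword into `buffer` ...
    }
}
```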

dzmitry-lahoda commented 5 years ago

I use BitOperations.ExtractBit for writing output, but that will be replaced with byte-by-byte writes instead of bit-by-bit.
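
Roughly, ExtractBit does something like this (a sketch; the actual NetStack signature may differ):

```csharp
// Sketch of an ExtractBit-style helper; the actual NetStack
// implementation may differ.
static bool ExtractBit(ulong value, int index)
{
    // Bit at `index`, with 0 as the least significant bit.
    return ((value >> index) & 1UL) != 0;
}
```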

dzmitry-lahoda commented 5 years ago

https://gitlab.com/dzmitry-lahoda/dotnet-csharp-patterns/tree/master/sequential-bitmask

So the best option is a custom bit set plus byte reversal from the mask. It is better to keep a bitmask and reverse it than to keep a byte array and produce a reversed version of it (though I guess the array version could be optimized by copying bytes 8 at a time with shifts). So much room for optimization. I will choose the no-array version as it requires no memory at all. Interestingly, for u8 numbers, lookup tables could be the fastest option. I target 64-bit devices for the other improvements. Fibonacci is a whole world...
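
A sketch of the reversal step for the no-array version: build the codeword in a ulong mask low-bit-first, then reverse its low `length` bits when emitting (illustrative, not the final code):

```csharp
// Reverse the low `length` bits of a codeword built in a ulong mask.
// Illustrative sketch; a byte-at-a-time lookup table could speed this up.
static ulong ReverseLowBits(ulong mask, int length)
{
    ulong reversed = 0;
    for (var i = 0; i < length; i++)
    {
        reversed <<= 1;            // make room for the next bit
        reversed |= (mask >> i) & 1UL; // append bit i of the mask
    }
    return reversed;
}
```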

dzmitry-lahoda commented 5 years ago

The plan per width (see the skeleton below):

- u8: lookup table.
- u16: cast into u32.
- u32: u64 mask.
- u64: (a) stackalloc with an unrolled bit-write loop for the case of writing many numbers at once; (b) a (u64, u32) mask when writing per number, or when there is no direct access to the storage bit index (not desirable).

This is for 64-bit platforms. On 64-bit platforms, storage as a byte array may turn out to be wrong, i.e. u8[] should be replaced with u32[] so that the (index, bitOffset) pair is maxed out. For 128-bit platforms (a future target for big data), it seems u64 may need a 128-bit mask and the storage could be u64[].
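
A skeleton of that dispatch, with all encoder names hypothetical:

```csharp
// Skeleton of the per-width plan above; every method here is hypothetical.
static class WidthDispatch
{
    static void Encode(byte value) => EncodeViaTable(value);    // u8: lookup table
    static void Encode(ushort value) => Encode((uint)value);    // u16: widen to u32
    static void Encode(uint value) => EncodeWithMask64(value);  // u32: codeword fits a u64 mask
    static void Encode(ulong value) => EncodeWithMask96(value); // u64: two-word mask (per-number case)

    static void EncodeViaTable(byte value) { /* table lookup */ }
    static void EncodeWithMask64(uint value) { /* single u64 mask */ }
    static void EncodeWithMask96(ulong value) { /* (u64, u32) mask */ }
}
```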

invertedtomato commented 5 years ago

In your code above, you're using a UInt64 (map) as your buffer. It's nice and simple, but am I missing something? That won't work for encoding UInt64.MaxValue-1, as it needs 88 bits of buffer space, which won't fit in 64 bits.
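
As a sanity check, counting the Fibonacci numbers that fit in a ulong shows how much buffer space the worst case needs (a rough sketch; the exact figure depends on the codec's conventions):

```csharp
using System;

static class FibCount
{
    static void Main()
    {
        // Count Fibonacci numbers (1, 2, 3, 5, ...) that fit in a ulong.
        ulong a = 1, b = 2;
        var count = 1; // counts the initial 1
        while (b >= a)  // b wraps below a once the next term overflows
        {
            count++;
            var next = unchecked(a + b); // wraparound signals overflow
            a = b;
            b = next;
        }
        Console.WriteLine($"Fibonacci numbers <= ulong.MaxValue: {count}");
        // The codeword needs roughly one data bit per number in this
        // sequence, plus a terminating 1 bit -- well over 64 bits.
    }
}
```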

dzmitry-lahoda commented 5 years ago

Yes, I know. That was for a quick test. I have made it work in the edge cases via https://github.com/ricksladkey/dirichlet-numerics, but I guess the fastest way is custom operations on a 64-bit unsigned integer plus another 32- or 64-bit one (treating the 96-bit pair as a single mask). But in that case performance may be only marginally better than array usage (or stackalloc array usage, for GC-free operation). Given that in the current code the buffer is allocated once for the whole loop and the arrays passed as parameters are long (maybe 10000+ elements), the allocation may be negligible.
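
A sketch of the 96-bit pair-as-one-mask idea (illustrative only; not the dirichlet-numerics API):

```csharp
// A (u64, u32) pair treated as a single 96-bit mask. Illustrative
// sketch; shift and other operators would be added as needed.
struct Mask96
{
    ulong _lo; // bits 0..63
    uint _hi;  // bits 64..95

    public void SetBit(int index)
    {
        if (index < 64)
            _lo |= 1UL << index;
        else
            _hi |= 1U << (index - 64);
    }

    public bool GetBit(int index) =>
        index < 64
            ? ((_lo >> index) & 1UL) != 0UL
            : ((_hi >> (index - 64)) & 1U) != 0U;
}
```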