Open lemolatoon opened 2 months ago
Hi @lemolatoon! It's been a while since I've worked on this crate so I don't recall for sure, but I think I may have just misread the paper and used 6 bits instead of 5 when implementing it 🤦
@jeromefroe Thanks for your reply! I understand the circumstance. I could write some patches to fix this, but it would be a big breaking change. I will keep it. If you think this should be fixed, I'll send a Pull Request, so I would like you to let me know.
If you want to fix it, I'd be more than happy to accept the pull request. It would just have to be released as a new version since as you noted it would be a breaking change.
I was exercising around the Gorilla Compression. I found that If we have more than or equal to 32 bits of leading zeros, we cannot encode it in 5 bits. In that case, we should treat it as 31 bits of leading zeros, and the rest of zeroes as significant bits (in other words, meaningful bits). I found another implementation of Gorilla, and they do like that.
I an thinking about writing patches and open pull requests for this issue when I am free.
Thank you!
I found in this crate, the number of leading zeros is encoded in 6 bits where the control bit is '11'. However, according to Gorilla's paper, it should be encoded in 5 bits.
The corresponding codes are here. https://github.com/jeromefroe/tsz-rs/blob/b3e2dce64707c42c10019d9159f88a0f594458af/src/encode/std_encoder.rs#L132-L138 https://github.com/jeromefroe/tsz-rs/blob/b3e2dce64707c42c10019d9159f88a0f594458af/src/decode/std_decoder.rs#L143
Is there a specific reason to use 6 bits for the leasing zeros in this crate?