vigna / dsi-bitstream-rs

A Rust implementation of read/write bit streams supporting several types of instantaneous codes
Apache License 2.0
7 stars 3 forks source link

Extended const prefix codes for the 10-20 range #14

Closed LucaCappelletti94 closed 2 weeks ago

LucaCappelletti94 commented 3 weeks ago

As per the title, I have added missing const prefix codes (e.g. GOLOMB 17) as they turned out to be the best performers for my case and I need them to be const so to use them as trait-associated constants.

vigna commented 3 weeks ago

It's Sunday. I'm having a mojito on a beach at sunset. I'll delay my answer.

Sorry not sorry. 😅

LucaCappelletti94 commented 3 weeks ago

FYI, the optimal codes turned out to be in the Rice<20-35> range, so even over the 10-20 range, but I believe it may be overkill to add so many constants. Maybe the default constants for the CodesStats should be higher? I almost lost the best code parametrization because I did not think to increase the maximal possible range to search.

vigna commented 3 weeks ago

You sure Rice is faster with tables? It is pretty fast in itself and you'll be polluting the cache.

Maybe we should add an external mechanism for tables. Let me think...

LucaCappelletti94 commented 3 weeks ago

I am not talking about tables or time requirements - I meant better in terms of the average hash gap size.

vigna commented 3 weeks ago

Ahh ok the gain only derives from calling with a constant. And you want them to be constant. I thought you were implementing table (indeed, you didn't write anything about tables).

zommiommy commented 3 weeks ago

He is saying that rice 30 fits best his data distribution, but the CodeStats struct only tests up to rice 10, unless you change the default const usize.

It's just a default, the user can configure it as he wants, I think the discussion is about whats the most reasonable default value.

vigna commented 3 weeks ago

No, the PR adds several new constants to denote a specific instance of a parametric code. I think it's a reasonable use case and I would merge.