mitiko / BWDPerf

BWD stands for Best Word Dictionary as it has the ability to be an optimal dictionary coder.
https://mitiko.github.io/BWDPerf
GNU General Public License v3.0
0 stars 1 forks source link

Optimize the lcp matchfinder and add infrastructure for nibblewise and bitwise encoding #38

Closed mitiko closed 3 years ago

mitiko commented 3 years ago

The lcp matchfinder was optimized by reusing the calculated matches and stoing them in a skip list. The skip list is updated, thus reducing the matches we have to rank each time.

I also experimented with storing the counts in the match struct and recounting but the skip list search was too slow and the search itself isn't quite easy. There still is some hope and definitely much more potential with the skip list but for now I'm leaving it as is. Adding a count and sa count in the match struct is also a bit unreasonable and confusing.

The other major change is the ability to do nibblewise and bitwise encoding. I initially discarded the idea as too complicated for the project but I changed my mind as I wanted to try nibblewise prediction for the GDCC. Fun fact: nibblewise coding is usually faster than bytewise because of the faster cdf calculation and less memory usage.