Ranking's entropy estimation and the following measurement don't match

mitiko / BWDPerf

BWD stands for Best Word Dictionary as it has the ability to be an optimal dictionary coder.

https://mitiko.github.io/BWDPerf

GNU General Public License v3.0


Ranking's entropy estimation and the following measurement don't match #33

Closed mitiko closed 3 years ago

mitiko commented 3 years ago

The order-0 entropy estimate computed during ranking doesn't match the entropy measured later on the parsed stream. This happens with both order-0 and order-1 rankings.

The mistake could be in the later measurement, since I wrote that code a long time ago. Either way, this should be investigated.
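
For reference, here is a minimal sketch of the kind of order-0 measurement being compared. It assumes the parsed stream is a sequence of hashable symbols and that the cost is the empirical Shannon entropy in bits. The name `order0_entropy_bits` is hypothetical and not taken from BWDPerf.

```python
import math
from collections import Counter

def order0_entropy_bits(symbols):
    """Total order-0 cost in bits: each symbol costs -log2(count / n)."""
    counts = Counter(symbols)          # frequency of every distinct symbol
    n = sum(counts.values())           # length of the stream
    # A symbol seen c times contributes c * -log2(c / n) bits in total.
    return sum(-c * math.log2(c / n) for c in counts.values())

# Two equally likely symbols cost 1 bit each, so 4 symbols cost 4 bits.
print(order0_entropy_bits(["a", "a", "b", "b"]))
```

If the ranking and the later measurement both reduce to this formula, the totals can only disagree when the symbol counts fed into them differ, so the counts are the first thing to compare.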

mitiko commented 3 years ago

This was actually very important! It turns out I had a bug in the matching code. The "feature" I called greediness was really a bug that caused certain words to be counted incorrectly. Instead, we have to add words of all lengths to the stack.
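
To illustrate the general effect (this is not the project's matching code), here is a small sketch under the assumption that matching means "find candidate words starting at each position". The function `count_matches` and its `greedy` flag are hypothetical names made up for the example.

```python
from collections import Counter

def count_matches(text, words, greedy):
    """Count candidate words matching at every position of text.

    greedy=True keeps only the longest match at each position;
    greedy=False records every matching word, whatever its length.
    """
    counts = Counter()
    for i in range(len(text)):
        matches = [w for w in words if text.startswith(w, i)]
        if not matches:
            continue
        if greedy:
            counts[max(matches, key=len)] += 1   # shorter matches are dropped
        else:
            counts.update(matches)               # words of all lengths are kept
    return counts

words = ["ab", "abc"]
greedy_counts = count_matches("abcab", words, greedy=True)
full_counts = count_matches("abcab", words, greedy=False)
print(greedy_counts["ab"], full_counts["ab"])
```

In this toy input "ab" occurs twice, but the greedy variant counts it once because the occurrence at position 0 is hidden by the longer "abc". An entropy estimate built on the greedy counts would then disagree with one measured on the real stream.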

mitiko commented 3 years ago

OK, this pretty much fixed it. The only remaining error was a small one: I was updating the model before checking whether this is the last word (rank > 0).