mitiko closed 3 years ago
This was actually very important! Turns out I had a bug in the matching code. The "feature" I called greediness was really a bug that caused certain words to be counted incorrectly. Instead, we have to add words of all lengths to the stack.
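To illustrate the difference, here is a rough Python sketch, not the actual matching code. It assumes "greedy" means keeping only the longest candidate at each position, and that the fix is to push a candidate of every length up to a hypothetical `max_len`. The names `candidates_at` and `count_words` are made up for this example.

```python
from collections import Counter

def candidates_at(text, pos, max_len):
    """All candidate words starting at pos, from length 1 up to max_len."""
    end = min(len(text), pos + max_len)
    return [text[pos:pos + n] for n in range(1, end - pos + 1)]

def count_words(text, max_len, greedy=False):
    """Count candidate words; greedy=True keeps only the longest per position."""
    counts = Counter()
    for pos in range(len(text)):
        stack = candidates_at(text, pos, max_len)
        if greedy:
            stack = stack[-1:]  # assumed old behaviour: longest match only
        for word in stack:
            counts[word] += 1
    return counts

all_lengths = count_words("abab", max_len=2)
longest_only = count_words("abab", max_len=2, greedy=True)
```

In this toy input the greedy variant never counts the word `a` at all, and counts `b` only once, which is the kind of miscount described above.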
Ok, this pretty much fixed it. The only remaining error was a small one: I was updating the model before checking whether this is the last word (rank > 0).
The order-0 entropy calculated during ranking doesn't match the entropy measured later on the parsed stream. This is true for both order-0 and order-1 rankings.
The mistake could be in the later measurement, I think, since I wrote that code a long time ago. Regardless, this should be investigated.
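One way to investigate would be an independent reference measurement to compare both numbers against. Below is a minimal Python sketch of empirical order-0 and order-1 entropy in bits per symbol. It assumes the parsed stream is available as a sequence of symbols (word indices, for example). The function names are hypothetical and not taken from the project.

```python
import math
from collections import Counter

def order0_entropy(symbols):
    """Empirical order-0 entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def order1_entropy(symbols):
    """Empirical conditional entropy H(X_i | X_{i-1}) in bits per symbol.

    The first symbol has no context, so it is left out of the estimate.
    """
    pair_counts = Counter(zip(symbols, symbols[1:]))
    context_counts = Counter(symbols[:-1])
    total = len(symbols) - 1
    return -sum(
        c / total * math.log2(c / context_counts[prev])
        for (prev, _), c in pair_counts.items()
    )
```

If the ranking estimate and the later measurement disagree with each other, checking each against a simple reference like this on a small input should show which side is off.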