The problem is that the last ngram for which adjusted counts were computed had the wrong count.
I generated a number of texts with varying ngram orders and pruning thresholds and compared lmplz's discounts against this Python script:
compute_discounts.txt
Out of 100 texts, with this patch 79 are rejected by both lmplz and the attached Python script, and for the remaining 21 both produce the same discounts.
Without this patch, 78 texts are rejected by both lmplz and my script, 1 is rejected by my script but not by lmplz, and 2 are rejected by lmplz but not by my script. Among the 19 texts for which both compute discounts, lmplz and my script agree in 17 cases and differ in 2.
I think this fixes a problem with the way ngrams are counted that's described in https://github.com/kpu/kenlm/issues/405.
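For context, the discounts being compared are the Chen–Goodman modified Kneser–Ney discounts that are derived from the counts of adjusted counts. A minimal sketch of that closed form (the function name and the exact rejection condition are my own illustration, not kenlm's API):

```python
from collections import Counter

def kneser_ney_discounts(adjusted_counts):
    """Modified Kneser-Ney discounts (D1, D2, D3+) for one ngram order.

    adjusted_counts: the adjusted count of every distinct ngram of that order.
    Uses the Chen-Goodman closed form D_k = k - (k+1) * Y * n_{k+1} / n_k
    with Y = n_1 / (n_1 + 2 * n_2), where n_k is the number of ngrams whose
    adjusted count is exactly k.
    """
    counts_of_counts = Counter(adjusted_counts)
    n = [counts_of_counts.get(k, 0) for k in range(1, 5)]  # n_1 .. n_4
    if any(v == 0 for v in n):
        # Without all of n_1..n_4 the formula is undefined; lmplz likewise
        # rejects such inputs (its exact check and message may differ).
        raise ValueError("counts of counts n_1..n_4 must all be nonzero")
    Y = n[0] / (n[0] + 2 * n[1])
    return tuple(k - (k + 1) * Y * n[k] / n[k - 1] for k in (1, 2, 3))
```

For example, adjusted counts giving n_1=4, n_2=3, n_3=2, n_4=1 yield Y=0.4 and discounts (0.4, 1.2, 2.2). A wrong count on even one ngram shifts the n_k and therefore every discount, which is why the off-by-one on the last ngram shows up as disagreements like the ones above.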
Should I add my test data here too?