Entropy change calculation is wrong for words with repeating symbols

If the character 'a' appears twice in a word, the simple per-character loop calculates the entropy change as 2cx log(cx) - 2(cx-cw) log(cx-cw), when in fact it should be cx log(cx) - 2(cx-cw) log(2(cx-cw)).

Fixing this will definitely slow things down. I suppose we can build a dictionary of character counts for the word. If the number of entries matches the word length, there are no repeats and we're fine; otherwise, we add a term to the rank based on the occurrence count of each character. Or, since we always create the dictionary anyway, we can simply loop over it instead of over the word.
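A rough sketch of the dictionary idea, in Python rather than the project's own language. Everything here is an assumption made for illustration: the function names, reading cx as the count of symbol x and cw as the count of the word, and the premise that a symbol occurring k times in the word drops from cx to cx - k*cw once the word is replaced. That premise is only one reading of the problem, and the exact term used in the codebase may differ from it.

```python
import math
from collections import Counter


def xlogx(n: int) -> float:
    """n * log2(n); non-positive counts are treated as contributing 0."""
    return n * math.log2(n) if n > 0 else 0.0


def naive_change(word: str, symbol_counts: dict[str, int], word_count: int) -> float:
    """Per-position loop: a repeated symbol is counted once per position,
    each time as if only word_count occurrences were removed."""
    return sum(
        xlogx(symbol_counts[ch]) - xlogx(symbol_counts[ch] - word_count)
        for ch in word
    )


def grouped_change(word: str, symbol_counts: dict[str, int], word_count: int) -> float:
    """Per-symbol loop over a dictionary of character counts.
    Assumption: a symbol occurring k times in the word loses k * word_count."""
    return sum(
        xlogx(symbol_counts[ch]) - xlogx(symbol_counts[ch] - k * word_count)
        for ch, k in Counter(word).items()
    )


if __name__ == "__main__":
    counts = {"a": 10, "b": 5, "c": 7}  # hypothetical symbol counts
    print(naive_change("abc", counts, 2), grouped_change("abc", counts, 2))
    print(naive_change("aba", counts, 2), grouped_change("aba", counts, 2))
```

For a word without repeats the two functions agree, so looping over the dictionary unconditionally avoids the separate "does the entry count match the word length" check, at the cost of building the dictionary for every word.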

mitiko / BWDPerf

Entropy change calculation is wrong for words with repeating symbols #35