If the character 'a' appears twice in a word, the simple per-character loop will calculate the entropy change as

2 c_x log(c_x) - 2 (c_x - c_w) log(c_x - c_w)

(c_x being the character's overall count and c_w its count in the word),
when in fact it should be

c_x log(c_x) - 2 (c_x - c_w) log(2 (c_x - c_w))
Fixing this is definitely a slowdown. One option is to build a dictionary of character counts for the word: if the number of distinct characters matches the word length (no repeats), the simple loop is correct as-is; otherwise, we add a corrected term to the rank based on each character's occurrence count.
Or, since we always create such a dictionary anyway, we can simply work over that directly.
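The dictionary approach can be sketched as follows. This is a minimal illustration, not the actual implementation: `word_entropy_change` and the `corpus_counts` mapping are hypothetical names, and the per-character term used is the generic c_x log(c_x) - (c_x - c_w) log(c_x - c_w) shape rather than the exact corrected expression above, so treat it as the structure of the fix, not the final formula.

```python
from collections import Counter
from math import log

def word_entropy_change(corpus_counts, word):
    # Hypothetical sketch: group the word's characters by occurrence
    # count so a repeated character contributes one correctly weighted
    # term, instead of the per-character loop applying the single-
    # occurrence term once per repeat.
    total = 0.0
    for ch, cw in Counter(word).items():  # cw: occurrences of ch in the word
        cx = corpus_counts[ch]            # cx: occurrences of ch overall
        rem = cx - cw
        # Generic n log n entropy term; substitute the corrected
        # expression from the note here.
        total += cx * log(cx) - (rem * log(rem) if rem > 0 else 0.0)
    return total
```

Since `Counter(word)` is built once per word anyway, the repeated-character correction adds no extra pass over the word.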