Inconsistent language-code strings lead to inconsistent normalization

This has apparently been the case for a while, but we should fix it in an update:

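The tokenize function assumes it's getting a nicely normalized language code. But when looking up word frequencies, we don't normalize the language code until later: it happens inside get_frequency_list, which never returns the normalized code. So tokenize runs with whatever string the caller passed in, while the frequency lookup uses the normalized one.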
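To make the failure mode concrete, here is a minimal toy reproduction. None of it is wordfreq's real code: the function names mirror the ones mentioned above, but `normalize_code`, `word_frequency`, the `FREQUENCIES` table, and the Turkish case-folding rule are all assumptions made up for illustration.

```python
# Toy stand-ins only; these are NOT wordfreq's real implementations.
FREQUENCIES = {
    "en": {"hello": 0.001},
    "tr": {"ısı": 0.0001},
}


def normalize_code(lang):
    """Hypothetical normalizer: keep the primary subtag, lowercased."""
    return lang.replace("_", "-").split("-")[0].lower()


def tokenize(text, lang):
    """Assumes `lang` is already normalized, as the issue describes."""
    if lang == "tr":
        # Illustrative language-specific step: Turkish dotless i.
        text = text.replace("I", "ı")
    return text.lower().split()


def get_frequency_list(lang):
    # Normalization happens here, but the normalized code is not returned,
    # so the caller never learns which code was actually used.
    return FREQUENCIES[normalize_code(lang)]


def word_frequency(word, lang):
    tokens = tokenize(word, lang)      # sees the raw, unnormalized code
    freqs = get_frequency_list(lang)   # normalizes internally
    return freqs.get(tokens[0], 0.0)


consistent = word_frequency("ISI", "tr")    # tokenized with the Turkish rule
inconsistent = word_frequency("ISI", "TR")  # same language, different result
```

In this sketch the two calls refer to the same language but disagree, because only the first one takes the language-specific path in tokenize.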

I can think of an ugly fix we could make right away, or a nice fix that would require a change to langcodes to make simple cases of language matching faster.
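Neither fix is spelled out here, so the following is only one possible shape, not necessarily either of the options above: normalize once at the entry point, pass the normalized code to everything downstream, and cache the normalization so repeated lookups stay cheap. As before, `normalize_code`, `word_frequency`, and the data are hypothetical stand-ins; a real version would presumably delegate the matching to langcodes.

```python
from functools import lru_cache

# Toy data; NOT wordfreq's real frequency lists.
FREQUENCIES = {
    "en": {"hello": 0.001},
    "tr": {"ısı": 0.0001},
}


@lru_cache(maxsize=None)
def normalize_code(lang):
    """Hypothetical stand-in for real language matching.

    Caching means each distinct input string is normalized only once.
    """
    return lang.replace("_", "-").split("-")[0].lower()


def tokenize(text, lang):
    """Expects an already-normalized code."""
    if lang == "tr":
        # Illustrative language-specific step: Turkish dotless i.
        text = text.replace("I", "ı")
    return text.lower().split()


def get_frequency_list(lang):
    """Expects an already-normalized code; no hidden normalization."""
    return FREQUENCIES[lang]


def word_frequency(word, lang):
    lang = normalize_code(lang)        # normalize exactly once, up front
    tokens = tokenize(word, lang)      # both steps now see the same code
    freqs = get_frequency_list(lang)
    return freqs.get(tokens[0], 0.0)
```

With this arrangement, `word_frequency("ISI", "TR")` and `word_frequency("ISI", "tr")` go through the same tokenization path and the same frequency list.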

rspeer / wordfreq #36