sillsdev / machine.py

Machine is a natural language processing library for Python that is focused on providing tools for processing resource-poor languages.
MIT License

normalize lines before getting charset #104

Closed: mshannon-sil closed this issue 6 months ago

mshannon-sil commented 7 months ago

In find_missing_characters(), we currently normalize after gathering the set of all characters. Instead, we should normalize each piece of text before using it to update the set of all characters. This is the same issue as https://github.com/sillsdev/silnlp/issues/352
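
To illustrate why the order matters, here is a minimal sketch. It is not the actual find_missing_characters() code: the helper names, the use of NFC, and the sample line are all assumptions made for illustration. The point is that normalizing characters one at a time cannot compose a base letter with a following combining mark, so the resulting character set differs from the one built from normalized text.

```python
import unicodedata
from typing import Iterable, Set


def charset_normalize_after(lines: Iterable[str], form: str = "NFC") -> Set[str]:
    """Hypothetical 'current' behavior: gather raw characters, then normalize."""
    charset: Set[str] = set()
    for line in lines:
        charset.update(line)
    # Each character is normalized in isolation, so a base letter and a
    # combining mark that follow each other in the text are never composed.
    return {unicodedata.normalize(form, ch) for ch in charset}


def charset_normalize_before(lines: Iterable[str], form: str = "NFC") -> Set[str]:
    """Hypothetical 'proposed' behavior: normalize each line, then gather."""
    charset: Set[str] = set()
    for line in lines:
        # Normalizing the whole line lets multi-code-point sequences compose
        # before the characters are added to the set.
        charset.update(unicodedata.normalize(form, line))
    return charset


# Assumed sample input: "e" followed by U+0301 COMBINING ACUTE ACCENT.
lines = ["cafe\u0301"]
after = charset_normalize_after(lines)
before = charset_normalize_before(lines)
```

With this input, `before` contains the precomposed character U+00E9, while `after` still holds the bare "e" and the standalone combining accent, so a comparison against a vocabulary built from normalized text would report different missing characters.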