jnweiger / pdfcompare

compare two PDF files, write a resulting PDF with highlighted changes
GNU General Public License v2.0

German umlaut characters break spell checking #12

Open jnweiger opened 8 years ago

jnweiger commented 8 years ago

Words with German umlaut characters, e.g. 'Nürnberg' (internally seen as "N\xfcrnberg"), do not make it into the word_set for h.check_word().

The regexp '([a-z_-]{3,})' keeps such words out of the word_set. Without that filter, they would lead to encode/decode errors instead. We should try to guess an encoding and, if successful, convert to UTF-8 for hunspell.
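A minimal sketch of the "guess an encoding, then convert to UTF-8" idea, written for Python 3 and not taken from the pdfcompare source. The names `CANDIDATE_ENCODINGS`, `guess_decode` and `to_utf8_for_spellcheck` are hypothetical, and the candidate list is only an assumption about which encodings are likely to show up in extracted PDF text. It tries each candidate in order and keeps the first one that decodes cleanly.

```python
# Hypothetical candidate list (an assumption, not from pdfcompare).
# Order matters: strict UTF-8 goes first because the single-byte
# encodings accept almost any input; latin-1 accepts every byte sequence.
CANDIDATE_ENCODINGS = ('utf-8', 'cp1252', 'latin-1')


def guess_decode(raw, candidates=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first candidate that decodes raw.

    Returns (None, None) if no candidate succeeds.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    return None, None


def to_utf8_for_spellcheck(raw, candidates=CANDIDATE_ENCODINGS):
    """Return raw re-encoded as UTF-8 bytes, or None if decoding failed."""
    text, _ = guess_decode(raw, candidates)
    return None if text is None else text.encode('utf-8')


if __name__ == '__main__':
    word = b'N\xfcrnberg'  # the byte string from the report
    print(guess_decode(word))
    print(to_utf8_for_spellcheck(word))
```

With this ordering, the byte 0xfc is rejected by the UTF-8 decoder and accepted by cp1252, so the word from the report comes out as 'Nürnberg'. Because latin-1 never fails, the `None` result can only occur when a stricter candidate list is passed in. The word-matching regexp would also need to accept non-ASCII letters before such words could reach the word_set; that part is not covered here.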