zverok / spylls

Pure Python spell-checker, (almost) full port of Hunspell
https://spylls.readthedocs.io
Mozilla Public License 2.0

aff-regex #18

Closed by doublex 2 years ago

doublex commented 2 years ago

This Czech AFF file contains an invalid regex: https://github.com/wooorm/dictionaries/blob/main/dictionaries/cs/index.aff#L2119

As a result, this line fails with re.error: unterminated character set at position 36: https://github.com/zverok/spylls/blob/master/spylls/hunspell/data/aff.py#L266
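For readers who want to see the failure in isolation, here is a minimal sketch. It assumes the affix condition is passed to Python's re module more or less verbatim. The helper name compile_condition and the sample conditions are made up for illustration; they are not taken from spylls or from the Czech dictionary.

```python
import re


def compile_condition(condition):
    """Compile a suffix condition as a regex anchored at the end of the word.

    Hypothetical helper, not part of spylls. Returns None instead of raising
    when the condition is not a valid regular expression.
    """
    try:
        return re.compile(condition + "$")
    except re.error as exc:
        # An unclosed "[" ends up here with "unterminated character set".
        print(f"invalid condition {condition!r}: {exc}")
        return None


valid = compile_condition("[^aeiou]y")   # well-formed character set
broken = compile_condition("[^aeiou")    # missing the closing bracket
```

Returning None is only one possible policy for a broken dictionary line; raising a clearer error that names the offending line would be just as reasonable.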

zverok commented 2 years ago

What are you suggesting here? What's the desired behavior for definitely-wrong dictionary files?

doublex commented 2 years ago

You are right, the problem is in the affix file. But there may still be an issue in spylls: this affix looks correct, yet it also fails: https://github.com/wooorm/dictionaries/blob/main/dictionaries/uk/index.aff#L1464

zverok commented 2 years ago

@doublex Ugh, this is more complicated. It seems I've never encountered dictionaries with () in conditions before, even when running smoke tests on all the dictionaries that were available when spylls was finalized (I'm not even sure Hunspell supports this syntax). I'll try to take a closer look over the next few days.

doublex commented 2 years ago

They are a rare (strange?) case. Maybe simply strip the () from the condition?

zverok commented 2 years ago

Surprisingly enough, this case, while indeed rare, made me rethink why it was a problem in the first place... and simplify the code so that it no longer is one :) See https://github.com/zverok/spylls/commit/f92f74b47c99265554bd90c775957152f40cf4e1: it significantly simplifies spylls/hunspell/data/aff.py and drops the hacky regexp construction. Released as 0.1.7, which works with uk_UA as expected.
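To illustrate the general idea of checking a condition without building a regex, here is a rough sketch. It is not the code from the commit above. It assumes that only [...], [^...] and . are special in a condition, so parentheses are treated as literal characters. The function names are hypothetical.

```python
def parse_condition(cond):
    """Split a condition into (negated, chars) pairs, one per matched character.

    Assumption: only "[...]", "[^...]" and "." are special; everything else,
    including "(" and ")", is a literal character.
    """
    elements = []
    i = 0
    while i < len(cond):
        ch = cond[i]
        if ch == "[":
            end = cond.find("]", i + 1)
            if end == -1:
                raise ValueError(f"unterminated character set in {cond!r}")
            body = cond[i + 1:end]
            negated = body.startswith("^")
            if negated:
                body = body[1:]
            elements.append((negated, frozenset(body)))
            i = end + 1
        elif ch == ".":
            # A negated empty set accepts any single character.
            elements.append((True, frozenset()))
            i += 1
        else:
            elements.append((False, frozenset(ch)))
            i += 1
    return elements


def suffix_condition_matches(cond, word):
    """Check whether the end of `word` satisfies the suffix condition `cond`."""
    elements = parse_condition(cond)
    if len(elements) > len(word):
        return False
    tail = word[len(word) - len(elements):]
    return all((c in chars) != negated
               for (negated, chars), c in zip(elements, tail))
```

With this sketch, suffix_condition_matches("(a)", "x(a)") is true and suffix_condition_matches("(a)", "xa") is false, whereas a regex built from the same condition would treat the parentheses as a group and match "xa". A malformed condition such as "[abc" raises ValueError instead of re.error.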

doublex commented 2 years ago

Thanks a lot for all your efforts!