zverok / spylls

Pure Python spell-checker, (almost) full port of Hunspell
https://spylls.readthedocs.io
Mozilla Public License 2.0

Better report for error in aff; fixes #14 #15

Open zdenop opened 2 years ago

zverok commented 2 years ago

I believe it should be handled differently. As far as I can understand, the error is caused by the wrong number of items in the array (e.g. there is a declaration SFX 10, and then only 9 suffixes, something like this, right?). It would be much more reliable to catch it exactly where we know what the problem is: e.g. in _read_array. It even has a TODO comment for that, which I sadly never acted upon :shrug:

At that point, we'll support not only wrongly formatted SFX/PFX, but all array directives, and can provide much more helpful reporting, like "Error reading directive SOME_DIRECTIVE: expected 10 values, but only 8 found" or something like that.
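A minimal sketch of that kind of count check, to make the proposed error message concrete. It is not the actual Spylls code: `read_array`, its signature, and `AffFormatError` are hypothetical names, and it assumes the lines following the declaration are already split into fields with comments stripped.

```python
class AffFormatError(ValueError):
    """Hypothetical exception type for this sketch; not part of Spylls."""


def read_array(directive, count, lines):
    """Collect `count` entries that must all start with `directive`.

    `lines` is a list of already-split lines (lists of strings) that
    follow the declaration line. This is an assumed input shape.
    """
    values = []
    for parts in lines[:count]:
        # Stop at the first line that does not belong to this array.
        if not parts or parts[0] != directive:
            break
        values.append(parts[1:])

    if len(values) != count:
        raise AffFormatError(
            f"Error reading directive {directive}: "
            f"expected {count} values, but only {len(values)} found"
        )
    return values
```

Because the check lives in the shared array reader, the same message would apply to any array directive, not only SFX/PFX.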

zdenop commented 2 years ago

> e.g. there is a declaration SFX 10, and then only 9 suffixes, something like this, right?)

My situation was the opposite: there were more rules than the number in the declaration.

zverok commented 2 years ago

> My situation was the opposite: there were more rules than number in declaration.

Aha, then it is a bit more complicated, but the right handling, again, should be done beforehand. If you have this situation:

PFX A Y 1                     # declares there will be one
PFX A   0     re         .    # this one is OK
PFX A   0     in         .    # this one is extra

Then at line 3 Spylls assumes it is a declaration of a new affix, which is wrong. What can probably be done here is checking whether the PFX/SFX directive content matches the {flag} (Y|N) {digits} pattern, somewhere here: https://github.com/zverok/spylls/blob/master/spylls/hunspell/readers/aff.py#L248 Then ["A", "Y", "1"] would match, while ["A", "0", "in", "."] would not, and could be properly reported.
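A rough sketch of that pattern check, just to illustrate the idea. `is_affix_header` and `AFFIX_HEADER` are hypothetical names, not existing Spylls functions, and the sketch assumes a flag is any run of non-whitespace characters.

```python
import re

# {flag} (Y|N) {digits}: assumed shape of a PFX/SFX declaration's arguments.
AFFIX_HEADER = re.compile(r"\S+ [YN] [0-9]+")


def is_affix_header(parts):
    """Return True if the directive arguments look like an affix declaration."""
    return len(parts) == 3 and AFFIX_HEADER.fullmatch(" ".join(parts)) is not None


print(is_affix_header(["A", "Y", "1"]))        # True
print(is_affix_header(["A", "0", "in", "."]))  # False
```

When the check fails, the reader could report something like "unexpected extra PFX rule for flag A" instead of trying to parse the line as a new declaration.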

WDYT?

(Another, more complicated solution would be to "peek" at the next line in _read_array and see whether it is erroneously still a continuation of the array... But that's much more work)
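For comparison, the "peek" variant could look roughly like the sketch below. `count_extra_rules` is a hypothetical helper, not Spylls code; it assumes each line is already split into fields, with the directive name first and the flag second.

```python
def count_extra_rules(directive, flag, following):
    """Count lines after a complete array that still look like its rules.

    `following` holds the split lines that come right after the declared
    number of rules was read (an assumed input shape).
    """
    extra = 0
    for parts in following:
        # A new declaration has exactly: directive, flag, Y/N, count.
        looks_like_header = (
            len(parts) == 4 and parts[2] in ("Y", "N") and parts[3].isdigit()
        )
        if parts[:2] == [directive, flag] and not looks_like_header:
            extra += 1
        else:
            break
    return extra


following = [["PFX", "A", "0", "in", "."], ["PFX", "B", "Y", "1"]]
print(count_extra_rules("PFX", "A", following))  # 1
```

A non-zero result could then be turned into an error such as "directive PFX A declares 1 rule, but 1 extra rule follows".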

zdenop commented 2 years ago

I prefer fast fixes / more iterations (based on error reports), as creating a complex fix usually needs time that is not available ;-) So I am OK with whatever fix you are able to make today :-)