ghost opened this issue 4 years ago
Can you provide some examples?
For example, wèl gets a lot of suggestions, but not the correct wel and wél. The same goes for éen, which should just get een and één. For -Ze, the suggestion - Ze is not offered. In general, the suggestion list returned by the server contains too many suggestions for short words.
From five letters up, the suggestions are a lot better.
Can you check whether this is limited to words with special characters? Also, does LibreOffice (which uses Hunspell) have the same issue?
Hunspell does not have this problem. (I made the .aff file myself.)
wèl suggests a lot, except the correct wel and wél
It suggests wel for me, but wél is indeed missing. At first sight, this might even be a bug in Morfologik. @jaumeortola, you know that code a bit; do you have an idea? nl_NL.info looks okay to me.
I dumped the file spelling/nl_NL.dict, and "wél" is not there. When "wél" is added to nl/spelling/spelling.txt, it appears as the third suggestion.
2.) Line 1, column 1, Rule ID: MORFOLOGIK_RULE_NL_NL
Message: Er is een mogelijke spelfout gevonden. (A possible spelling mistake was found.)
Suggestion: Wel; wel; wél; Bel; El; Gel; Hel; Nel; Pel; Tel; Wal; Weg; Well; Wiel; Wil; bel; cel; del; el; fel; gel; hel; kwel; rel; tel; vel; wal; we; web; wee; weg; wei; welk; wen; wet; wiel; wil; wol; Mel; Wee; wed; wek; Wei; Welt; Wen; Wol; pel; Weel; Weil; Wes; iel; Wehl; weel; wel-; Wely; sel; welp; welt; Wels; lel; nel; Sel; Wael; awel; Welp; Kel; kel; zwel; Jel; Oel; wes; Weyl; woel; Wey; Zel; wep; yel; Owel; wem; Welz; welf; Yel
wèl
^^^
And it is here: spelling/ignore.txt:wél.
Which proves my point, because wel and Wel should be the only suggestions.
We don't have logic yet to stop once good suggestions are found; the algorithm just keeps searching for more candidates. It's on the wishlist, though.
I was confused because you said "except the correct wel and wél", so I understood that "wél" was correct.
When the difference is only a diacritic mark and fsa.dict.speller.ignore-diacritics=true, distance=0 and the word comes first in the suggestion list.
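To make "differs only by a diacritic" concrete, here is a minimal Java sketch. It is not Morfologik's implementation (which works on the dictionary automaton); it decomposes both words with java.text.Normalizer and drops the combining marks before comparing. The class and method names are hypothetical, and the accented letters are written as Unicode escapes, so "w\u00e8l" is wèl.

```java
import java.text.Normalizer;

// Hypothetical helper: compares words after removing diacritics.
final class DiacriticCompare {
    // NFD splits an accented letter into base letter + combining mark;
    // the regex \p{M} then matches and removes the combining marks.
    static String stripDiacritics(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // True when the words differ, but become identical once diacritics are removed.
    static boolean differOnlyInDiacritics(String a, String b) {
        return !a.equals(b) && stripDiacritics(a).equals(stripDiacritics(b));
    }

    public static void main(String[] args) {
        String welGrave = "w\u00e8l"; // e with grave accent
        String welAcute = "w\u00e9l"; // e with acute accent
        System.out.println(differOnlyInDiacritics(welGrave, "wel"));    // true
        System.out.println(differOnlyInDiacritics(welGrave, welAcute)); // true
        System.out.println(differOnlyInDiacritics(welGrave, "wal"));    // false
    }
}
```

Under this view, wel and wél are both at "distance 0" from wèl, which matches why such a word would be ranked first.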
If you want to cut the list when the difference is only a diacritic mark, that is trivial: we could add that condition in MorfologikDutchSpellerRule.
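A minimal sketch of such a condition, assuming it runs as a post-filter over the speller's suggestion list. The helper below is hypothetical and not part of MorfologikDutchSpellerRule; it also ignores case, so that for wèl both wel and Wel survive (and wél, if it were suggested).

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical post-filter, not part of MorfologikDutchSpellerRule.
final class DiacriticOnlyFilter {
    // Remove diacritics and lower-case, so Wel, wel and accented variants share one key.
    static String key(String s) {
        String noMarks = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        return noMarks.toLowerCase(Locale.ROOT);
    }

    // If any suggestion differs from the misspelling only in diacritics or case,
    // keep just those suggestions; otherwise return the list unchanged.
    static List<String> cutToDiacriticVariants(String misspelling, List<String> suggestions) {
        String target = key(misspelling);
        List<String> variants = new ArrayList<>();
        for (String s : suggestions) {
            if (key(s).equals(target)) {
                variants.add(s);
            }
        }
        return variants.isEmpty() ? suggestions : variants;
    }

    public static void main(String[] args) {
        List<String> fromSpeller = List.of("Wel", "wel", "Bel", "El", "Gel", "wal");
        System.out.println(cutToDiacriticVariants("w\u00e8l", fromSpeller)); // [Wel, wel]
    }
}
```

When no suggestion is a pure diacritic/case variant, the original list is passed through, so longer or unrelated misspellings are unaffected.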
Anyway, only developers see the long list. Users usually see only five suggestions (depending on the user interface).
In the current spelling configuration, it seems that even when there are many suggestions with distance=1, we keep searching for suggestions with distance=2. Often this seems unnecessary. Is that the desired behavior, @danielnaber?
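The tiered behavior being discussed can be sketched as: collect distance-1 candidates first, and only widen to distance 2 when too few were found. This is a brute-force illustration over a plain word list with a hypothetical minGood threshold; Morfologik itself searches the dictionary automaton with a maximum edit distance rather than comparing every word, and this sketch does not ignore diacritics.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "stop at distance 1 if that is enough".
final class TieredSuggest {
    // Classic dynamic-programming Levenshtein distance (insert, delete, substitute).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    // Add all distance-1 candidates; widen to distance 2 only when
    // fewer than minGood candidates have been found so far.
    static List<String> suggest(String word, List<String> lexicon, int minGood) {
        List<String> result = new ArrayList<>();
        for (int dist = 1; dist <= 2; dist++) {
            for (String cand : lexicon) {
                if (levenshtein(word, cand) == dist) result.add(cand);
            }
            if (result.size() >= minGood) break; // enough close candidates: stop early
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lexicon = List.of("wel", "bel", "welk", "wal", "wiel", "zwel");
        System.out.println(suggest("w\u00e8l", lexicon, 2)); // [wel, wal]
        System.out.println(suggest("w\u00e8l", lexicon, 3)); // [wel, wal, bel, welk, wiel, zwel]
    }
}
```

The trade-off is visible in the two calls: with a low threshold the distance-2 words are never generated, which is faster but also means they can never be promoted later.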
we keep searching for suggestions with distance=2.
Do you mean this code in MorfologikSpellerRule?
if (word.length() >= 3 && (onlyCaseDiffers || fullResults || defaultSuggestions.isEmpty())) {
Stopping early is good for performance, but the suggestions with a larger distance might still be good. They might even be moved to the top in a later re-sorting step (currently only done for English). So I think changing the code would need a lot of testing to prevent regressions.
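To see why stopping early could lose good candidates, here is a toy re-ranking sketch that combines edit distance with word frequency. The scoring formula, the editCost weight and the candidate words are made up for illustration; this is not LanguageTool's English re-sorting step.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical re-ranking sketch; formula and weights are illustrative only.
final class Rerank {
    // A candidate suggestion with its edit distance and a corpus frequency.
    static final class Candidate {
        final String word;
        final int distance;
        final long frequency;

        Candidate(String word, int distance, long frequency) {
            this.word = word;
            this.distance = distance;
            this.frequency = frequency;
        }
    }

    // Made-up score: each edit costs editCost, frequent words get a log10 bonus. Lower is better.
    static double score(Candidate c, double editCost) {
        return c.distance * editCost - Math.log10(c.frequency + 1);
    }

    // Sort by score; List.sort is stable, so equal scores keep their input order.
    static List<String> rerank(List<Candidate> candidates, double editCost) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Candidate c) -> score(c, editCost)));
        List<String> words = new ArrayList<>();
        for (Candidate c : sorted) {
            words.add(c.word);
        }
        return words;
    }

    public static void main(String[] args) {
        List<Candidate> found = List.of(
            new Candidate("rare1", 1, 0),             // close but never seen in the corpus
            new Candidate("common2", 2, 1_000_000));  // farther but very frequent
        System.out.println(rerank(found, 1.0));  // [common2, rare1]
        System.out.println(rerank(found, 10.0)); // [rare1, common2]
    }
}
```

With a low edit cost, the distance-2 word outranks the distance-1 word; if the search had stopped at distance 1, it would never have been a candidate.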
For shorter words, a manual list might be better.
For short misspellings, the Morfologik speller returns too many suggestions, and many of them are far off. How can I improve this? The only method I know would be to move them to the ignore list and add a simple replacement with the correct suggestions. Is that the preferred method?
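One way to model the "ignore list plus simple replacement" idea is a hand-maintained table consulted before the speller, plus a cap on the speller list for short words. This is a hypothetical sketch: the MANUAL table, the shortLimit and maxForShort parameters, and the speller callback are assumptions, not LanguageTool's configuration files or API.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not LanguageTool's replace/ignore file handling.
final class ShortWordOverrides {
    // Hand-maintained table: misspelling -> the exact suggestions to show.
    static final Map<String, List<String>> MANUAL = Map.of(
        "w\u00e8l", List.of("wel"),
        "\u00e9en", List.of("een", "\u00e9\u00e9n"));

    static List<String> suggest(String word, Function<String, List<String>> speller,
                                int shortLimit, int maxForShort) {
        List<String> manual = MANUAL.get(word);
        if (manual != null) {
            return manual; // the manual entry wins; the speller is not consulted
        }
        List<String> fromSpeller = speller.apply(word);
        // For words shorter than shortLimit, keep only the first maxForShort suggestions.
        if (word.length() < shortLimit && fromSpeller.size() > maxForShort) {
            return fromSpeller.subList(0, maxForShort);
        }
        return fromSpeller;
    }

    public static void main(String[] args) {
        Function<String, List<String>> fakeSpeller = w -> List.of("a", "b", "c", "d");
        System.out.println(suggest("w\u00e8l", fakeSpeller, 5, 3));     // [wel]
        System.out.println(suggest("xy", fakeSpeller, 5, 2));           // [a, b]
        System.out.println(suggest("xylofoon", fakeSpeller, 5, 2));     // [a, b, c, d]
    }
}
```

A manual table gives exact control for the handful of frequent short cases, while the length cap limits noise for short words without touching longer ones, where the speller already does well.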