Closed ricardojosehlima closed 2 years ago
@jaumeortola @susanaboatto
This is the second time today (and yesterday?) that someone has complained about typos not being flagged as such.
Is there an issue with the spell checker?
Thanks!
There haven't been any changes recently. The hunspell dictionary we use simply seems to accept that word for pt-BR:
echo "inciais" | hunspell -a -d languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/hunspell/pt_BR
-> no error found
echo "inciais" | hunspell -a -d languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/hunspell/pt_PT
-> error found (same command but PT instead of BR)
"inciais" itself isn't in the pt_BR.dic file, so it must be generated via a flag.
@danielnaber this should mean that "inciais" is in the pt-BR dictionary located in this folder? I just downloaded it, searched for "inciais" and found nothing. On the other hand, in the neighborhood, I found "incetivo" (should be "incentivo") and "inciativa" (should be "iniciativa").
this should mean that "inciais" is in the pt-BR dictionary located in this folder?
Not necessarily directly. It can be generated from a different word by the flags (the characters after the "/").
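To illustrate why searching the .dic file for "inciais" finds nothing, here is a minimal sketch using a toy dictionary. The entries and the flag letters (XY, B) are invented for illustration and are not copied from the real pt_BR.dic:

```shell
# Toy dictionary for illustration only; entries and flags are invented,
# not taken from the real pt_BR.dic.
dic=$(mktemp)
printf '%s\n' '2' 'ciar/XY' 'casa/B' > "$dic"

# The surface form is not listed literally, so the match count is 0.
surface_hits=$(grep -c '^inciais' "$dic" || true)

# The stem is listed, with its flags after the slash.
stem_entry=$(grep '^ciar/' "$dic")

echo "surface hits: $surface_hits"
echo "stem entry:   $stem_entry"
rm -f "$dic"
```

A plain text search only sees the stems. Which forms each stem expands to depends on the rules that the flags refer to in the matching .aff file.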
The problem happens with the wrong verb *inciar and all its inflected forms.
But this verb is not in the hunspell dictionary (I didn't find it there), right?
Yes. It is very strange. "Inciar" is not in our hunspell pt_BR.dic file. But I can see "inciar" in a newer pt_BR Hunspell version.
Now I found the origin of the error. In our dictionary, "inciais" comes from "ciar" with prefix in- and inflection:
$ hunspell -d pt_BR
Hunspell 1.7.0
inciar
+ ciar
inciais
+ ciar
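The derivation reported above can be sketched by hand. The prefix and suffix rules below are assumptions chosen to reproduce the result for "ciar"; the actual rules in pt_BR.aff were not shown in this thread and may be written differently:

```shell
# Hypothetical affix rules for illustration; the real rules live in pt_BR.aff.
prefix="in"        # assumed prefix rule: prepend "in"
suffix_strip="ar"  # assumed suffix rule: strip the infinitive ending "ar" ...
suffix_add="ais"   # ... and append "ais"

# Apply the prefix and the suffix rule to a stem passed as the first argument.
apply_affixes() {
  stem=$1
  printf '%s%s%s\n' "$prefix" "${stem%"$suffix_strip"}" "$suffix_add"
}

generated=$(apply_affixes "ciar")
echo "$generated"
```

Under these assumed rules the stem "ciar" yields "inciais", which is how a form that looks like a typo of "iniciais" can be accepted even though it never appears literally in the dictionary file.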
I talked about the potential problems with prefixes here: https://github.com/languagetool-org/languagetool/issues/6079 The dictionaries need a thorough review...
Ah, OK. The verb "ciar", whatever it means, is very rare in pt-BR and can be excluded from the dictionary. As for "in-" as a prefix, I think it only applies to adjectives ("in+feliz", "in+sensível") and to adverbs formed from these adjectives ("in+felizmente", "in+sensivelmente"), not to verbs, so the combination of "in-" with verbs could also be excluded, if possible.
Okay, I removed the verb 'ciar' (I did the same in Catalan with this verb some time ago).
This quarter, we plan to rebuild the Portuguese spelling dictionaries in the Morfologik format. When doing this, we can try to improve the content of the dictionary (mostly by comparing different dictionaries, tagger vs. spelling dictionary...), but it will be a laborious task.
When the 3rd letter of "iniciais" is omitted, giving "inciais", LT doesn't flag it as an error (pt-BR).