languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

[pt] LT fails to correct "inciais" #7339

Closed ricardojosehlima closed 2 years ago

ricardojosehlima commented 2 years ago

When the third letter of "iniciais" is omitted, giving "inciais", LT doesn't flag it as an error (pt-BR).

marcoagpinto commented 2 years ago

@jaumeortola @susanaboatto

It is the second time today (and yesterday?) that someone has complained about typos not being flagged as such.

Is there an issue with the spell checker?

Thanks!

danielnaber commented 2 years ago

There haven't been any changes recently. The hunspell dictionary we use simply seems to accept that word for pt-BR:

echo "inciais" | hunspell -a -d languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/hunspell/pt_BR -> no error found

echo "inciais" | hunspell -a -d languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/hunspell/pt_PT -> error found (same command but PT instead of BR)

"inciais" itself isn't in the pt_BR.dic file, so it must be generated via a flag.

ricardojosehlima commented 2 years ago

@danielnaber does this mean that "inciais" is in the pt-BR dictionary located in this folder? I just downloaded it, searched for "inciais", and found nothing. On the other hand, nearby in the file, I found "incetivo" (should be "incentivo") and "inciativa" (should be "iniciativa").

danielnaber commented 2 years ago

does this mean that "inciais" is in the pt-BR dictionary located in this folder?

Not necessarily directly. It can be generated from a different word by the flags (the characters after the /).
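
To illustrate how a flag can produce a word that never appears literally in the .dic file, here is a minimal Python sketch of affix expansion. The flags `P` and `V` and their rules are made up for this example and are not the real pt_BR.aff rules; the sketch only shows the general mechanism.

```python
# Toy model of Hunspell-style affix expansion. The flags "P" and "V" and
# their rules are hypothetical, NOT the real pt_BR.aff contents.

PREFIXES = {"P": ["in"]}                       # flag -> prefixes to prepend
SUFFIXES = {"V": [("r", "is"), ("r", "mos")]}  # flag -> (strip, add) pairs


def expand(entry):
    """Return every surface form generated by one .dic-style entry."""
    stem, _, flags = entry.partition("/")
    forms = {stem}
    # Apply each suffix rule to the bare stem.
    for flag in flags:
        for strip, add in SUFFIXES.get(flag, []):
            if stem.endswith(strip):
                forms.add(stem[: len(stem) - len(strip)] + add)
    # Prefixes combine with the stem and with every suffixed form.
    unprefixed = set(forms)
    for flag in flags:
        for prefix in PREFIXES.get(flag, []):
            forms |= {prefix + form for form in unprefixed}
    return forms


generated = expand("ciar/PV")
print(sorted(generated))
```

With these assumed rules, the single entry `ciar/PV` yields "inciais" even though a text search of the word list for "inciais" finds nothing.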

jaumeortola commented 2 years ago

The problem happens with the wrong verb *inciar and all its inflected forms.

ricardojosehlima commented 2 years ago

But this verb is not in the Hunspell dictionary (I didn't find it there), right?

jaumeortola commented 2 years ago

But this verb is not in the Hunspell dictionary (I didn't find it there), right?

Yes. It is very strange. "Inciar" is not in our hunspell pt_BR.dic file. But I can see "inciar" in a newer pt_BR Hunspell version.

Now I found the origin of the error. In our dictionary, "inciais" comes from "ciar" with prefix in- and inflection:

$ hunspell -d pt_BR
Hunspell 1.7.0
inciar
+ ciar

inciais
+ ciar

I talked about the potential problems with prefixes here: https://github.com/languagetool-org/languagetool/issues/6079 The dictionaries need a thorough review...
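
For checking many words at once, the result lines can be classified automatically. The following is a small Python sketch, assuming the ispell-style markers that Hunspell prints (`*` for a literal dictionary hit, `+ root` for a word accepted through affixes, `&` or `#` for a misspelling). The function name `classify` is made up for this example.

```python
def classify(line):
    """Classify one result line of Hunspell's ispell-style output."""
    line = line.strip()
    if not line:
        return ("blank", None)
    marker, _, rest = line.partition(" ")
    if marker == "*":
        return ("exact", None)       # word is in the dictionary as-is
    if marker == "+":
        return ("affixed", rest)     # accepted via affixes; rest is the root
    if marker == "-":
        return ("compound", None)    # accepted as a compound
    if marker in ("&", "#"):
        # Misspelled; the first token after the marker is the original word.
        return ("misspelled", rest.split(" ", 1)[0])
    return ("other", line)           # version banner or anything unexpected


print(classify("+ ciar"))
```

Words reported as "affixed" with an unexpected root, such as "inciais" resolving to "ciar", are the ones worth reviewing by hand.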

ricardojosehlima commented 2 years ago

Ah, OK. The verb "ciar", whatever it means, is very rare in pt-BR and can be excluded from the dictionary. As for "in-" as a prefix, I think it only applies to adjectives ("in+feliz", "in+sensível") and adverbs formed from these adjectives ("in+felizmente", "in+sensivelmente"), not to verbs, so the combination of in- with verbs could also be excluded, if possible.

jaumeortola commented 2 years ago

Okay, I removed the verb 'ciar' (I did the same in Catalan with this verb some time ago).

This quarter, we plan to rebuild the Portuguese spelling dictionaries in the Morfologik format. When doing this, we can try to improve the content of the dictionary (mostly by comparing different dictionaries, tagger vs. spelling dictionary...), but it will be a laborious task.
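
One way to picture the comparison mentioned above is a plain set difference: words the speller accepts but the tagger does not know become candidates for manual review. This is only a sketch with tiny made-up word lists; the function name `review_candidates` is hypothetical and the real rebuild may work quite differently.

```python
def review_candidates(speller_words, tagger_words):
    """Return words the speller accepts but the tagger does not know."""
    return sorted(set(speller_words) - set(tagger_words))


# Tiny made-up word lists, for illustration only.
speller = ["iniciais", "inciais", "iniciativa", "inciativa"]
tagger = ["iniciais", "iniciativa"]

candidates = review_candidates(speller, tagger)
print(candidates)
```

A difference like this over-reports, since rare but valid words also show up, so it narrows the review rather than replacing it.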