languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

[pt] excepto / exceto #6680

Open jaumeortola opened 2 years ago

jaumeortola commented 2 years ago

In pt-PT, we recommend excepto > exceto. Should we make the same recommendation in pt-BR?

More generally, the replacement rule PT_AGREEMENT_REPLACE is active only for pt-PT. The word list for this rule is here: https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/AOreplace.txt

ricardojosehlima commented 2 years ago

Hi @jaumeortola, I think this is what I addressed in my recent pull request. excepto > exceto should be kept for pt-BR. I removed other suggestions, such as 'avançado-atacante', because they produce false positives (see my PR post for explanations). This replacement issue seems to apply to Portuguese people writing in Brazil; Brazilians themselves wouldn't write excepto, abstracto, and so on.

jaumeortola commented 2 years ago

> Brazilians themselves wouldn't write excepto, abstracto, and so on.

I see. Still, we have to handle possible errors. "Abstracto" is not even accepted by the pt-BR speller. But "excepto" is accepted by the pt-BR speller, and pt-BR has no rule for it (unlike pt-PT).

pt/AOreplace.txt is not the same file you edited. It seems to contain only spelling changes for the "Acordo Ortográfico". I wonder if this replacement file could be applied to pt-BR, just in case the speller allows some of these words. I can check whether there are more words like 'excepto' (accepted by the pt-BR speller, but on the pre-Acordo Ortográfico list).
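The check described above could be sketched roughly as follows. This is only an illustration: it assumes the replacement file uses `old=new` lines with `#` comments and optional `|`-separated alternatives on the left, and `is_accepted` is a hypothetical stand-in for a query against the pt-BR speller (here just membership in a toy set), not a real LanguageTool API.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def parse_replacements(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'old=new' lines (assumed format); '#' starts a comment."""
    pairs: Dict[str, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        old, new = line.split("=", 1)
        # Assumption: several old forms may share one line, separated by '|'.
        for form in old.split("|"):
            form = form.strip()
            if form:
                pairs[form] = new.strip()
    return pairs


def accepted_old_forms(
    pairs: Dict[str, str], is_accepted: Callable[[str], bool]
) -> List[Tuple[str, str]]:
    """Return (old, new) pairs whose old form the speller still accepts."""
    return sorted((old, new) for old, new in pairs.items() if is_accepted(old))


# Toy data, not the real file contents or the real speller dictionary.
sample_lines = ["# pre-AO spellings", "excepto=exceto", "abstracto=abstrato"]
toy_speller = {"excepto", "exceto", "abstrato"}

candidates = accepted_old_forms(
    parse_replacements(sample_lines), toy_speller.__contains__
)
print(candidates)  # [('excepto', 'exceto')]
```

Each pair in the result is a pre-Acordo spelling that the speller would let through silently, so it is a candidate for a pt-BR replacement rule.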

ricardojosehlima commented 2 years ago

OK! That file needs some work, at least for pt-BR: I have already seen some wrong conversions. decepção should remain decepção, not become deceção; likewise, detectar should not become detetar. I'm not sure whether these forms without the 'c' are allowed in pt-PT.

jaumeortola commented 2 years ago

@ricardojosehlima Is there a way to do this systematically (not just manually)? What are the main references for Brazilian Portuguese? The Michaelis dictionary? Some other sources? The Michaelis seems to match your comments about decepção, detectar... Other dictionaries (from Portugal) have similar information, but it is not exactly the same.

ricardojosehlima commented 2 years ago

@jaumeortola Michaelis is a good source! There is also the VOLP (Vocabulário Ortográfico da Língua Portuguesa), compiled by the Academia Brasileira de Letras, but I haven't found a digital version. I use palavras.net for some searches; see this one for words with 'pç': https://www.palavras.net/search.php?i=&f=&ms=&mns=&m=p%C3%A7&mn=&fs=0&fs2=0&fnl=0&fnl2=0&fa=0&ju=0&d=18&tv=4&Submit=Pesquisa There you can choose between a global Portuguese source (250k+ words) or a Brazilian source (180k+ words).
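One possible way to make the review systematic, given a Brazilian word list exported from one of these sources, is sketched below. It is an assumption-laden illustration: `br_words` is a hypothetical set loaded from such a list, and the pairs are toy examples taken from this thread, not the contents of the real replacement file.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def classify_conversions(
    pairs: Dict[str, str], br_words: Set[str]
) -> Tuple[List[Pair], List[Pair]]:
    """Split old->new pairs into ones safe for pt-BR and suspect ones.

    keep:    the new form is attested in the Brazilian list.
    suspect: the new form is missing but the old form is attested,
             so the conversion is probably wrong for pt-BR.
    Pairs where neither form is attested are left out for manual review.
    """
    keep: List[Pair] = []
    suspect: List[Pair] = []
    for old, new in sorted(pairs.items()):
        if new in br_words:
            keep.append((old, new))
        elif old in br_words:
            suspect.append((old, new))
    return keep, suspect


# Toy data based on the examples discussed above.
toy_pairs = {"excepto": "exceto", "decepção": "deceção", "detectar": "detetar"}
toy_br_words = {"exceto", "decepção", "detectar"}

keep, suspect = classify_conversions(toy_pairs, toy_br_words)
print("keep:", keep)
print("suspect:", suspect)
```

With this toy input, excepto > exceto lands in `keep`, while decepção > deceção and detectar > detetar land in `suspect`, matching the manual observations earlier in the thread.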