languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

[pt] Ter corrected for ter after dash #6657

Open ricardojosehlima opened 2 years ago

ricardojosehlima commented 2 years ago

All is fine when Ter starts a sentence: Ter filhos é um desafio. ("Having children is a challenge.")

But when a dash precedes it:

LT sees this as a weekday abbreviation and suggests that it should be 'ter'.

Maybe there are more cases like this, with a dash or similar punctuation?

marcoagpinto commented 2 years ago

Sure, I will fix it at 5am 🙂

A few minutes ago I fixed the diacritics rule: https://github.com/languagetool-org/languagetool/commit/280d1c28232edd82ca63ec8d775a1a30b725b354

and now I am too stressed.

I wonder if the diacritics rule can be used to replace all the paronym ones.

ricardojosehlima commented 2 years ago

Great. I would like to see a file with the hits for this diacritics paronym rule.

marcoagpinto commented 2 years ago

@ricardojosehlima

A few minutes ago I placed the 0before .txt and the final "after" .txt in the commit comment.

I was looking at it and spotted some false positives.

ricardojosehlima commented 2 years ago

Hi @marcoagpinto, where can I see the file?

marcoagpinto commented 2 years ago

https://github.com/languagetool-org/languagetool/commit/280d1c28232edd82ca63ec8d775a1a30b725b354

It is in the commit message at the bottom.

ricardojosehlima commented 2 years ago

Oh, sorry, I was thinking it was about the Ter issue. I will look at it soon.

marcoagpinto commented 2 years ago

@ricardojosehlima

I am generating the results on 600 000 sentences right now to see what comes out.

marcoagpinto commented 2 years ago

@ricardojosehlima

It creates some false positives with "dom" and "ter":

Os principais cuidados com seu pet Ter um cachorrinho em casa para te fazer companhia é muito bom!
- Ter cuidado especial ao escolher sua senha, evitando senhas óbvias tais como seu nome, seu e-mail, sobr...
· Ter desenvolvido e implantado projetos na tecnologia Sharepoint.
⦁ Dedicação: Ter vontade de servir, ser prestável e estabelecer uma relação de empatia.
– Ter um momento em família, com todos reunidos, além das refeições.
Confessava Ter sido sempre bastante orgulhoso dos haveres herdados e sobretudo daquele filho tão perfeito...
Vinte e três Plantas Pra Ter Dentro De Apartamento Com Muito Charme!
Aprimorar a velocidade do Como Fazer Seu Website De Vendas Online Ter Sucesso - Byte A Byte a alegria do usuário (acesse o meu primeiro desse post), mas isto também poss...

Diziam: “Sobre Dom Helder, nem a morte da mãe”.
Poderoso cavaleiro é Dom Dinheiro.
PA 112 (Rodovia Dom Eliseu), Km 17 na região do Montenegro e envolveu o carro da Tv Mania Bragança (afiliada a Rede Rec...
Segundo lembro, os sacerdotes contra a ditadura eram o Paulo Evaristo Arns, Dom Elder Câmara e Pedro Casaldaliga.
Após a ordenação, ele será empossado bispo no lugar de Dom Frei José Luiz Azcona.
Poderoso cavaleiro é Dom Dinheiro.
... dois anos permaneceu estudando em Paris, regressando ao Brasil, em 1876, quando o bispo de Olinda, Dom Vital Maria Gonçalves de Oliveira, o incumbiu de estrutura o Seminário, no qual foi professor de Fi...
Em 1734, Dom Frei Antônio de Guadalupe ergue, então, uma capela dedicada a Santo Antônio.

Are you certain that the rule you referred to isn't the one suggesting lowercase for weekdays (AO90_WEEKDAYS_CASING)?

The "dom" fix can be done by checking for a proper name after it, but that won't handle "Dom Dinheiro".

The "ter" one is harder: I could check for a past participle after it, but other POS tags can follow it too. What do you suggest, based on the hits above?

Thanks!
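The two heuristics above could be sketched as LanguageTool antipatterns along these lines. This is a minimal sketch, not the committed fix: the tag patterns `NP.*` (proper noun) and `VMP.*` (past participle) assume a FreeLing-style Portuguese tagset and should be checked against the actual tagger output.

```xml
<!-- Hypothetical antipatterns; the POS tags assume a FreeLing-style
     Portuguese tagset (NP... = proper noun, VMP... = past participle). -->
<antipattern>
  <!-- "Dom" used as a title before a token tagged as a proper noun -->
  <token>Dom</token>
  <token postag='NP.*' postag_regexp='yes'/>
</antipattern>
<antipattern>
  <!-- "Ter" used as an auxiliary before a participle, e.g. "Ter sido" -->
  <token>Ter</token>
  <token postag='VMP.*' postag_regexp='yes'/>
</antipattern>
```

As the hits show, neither covers everything: "Dom Dinheiro" has no proper-noun reading, and "Ter um cachorrinho" or "Ter cuidado" have no participle after "Ter".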

marcoagpinto commented 2 years ago

Hello @ricardojosehlima

I have just fixed it!

Note that "Dom" + proper name only gave false positives because the names weren't tagged: https://github.com/languagetool-org/languagetool/commit/da6507df95c5a0368de1779ed1707fde54ff71f3

There was the same rule for both pt-PT and pt-BR, so I renamed the ID and added an antipattern for "Ter".

0before.txt

4.txt

Since the rules seem to be identical, I only tested with pt-PT, but I noticed that it had two extra antipatterns.

ricardojosehlima commented 2 years ago

Hi, great! Some false positives remain in TV program names (Domingo Legal), special dates (Sexta-feira Santa) and unknown names (Dom Helder). But the case that started this thread still seems unsolved:

A tríade é simples Ser > Fazer > Ter. ("The triad is simple: Be > Do > Have.")

This still gets the suggestion that Ter is the weekday and should be written ter. Couldn't the rule apply only to the full form of the weekday (Terça-feira > terça-feira)?
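One way to keep these false positives from coming back is to record them as test sentences in the rule itself. In LanguageTool's rule XML, an `<example>` without a `<marker>` or `correction` attribute is a correct sentence, and the rule tests fail if the rule matches it. A minimal sketch, assuming these examples would be placed inside the weekday-abbreviation rule:

```xml
<!-- Hypothetical regression examples for the "Ter" weekday-abbreviation rule.
     An example with no marker and no correction is a correct sentence,
     so the rule test fails if the rule ever matches it again. -->
<example>Ter filhos é um desafio.</example>
<example>- Ter filhos é um desafio.</example>
<example>A tríade é simples Ser &gt; Fazer &gt; Ter.</example>
```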

marcoagpinto commented 2 years ago

@ricardojosehlima

This rule is made up of two rules: one for weekday abbreviations and the other for full weekday names.

I will look at it soon.

ricardojosehlima commented 2 years ago

Ah, right. So my suggestion is to deactivate the rule only for Ter, and only when it appears after a special character such as -->, > or -, because then it is very likely to be the verb 'ter', not the weekday.
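This suggestion could be expressed as an antipattern in the abbreviation rule. A minimal sketch, assuming the tokenizer emits `>` and each dash character as a separate token (so the last token of `-->` would be `>`); the character list is illustrative, not exhaustive.

```xml
<!-- Hypothetical antipattern: do not treat "Ter" as the weekday abbreviation
     when it directly follows ">", "-", an en dash or an em dash.
     The regex is a Java regex, so \u2013 and \u2014 are Unicode escapes. -->
<antipattern>
  <token regexp='yes'>&gt;|-|\u2013|\u2014</token>
  <token>Ter</token>
</antipattern>
```

With this in place, "Ser > Fazer > Ter" and "- Ter filhos é um desafio." would no longer be flagged. The trade-off is that a genuine weekday abbreviation after a dash would be skipped too, which is acceptable if, as suggested, the verb is far more likely in that position.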

ricardojosehlima commented 2 years ago

@marcoagpinto I edited the comment, ok?

marcoagpinto commented 2 years ago

@ricardojosehlima

I was doing some tests and the new antipattern didn't generate any new hits, so I opened the standalone tool and pasted: - Ter filhos é um desafio.

It doesn't appear as an error.

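marcoagpinto commented 2 years ago

```xml
<antipattern>
    <token postag='SENT_START' postag_regexp='no'/> <!-- the sentence-start marker -->
    <token postag='_PUNCT' postag_regexp='no'/>     <!-- a punctuation token, e.g. a leading dash -->
    <token>Ter</token>                              <!-- the word "Ter" -->
</antipattern>
```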

It generates no hits, and the sentence above isn't affected by it.

ricardojosehlima commented 2 years ago

Strange... Look at the attached screenshot (Screenshot from 2022-05-17 14-13-40). I tested it while writing an e-mail (Firefox, PC).

marcoagpinto commented 2 years ago

That's because I committed it today, and it only ships in the nightly build.

And the browser/Thunderbird add-on is only updated every one or two weeks for Premium accounts.

I'm not sure how long it takes for normal accounts, but it should be a matter of days.

marcoagpinto commented 2 years ago

I am fixing the unknown names right now.

I have also found a way of fixing the "Domingo Legal" and "Sexta-feira Santa": I will add them to multiwords.txt as proper nouns.

Note that changes made at this hour will only reach tomorrow's nightly, not tonight's.
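Untagged names such as "Helder" in "Dom Helder" have no proper-noun tag to match, so a tag-based antipattern misses them. One hypothetical way to cover them is to combine a capitalization regex with LanguageTool's `UNKNOWN` POS tag, which matches tokens the tagger could not analyze. This is a sketch, not necessarily how the fix for unknown names was implemented.

```xml
<!-- Hypothetical antipattern: "Dom" followed by a capitalized word that
     the tagger could not analyze (e.g. "Dom Helder", "Dom Elder").
     case_sensitive='yes' keeps \p{Lu} from also matching lowercase letters. -->
<antipattern>
  <token>Dom</token>
  <token postag='UNKNOWN' regexp='yes' case_sensitive='yes'>\p{Lu}\p{Ll}+</token>
</antipattern>
```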

ricardojosehlima commented 2 years ago

Great! Thanks for explaining about the versions and how long they take to be effective.

marcoagpinto commented 2 years ago

[pt] Fix for "Domingo Legal" and "Sexta-feira Santa" in multiwords.txt: https://github.com/languagetool-org/languagetool/commit/82ed6b1195ce97228164d1324e357fe4c18b86ea

And now the fix for unknown names/words: https://github.com/languagetool-org/languagetool/commit/48606d8cf3d24b34dd733c53b4b3986b87a3bc8c

https://github.com/languagetool-org/languagetool/commit/096f39db6c796518f26eb84a80cc98a9cbe5f677

and the test against 600 000 sentences:

7.txt

ricardojosehlima commented 2 years ago

Great!