languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

[pt] Fix in Premium — 2024-11-06 #10994

Closed: marcoagpinto closed this issue 6 days ago.

marcoagpinto commented 6 days ago

Heya, @jaumeortola, @susanaboatto and @p-goulart,

I know you are very busy, but this is an issue that involves the Premium version of LanguageTool.

This disambiguator rule breaks the Premium version:

    <rule> <!-- Used ChatGPT 4o to verify the 3217 results -->
      <pattern>
        <token postag="V.+" postag_regexp="yes"><exception postag_regexp='yes' postag='CS|RG|NC.+|AQ.+|CC|SPS.+|[DP].+'/></token>
        <token regexp='yes' inflected='yes'>um|muito|pouco|vários|diversos|tal</token> <!-- Special PI.* that don't break rules -->
        <marker>
          <and>
            <token postag="VMIP3S0|VMM02S0|VMSP2S0|VMIP2S0|VMN02S0|VMSF2S0|VMIP1S0|VMP00SM" postag_regexp="yes"/>
            <token postag="NC.+" postag_regexp="yes"/>
          </and>
        </marker>
      </pattern>
      <disambig action="remove" postag="V.*"/>
    </rule>

Examples of its usage:

        <example correction=''>Quero muitos <marker>marines</marker>.</example>
        <example correction=''>Quero vários <marker>marines</marker>.</example>
        <example correction=''>Terminar tal <marker>curso</marker>.</example>
        <example correction=''>Terminar um <marker>doutorado</marker>.</example>
        <example correction=''>Vendeu pouco <marker>mercado</marker>.</example>
        <example correction=''>Vendeu muito <marker>mercado</marker>.</example>
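To make the rule's logic easier to follow, here is a minimal Python sketch that models what the pattern does. It is not LanguageTool's engine (which is written in Java) and it rests on stated assumptions: a `postag` regex must match a whole tag, a token matches if any of its readings matches, an `<exception>` excludes a token if any of its readings matches it, and the readings used for "Terminar tal curso." are hypothetical rather than real tagger output.

```python
import re
from typing import List, Tuple

# Hypothetical data model: a reading is (lemma, POS tag);
# a token is (surface form, list of readings).
Reading = Tuple[str, str]
Token = Tuple[str, List[Reading]]

# Regexes copied from the disambiguator rule above.
FIRST_POSTAG = r"V.+"
FIRST_EXCEPTION = r"CS|RG|NC.+|AQ.+|CC|SPS.+|[DP].+"
SECOND_LEMMA = r"um|muito|pouco|vários|diversos|tal"
MARKED_VERB = r"VMIP3S0|VMM02S0|VMSP2S0|VMIP2S0|VMN02S0|VMSF2S0|VMIP1S0|VMP00SM"
MARKED_NOUN = r"NC.+"
REMOVE = r"V.*"


def has_tag(token: Token, pattern: str) -> bool:
    """Assumed semantics: some reading's tag matches the whole regex."""
    return any(re.fullmatch(pattern, tag) for _, tag in token[1])


def has_lemma(token: Token, pattern: str) -> bool:
    """inflected='yes': some reading's lemma matches the whole regex."""
    return any(re.fullmatch(pattern, lemma) for lemma, _ in token[1])


def apply_rule(tokens: List[Token]) -> List[Token]:
    """Return a copy of tokens with V.* readings removed from each marked token."""
    out = list(tokens)
    for i in range(len(out) - 2):
        first, second, marked = out[i], out[i + 1], out[i + 2]
        # Token 1: a verb, and (assumed) no reading may match the exception.
        if not has_tag(first, FIRST_POSTAG) or has_tag(first, FIRST_EXCEPTION):
            continue
        # Token 2: lemma um/muito/pouco/vários/diversos/tal.
        if not has_lemma(second, SECOND_LEMMA):
            continue
        # <and>: the marked token needs a listed verb tag AND a noun tag.
        if has_tag(marked, MARKED_VERB) and has_tag(marked, MARKED_NOUN):
            kept = [r for r in marked[1] if not re.fullmatch(REMOVE, r[1])]
            out[i + 2] = (marked[0], kept)
    return out


# Hypothetical readings for "Terminar tal curso." (not real tagger output).
sentence: List[Token] = [
    ("Terminar", [("terminar", "VMN0000")]),
    ("tal", [("tal", "DI0CS0"), ("tal", "PI0CS000")]),
    ("curso", [("cursar", "VMIP1S0"), ("curso", "NCMS000")]),
]
print(apply_rule(sentence)[2])  # ('curso', [('curso', 'NCMS000')])
```

In this model "curso" loses its `VMIP1S0` reading ("eu curso") and keeps only the noun reading, which is the effect the rule is meant to have on the example sentences.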

Could one of you make changes in Premium so that I may implement it?

Thanks!

❤️ ❤️ ❤️

marcoagpinto commented 6 days ago

Here are the sentences it fixes:

1.txt

p-goulart commented 6 days ago

EUAI-32.

@marcoagpinto I will have a look. If it's a simple fix on the premium LT repo, I will implement it. Otherwise, the rule will need to be reverted.

marcoagpinto commented 6 days ago

@p-goulart

I reverted it already. 😛

But it would be great if we could have it since it is a great rule.

❤️ ❤️ ❤️

p-goulart commented 6 days ago

Can you point me to the CircleCI build where the premium build fails because of it?

marcoagpinto commented 6 days ago

@p-goulart

Is it this one? https://github.com/languagetool-org/languagetool/pull/10991

The URL you posted above, what shall I do? It asked me to log in or to create an account.

marcoagpinto commented 6 days ago

Or this: https://github.com/languagetool-org/languagetool/pull/10992 (the revert of the rule)

p-goulart commented 6 days ago

> The URL you posted above, what shall I do?

Ticket for internal tracking, don't worry about it.

p-goulart commented 6 days ago

This is the failing pipeline, btw.

marcoagpinto commented 6 days ago

> This is the failing pipeline, btw.

I have no access to it.

p-goulart commented 6 days ago

    [/SENT_START*] Semana[semana/NCFS000*]  [ /null*] que[que/CS,que/PE0CN000,que/PR0CN000,que/PT0CN000]  [ /null*] vem[vir/VMIP3S0,vir/VMM02S0]  [ /null*] estarei[estar/VMIF1S0]  [ /null*] muito[muito/DI0MS0,muito/NCMS000,muito/PI0MS000,muito/RG]  [ /null*] ocupado[ocupado/AQ0MS0,ocupado/NCMS000] .[./SENT_END*,./_PUNCT_PERIOD*,./_PUNCT*]

The original rule in Premium required 'ocupado' to be a participle, and IMO that's not a bad thing.

I can add an alternative allowing it to be an adjective, but I do not think we should be removing the verb tag from obvious participles. Past participles are often used adjectivally, but that does not mean they are not also participles.

I recommend you amend the rule to be less aggressive.

Also note that your rule only removes the verb tag from masculine singular participles (VMP00SM is the only participle tag in its list), so things continue to work as expected for feminine (singular and plural) and masculine plural participles. Please pay closer attention to this kind of thing.
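The gender/number asymmetry comes from how the tag list in the `<and>` marker is matched. A minimal Python sketch, assuming that LanguageTool matches a `postag` regex against the whole tag and that the other participle tags follow the same pattern as `VMP00SM` (the feminine and plural tags below are extrapolated, not copied from the tagger):

```python
import re

# Verb tags listed in the rule's <and> marker (copied from the rule above).
MARKED_VERB = r"VMIP3S0|VMM02S0|VMSP2S0|VMIP2S0|VMN02S0|VMSF2S0|VMIP1S0|VMP00SM"

# Assumed participle tags for the four gender/number forms, extrapolated
# from VMP00SM (S/P = singular/plural, M/F = masculine/feminine).
PARTICIPLE_TAGS = {
    "ocupado": "VMP00SM",
    "ocupada": "VMP00SF",
    "ocupados": "VMP00PM",
    "ocupadas": "VMP00PF",
}


def covered(tag: str) -> bool:
    """True if the rule's verb-tag regex matches the whole tag."""
    return re.fullmatch(MARKED_VERB, tag) is not None


for form, tag in PARTICIPLE_TAGS.items():
    # Only 'ocupado' (VMP00SM) is covered; the other three keep their verb tag.
    print(form, tag, covered(tag))
```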

marcoagpinto commented 6 days ago

> I recommend you amend the rule to be less aggressive.

Ahhhh... let me do it. It is very easy to do.

p-goulart commented 6 days ago

Okay, thanks! Do that and let me know. If you don't have access to the CircleCI pipelines for premium, I suppose I'll keep an eye on it.

(Do note, though, that it's not that trivial. I notice some of the sentences you have there include 'doutorado', which might be affected if the participles are removed from the rule.)
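One way to see this trade-off is to compare the rule's tag list with a hypothetical variant that simply drops `VMP00SM`. The Python sketch below models only the `<and>` marker check; it is not LanguageTool code, and the readings are assumptions (for 'ocupado', the tags from the tagger output above plus the `VMP00SM` reading the rule removed; for 'doutorado', a participle of 'doutorar' plus a noun reading).

```python
import re
from typing import Dict, List

ORIGINAL = r"VMIP3S0|VMM02S0|VMSP2S0|VMIP2S0|VMN02S0|VMSF2S0|VMIP1S0|VMP00SM"
# Hypothetical variant: the same list with the participle tag VMP00SM dropped.
WITHOUT_PARTICIPLE = r"VMIP3S0|VMM02S0|VMSP2S0|VMIP2S0|VMN02S0|VMSF2S0|VMIP1S0"

# Assumed tags of the marked token before disambiguation (illustrative only).
READINGS: Dict[str, List[str]] = {
    "curso": ["VMIP1S0", "NCMS000"],              # 'cursar' 1st sg. vs. noun
    "doutorado": ["VMP00SM", "NCMS000"],          # 'doutorar' participle vs. noun
    "ocupado": ["VMP00SM", "AQ0MS0", "NCMS000"],  # participle, adjective, noun
}


def marker_matches(tags: List[str], verb_regex: str) -> bool:
    """Model of the <and> marker: a listed verb tag and an NC.+ tag."""
    has_verb = any(re.fullmatch(verb_regex, t) for t in tags)
    has_noun = any(re.fullmatch(r"NC.+", t) for t in tags)
    return has_verb and has_noun


for word, tags in READINGS.items():
    print(word, marker_matches(tags, ORIGINAL), marker_matches(tags, WITHOUT_PARTICIPLE))
# curso True True
# doutorado True False
# ocupado True False
```

Under these assumptions, dropping the participle would let 'ocupado' keep its verb reading, but "Terminar um doutorado" would no longer be disambiguated either, while forms like 'curso' are unaffected.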

marcoagpinto commented 6 days ago

@p-goulart

https://github.com/languagetool-org/languagetool/pull/10996

It is fixed.

The rule removes around 3000 false positives.

It doesn't matter if it doesn't fix everything; we will gradually improve it and add more rules to cover the rest.

The important thing is that it didn't break anything and helps us create more accurate rules.