languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

[pt] Fix in Premium — 2024-11-06 #10994

Closed marcoagpinto closed 2 weeks ago

marcoagpinto commented 2 weeks ago

Heya, @jaumeortola , @susanaboatto and @p-goulart ,

I know you are very busy, but this is an issue that involves the Premium version of LanguageTool.

This disambiguator rule breaks the Premium version:

    <rule> <!-- Used ChatGPT 4o to verify the 3217 results -->
      <pattern>
        <token postag="V.+" postag_regexp="yes"><exception postag_regexp='yes' postag='CS|RG|NC.+|AQ.+|CC|SPS.+|[DP].+'/></token>
        <token regexp='yes' inflected='yes'>um|muito|pouco|vários|diversos|tal</token> <!-- Special PI.* that don't break rules -->
        <marker>
          <and>
            <token postag="VMIP3S0|VMM02S0|VMSP2S0|VMIP2S0|VMN02S0|VMSF2S0|VMIP1S0|VMP00SM" postag_regexp="yes"/>
            <token postag="NC.+" postag_regexp="yes"/>
          </and>
        </marker>
      </pattern>
      <disambig action="remove" postag="V.*"/>
    </rule>

Examples of its usage:

        <example correction=''>Quero muitos <marker>marines</marker>.</example>
        <example correction=''>Quero vários <marker>marines</marker>.</example>
        <example correction=''>Terminar tal <marker>curso</marker>.</example>
        <example correction=''>Terminar um <marker>doutorado</marker>.</example>
        <example correction=''>Vendeu pouco <marker>mercado</marker>.</example>
        <example correction=''>Vendeu muito <marker>mercado</marker>.</example>
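
To make the rule's effect concrete, here is a minimal Python sketch of what the pattern appears to do on one of the examples. It is not LanguageTool's implementation: token matching is simplified to "any reading matches", whitespace tokens are left out, and the readings for "Vendeu" and "mercado" are assumed for illustration (only the readings of "muito" appear in the analysis quoted later in this thread).

```python
import re

# Tag patterns copied from the rule above. Matching is simplified to
# "any reading of the token matches", which is an assumption of this sketch.
VERB = re.compile(r"V.+")
FIRST_TOKEN_EXCEPTION = re.compile(r"CS|RG|NC.+|AQ.+|CC|SPS.+|[DP].+")
QUANTIFIER_LEMMAS = {"um", "muito", "pouco", "vários", "diversos", "tal"}
MARKER_VERB = re.compile(
    r"VMIP3S0|VMM02S0|VMSP2S0|VMIP2S0|VMN02S0|VMSF2S0|VMIP1S0|VMP00SM"
)
NOUN = re.compile(r"NC.+")
REMOVE = re.compile(r"V.*")  # <disambig action="remove" postag="V.*"/>


def has_tag(readings, pattern):
    """True if any (lemma, tag) reading has a tag that fully matches pattern."""
    return any(pattern.fullmatch(tag) for _, tag in readings)


def apply_rule(tokens):
    """Return a copy of tokens with verb readings removed where the rule fires.

    tokens is a list of (surface, readings); readings is a list of (lemma, tag).
    """
    result = [(surface, list(readings)) for surface, readings in tokens]
    for i in range(len(tokens) - 2):
        first, second, third = (tokens[i + k][1] for k in range(3))
        first_ok = has_tag(first, VERB) and not has_tag(first, FIRST_TOKEN_EXCEPTION)
        second_ok = any(lemma in QUANTIFIER_LEMMAS for lemma, _ in second)
        # <and>: the marked token needs a listed verb reading AND a noun reading.
        third_ok = has_tag(third, MARKER_VERB) and has_tag(third, NOUN)
        if first_ok and second_ok and third_ok:
            kept = [(lemma, tag) for lemma, tag in third if not REMOVE.fullmatch(tag)]
            result[i + 2] = (tokens[i + 2][0], kept)
    return result


# Hypothetical readings for "Vendeu muito mercado."
sentence = [
    ("Vendeu", [("vender", "VMIS3S0")]),
    ("muito", [("muito", "DI0MS0"), ("muito", "NCMS000"),
               ("muito", "PI0MS000"), ("muito", "RG")]),
    ("mercado", [("mercar", "VMIP1S0"), ("mercado", "NCMS000")]),
]

disambiguated = apply_rule(sentence)
print(disambiguated[2])  # only the noun reading of "mercado" is left
```

Under these assumptions the marked token keeps its noun reading and loses every `V.*` reading, which is the behaviour the examples above are meant to demonstrate.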

Could one of you make changes in Premium so that I may implement it?

Thanks!

❤️ ❤️ ❤️

marcoagpinto commented 2 weeks ago

Here are the sentences it fixes:

1.txt

p-goulart commented 2 weeks ago

EUAI-32.

@marcoagpinto I will have a look. If it's a simple fix on the premium LT repo, I will implement it. Otherwise, the rule will need to be reverted.

marcoagpinto commented 2 weeks ago

@p-goulart

I reverted it already. 😛

But it would be great if we could have it since it is a great rule.

❤️ ❤️ ❤️

p-goulart commented 2 weeks ago

Can you point me to the CircleCI build where the premium build fails because of it?

marcoagpinto commented 2 weeks ago

@p-goulart

Is it this one? https://github.com/languagetool-org/languagetool/pull/10991

The URL you posted above, what shall I do? It asked me to log in or to create an account.

marcoagpinto commented 2 weeks ago

Or this: https://github.com/languagetool-org/languagetool/pull/10992 (the revert of the rule)

p-goulart commented 2 weeks ago

> The URL you posted above, what shall I do?

Ticket for internal tracking, don't worry about it.

p-goulart commented 2 weeks ago

This is the failing pipeline, btw.

marcoagpinto commented 2 weeks ago

> This is the failing pipeline, btw.

I have no access to it.

p-goulart commented 2 weeks ago

    [/SENT_START*] Semana[semana/NCFS000*]  [ /null*] que[que/CS,que/PE0CN000,que/PR0CN000,que/PT0CN000]  [ /null*] vem[vir/VMIP3S0,vir/VMM02S0]  [ /null*] estarei[estar/VMIF1S0]  [ /null*] muito[muito/DI0MS0,muito/NCMS000,muito/PI0MS000,muito/RG]  [ /null*] ocupado[ocupado/AQ0MS0,ocupado/NCMS000] .[./SENT_END*,./_PUNCT_PERIOD*,./_PUNCT*]

The original rule required 'ocupado' to be a participle. IMO I don't think that's a bad thing.

I can add an alternative allowing it to be an adjective, but I do not think we should be removing the verb tag from obvious participles. Past participles are often used adjectivally, but that does not mean they are not also participles.
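
The point about 'ocupado' can be checked directly against the analysis output. The following is a rough sketch, assuming the `surface[lemma/TAG,lemma/TAG]` layout seen in the line above; the parser and its names are hypothetical helpers, not part of LanguageTool, and the excerpt is copied from the end of that line.

```python
import re

# One token: optional surface text followed by a bracketed list of readings.
TOKEN = re.compile(r"(\S*)\[([^\]]*)\]")


def parse_analysis(line):
    """Parse analysis output into a list of (surface, [(lemma, tag), ...])."""
    tokens = []
    for surface, body in TOKEN.findall(line):
        readings = []
        for item in body.split(","):
            lemma, _, tag = item.rpartition("/")
            readings.append((lemma, tag.rstrip("*")))  # drop the trailing '*' marker
        tokens.append((surface, readings))
    return tokens


# Excerpt of the analysis quoted above.
excerpt = (
    "muito[muito/DI0MS0,muito/NCMS000,muito/PI0MS000,muito/RG]  [ /null*] "
    "ocupado[ocupado/AQ0MS0,ocupado/NCMS000]"
)

tags_by_surface = {
    surface: [tag for _, tag in readings]
    for surface, readings in parse_analysis(excerpt)
}
has_verb_reading = any(tag.startswith("V") for tag in tags_by_surface["ocupado"])
print(tags_by_surface["ocupado"], has_verb_reading)
```

In this excerpt 'ocupado' is left with an adjective and a noun reading only, so any rule that requires a participle reading on that token can no longer match.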

I recommend you amend the rule to be less aggressive.

Also note that your rule, for some reason, only removes the verb tag from the singular masculine participles, and it continues to work as expected for feminine (plural/singular) and masculine plural participles. Please pay closer attention to this kind of thing.
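
The asymmetry can be seen from the marker's `postag` list alone. A small sketch, assuming the other participle tags follow the same scheme as `VMP00SM` (the three tags `VMP00SF`, `VMP00PM` and `VMP00PF` are assumptions here and do not appear in this thread):

```python
import re

# The marker token's verb tag list, copied from the rule.
MARKER_VERB = re.compile(
    r"VMIP3S0|VMM02S0|VMSP2S0|VMIP2S0|VMN02S0|VMSF2S0|VMIP1S0|VMP00SM"
)

# Participle tags; only VMP00SM is taken from the rule, the rest are assumed.
participle_tags = {
    "VMP00SM": "masculine singular",
    "VMP00SF": "feminine singular",
    "VMP00PM": "masculine plural",
    "VMP00PF": "feminine plural",
}

covered = {tag: bool(MARKER_VERB.fullmatch(tag)) for tag in participle_tags}
for tag, label in participle_tags.items():
    print(f"{tag} ({label}): {'matched' if covered[tag] else 'not matched'}")
```

Only the masculine singular tag is in the list, so under these assumptions the rule can only strip the verb reading from forms like 'ocupado' and never from 'ocupada', 'ocupados' or 'ocupadas'.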

marcoagpinto commented 2 weeks ago

>     [/SENT_START*] Semana[semana/NCFS000*]  [ /null*] que[que/CS,que/PE0CN000,que/PR0CN000,que/PT0CN000]  [ /null*] vem[vir/VMIP3S0,vir/VMM02S0]  [ /null*] estarei[estar/VMIF1S0]  [ /null*] muito[muito/DI0MS0,muito/NCMS000,muito/PI0MS000,muito/RG]  [ /null*] ocupado[ocupado/AQ0MS0,ocupado/NCMS000] .[./SENT_END*,./_PUNCT_PERIOD*,./_PUNCT*]
>
> The original rule required 'ocupado' to be a participle. IMO I don't think that's a bad thing.
>
> I can add an alternative allowing it to be an adjective, but I do not think we should be removing the verb tag from obvious participles. Past participles are often used adjectivally, but that does not mean they are not also participles.
>
> I recommend you amend the rule to be less aggressive.

Ahhhh... let me do it. It is very easy to do.

p-goulart commented 2 weeks ago

Okay, thanks! Do that and let me know. If you don't have access to the CircleCI pipelines for premium, I suppose I'll keep an eye on it.

(Do note, though, that it's not that trivial. I notice some of the sentences you have there include 'doutorado', which might be affected by removal of the participles.)

marcoagpinto commented 2 weeks ago

@p-goulart

https://github.com/languagetool-org/languagetool/pull/10996

It is fixed.

The rule removes around 3000 false positives.

It doesn't matter if it doesn't fix everything; we will gradually enhance it and add more rules for the remaining cases.

The important thing is that it didn't break anything and helps us create more accurate rules.