languagetool-org / portuguese-pos-dict

Portuguese POS tagger
GNU Lesser General Public License v2.1

Words invalid for PT? #7

Closed marcoagpinto closed 8 months ago

marcoagpinto commented 2 years ago

@jaumeortola

I don't know how to tag Ricardo Joseh Lima in this post.

The words "dize" and "faze" appear in the tagger dictionary, but I believe they don't exist.

jaumeortola commented 2 years ago

@ricardojosehlima

These words are tagged as imperatives, and can be seen here: http://www.portaldalinguaportuguesa.org/simplesearch.php?action=lemma&lemma=13237

Perhaps they are used only with pronouns (dize-me)? See: https://context.reverso.net/traduccion/portugues-espanol/dize-me

@marcoagpinto Are these forms undesired results in some rule? Which rule and which sentences?

marcoagpinto commented 2 years ago

@jaumeortola

I have been trying to improve the informal rule, but it gives tons of false positives:

<!-- MARCOAGPINTO 2022-06-03 (24-MAY-2022+) *START* -->
<!--

-->
      <antipattern>
        <token postag='V.+' postag_regexp='yes'>
          <exception postag='CS|RN|VMSP2S0' postag_regexp='yes'/>
        </token>
        <token postag='SPS.+|RG' postag_regexp='yes'>
          <exception postag='CS|CC' postag_regexp='yes'/>
        </token>
        <token postag='VM[IS]P2S0' postag_regexp='yes'/>
      </antipattern>
<!-- MARCOAGPINTO 2022-06-03 (24-MAY-2022+) *END* -->
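
For readers unfamiliar with antipatterns, the matching logic of the snippet above can be approximated in Python. This is a simplified sketch, not LanguageTool's implementation. It assumes each token is a list of POS-tag readings, that a `postag` regex must match a whole tag, and that an exception vetoes the token if any of its readings matches. The names `ANTIPATTERN`, `token_matches`, and `antipattern_hits` are illustrative only.

```python
import re

# Simplified model of the three-token antipattern: (postag regex, exception regex).
# Assumption: regexes must match the whole tag, as with postag_regexp='yes'.
ANTIPATTERN = [
    (r"V.+", r"CS|RN|VMSP2S0"),
    (r"SPS.+|RG", r"CS|CC"),
    (r"VM[IS]P2S0", None),
]


def token_matches(readings, postag, exception):
    """True if some reading fully matches `postag` and no reading matches `exception`."""
    if exception and any(re.fullmatch(exception, r) for r in readings):
        return False
    return any(re.fullmatch(postag, r) for r in readings)


def antipattern_hits(sentence):
    """Return the start indexes where the antipattern matches.

    `sentence` is a list of tokens; each token is a list of POS-tag readings.
    """
    n = len(ANTIPATTERN)
    return [
        i
        for i in range(len(sentence) - n + 1)
        if all(token_matches(sentence[i + k], *ANTIPATTERN[k]) for k in range(n))
    ]
```

With made-up readings, `antipattern_hits([["VMIP3S0"], ["SPS00"], ["VMIP2S0"]])` returns `[0]`. Adding a `CS` reading to the first token triggers its exception, so that sentence no longer matches.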

After 3 or 4 hours processing 600 000 sentences, I am going to relax a bit.

The rule has 70 000 hits, and every change I make to the antipattern removes hundreds or thousands of them, so I have to check them one by one in the diff.
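
A check like this can be partly automated by comparing the hits from two runs. The sketch below assumes each hit is recorded as the flagged sentence string. The function name `diff_hits` and that input format are hypothetical, not part of any LanguageTool tooling.

```python
def diff_hits(before, after):
    """Compare rule hits recorded before and after an antipattern change.

    `before` and `after` are iterables of flagged sentences (assumed format).
    Returns (removed, added) as sorted lists.
    """
    before_set, after_set = set(before), set(after)
    removed = sorted(before_set - after_set)  # hits the antipattern now suppresses
    added = sorted(after_set - before_set)    # hits that appeared unexpectedly
    return removed, added
```

The `removed` list is what needs manual review, since each entry is either a false positive correctly silenced or a real error now missed.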

Tomorrow I will continue the antipattern.

marcoagpinto commented 2 years ago

"dize-me" doesn't exist in pt-PT.

marcoagpinto commented 2 years ago

Ricardo: does "dize-me" make any sense?

At least, I have never heard it here.

ricardojosehlima commented 2 years ago

Hi @jaumeortola and @marcoagpinto, both "dize" and "faze" are legitimate in pt; they are the second-person singular imperative forms. You'll find them in biblical texts, for example. In the everyday register of pt-BR, neither is used.

marcoagpinto commented 2 years ago

ahhhhh... thanks, @ricardojosehlima

This only proves that I don't read biblical texts… 🙂 🙂 🙂

p-goulart commented 8 months ago

I'm closing this. Those are valid, albeit rare, forms.