languagetool-org / portuguese-pos-dict

Portuguese POS tagger
GNU Lesser General Public License v2.1

some verbal forms: passeais/passeai, passeamos/passeamo #2

Closed jaumeortola closed 8 months ago

jaumeortola commented 2 years ago

@ricardojosehlima @marcoagpinto See these verb forms:

passeai passear VMIP2P0
passeais passear VMIP2P0
passeamo passear VMIP1P0
passeamos passear VMIP1P0

The forms without 's' are not listed in dictionaries. I think they are used with enclitic pronouns. Is this correct? Are there other persons where the 's' is removed, apart from the first and second person plural?

marcoagpinto commented 2 years ago

I believe: passeamo passear VMIP1P0 isn't valid.

But there must be dozens or hundreds of entries like that.

Adding manually to removed.txt would take a very long time.

jaumeortola commented 2 years ago

See https://portuguesaletra.com/duvidas/amamo-nos-ou-amamos-nos/

What about 'passeai'?

Don't worry about the number of verbs. I can adjust all of them quickly.

marcoagpinto commented 2 years ago

Yes, but how do we know what comes after the "passeamo" hyphen?

If we suggest a replacement using just the POS, generally it is an incorrect suggestion.

marcoagpinto commented 2 years ago

"passeai" sounds like a noble word or from the old people. "passeai o cão"

jaumeortola commented 2 years ago

Okay. The full answer I needed is here: https://www.migalhas.com.br/coluna/gramatigalhas/23518/verbo-seguido-de-pronome

Yes, but how do we know what comes after the "passeamo" hyphen?

Perhaps we could make the tagger more "intelligent": do not tag these words unless they are followed by a hyphen and an appropriate pronoun.
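A rough sketch of that idea in Python, just to make the proposal concrete. This is not the LanguageTool tagger code: the table `REDUCED_FORMS`, the function `tag_reduced_form`, and the set of pronouns accepted after "passeamo" are all assumptions for illustration.

```python
# Hypothetical table: reduced verb forms, with the enclitic pronouns
# assumed (not verified against a grammar) to license them.
REDUCED_FORMS = {
    "passeamo": {
        "lemma": "passear",
        "tag": "VMIP1P0",
        "pronouns": {"nos", "lo", "la", "los", "las"},
    },
}


def tag_reduced_form(token):
    """Return (lemma, tag) only when a reduced form is followed by a
    hyphen and one of its licensed pronouns; otherwise return None."""
    stem, hyphen, pronoun = token.lower().partition("-")
    entry = REDUCED_FORMS.get(stem)
    if entry is None or not hyphen or pronoun not in entry["pronouns"]:
        return None
    return entry["lemma"], entry["tag"]
```

With this, "passeamo-lo" would get the reading passear VMIP1P0, while a bare "passeamo" would stay untagged and could be flagged by the speller.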

marcoagpinto commented 2 years ago

Perhaps we could make the tagger more "intelligent": do not tag these words unless they are followed by a hyphen and an appropriate pronoun.

Yes, but it will take a ton of hard work.

I have also been slowly working on this (do you remember the ticket in February?):

SUGGEST THE FIXES BELOW AFTER THE RELEASE OF 5.7.
FIX THE DAMN NAS/NOS, ETC.:
keep old POSes for legacy/compatibility.
2022-02-25+:

destes + destas

esta

estas

este

estes

das

dos

mas

pela

pelas

pelo

pelos

à > a+a > SPS00+* > SPS00:DA0FS0
às > a+as > SPS00+* > SPS00:DA0FP0

ao > a+o > SPS00+* > SPS00:DA0MS0
aos > a+os > SPS00+* > SPS00:DA0MP0

dela > de+ela > SPS00+PP3FS00 > SPS00:PP3FS00
delas > de+elas > SPS00+PP3FP00 > SPS00:PP3FP00

dele > de+ele > SPS00+PP3MS00 > SPS00:PP3MS00
deles > de+eles > SPS00+PP3MP00 > SPS00:PP3MP00

dessa > de+essas > SPS00+* > SPS00:DD0FS0:PD0FS000
* shouldn't it be: de+essa?
dessas > de+estas > SPS00+* > SPS00:DD0FP0:PD0FP000
* shouldn't it be: de+essas?

disso > de+isso > SPS00+PD0NN00 > SPS00:PD0NN00

na > em+a > SPS00+DA > SPS00:DA0FS0
nas > em+as > SPS00+DA > SPS00:DA0FP0

no > em+o > SPS00+DA > SPS00:DA0MS0
nos > em+os > SPS00+DA > SPS00:DA0MP0

nela > em+ela > SPS00+PP3FS00 > SPS00:PP3FS00
nelas > em+elas > SPS00+PP3FP00 > SPS00:PP3FP00

nele > em+ele > SPS00+PP3MS00 > SPS00:PP3MS00
neles > em+eles > SPS00+PP3MP00 > SPS00:PP3MP00

It is still WIP; it will take months to have it ready, since I am writing down the entries as I use the words in the rules.
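For illustration only, the notes above could be kept as a lookup table. The sketch below is in Python; the names `CONTRACTIONS` and `split_contraction` are hypothetical, and only entries whose decomposition is not questioned in the notes are included.

```python
# Contraction -> (preposition, second element, combined POS tag),
# copied from the notes above. Entries marked with "*" there are left out.
CONTRACTIONS = {
    "ao": ("a", "o", "SPS00:DA0MS0"),
    "aos": ("a", "os", "SPS00:DA0MP0"),
    "dela": ("de", "ela", "SPS00:PP3FS00"),
    "delas": ("de", "elas", "SPS00:PP3FP00"),
    "dele": ("de", "ele", "SPS00:PP3MS00"),
    "deles": ("de", "eles", "SPS00:PP3MP00"),
    "disso": ("de", "isso", "SPS00:PD0NN00"),
    "na": ("em", "a", "SPS00:DA0FS0"),
    "nas": ("em", "as", "SPS00:DA0FP0"),
    "no": ("em", "o", "SPS00:DA0MS0"),
    "nos": ("em", "os", "SPS00:DA0MP0"),
    "nela": ("em", "ela", "SPS00:PP3FS00"),
    "nelas": ("em", "elas", "SPS00:PP3FP00"),
    "nele": ("em", "ele", "SPS00:PP3MS00"),
    "neles": ("em", "eles", "SPS00:PP3MP00"),
}


def split_contraction(word):
    """Return (preposition, second element, tag) for a known contraction,
    or None when the word is not in the table."""
    return CONTRACTIONS.get(word.lower())
```

The lookup only gives the contraction reading. A word like "nos" is also a pronoun, so the tagger would still need to keep the other readings next to this one.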

jaumeortola commented 2 years ago

I still don't have POS tags for fê-lo and similar forms. It is the same issue as amamo-nos and chamai-los.

ricardojosehlima commented 2 years ago

Hi @jaumeortola and @marcoagpinto, I could join the discussion only now. The situation for passeamo is as Jaume found: it can only appear before a hyphen, as in passeamo-lo. Currently, if I write "Nós passeamo", LT suggests, in this order: "passe amo, passemo, passamo, passeamos, passejamo". The first suggestion probably appears because "passe" and "amo" are valid words, but it doesn't make sense. All the others without an 's' at the end look wrong to me. So the message could be "Você quis escrever 'passeamos'? 'Passeamo' só é possível antes de um hífen (passeamo-lo)."

As for passeai, it is as Marco said: the 2nd person plural imperative form, very rarely used (in pt-br), but it must remain because of religious and poetic texts.

jaumeortola commented 2 years ago

As for passeai, it is as Marco said: the 2nd person plural imperative form, very rarely used (in pt-br), but it must remain because of religious and poetic texts.

I see the imperative form "passeai" in online dictionaries. That is fine. But in our old tagger dictionary we have "passeai, chamai..." as "presente indicativo 2 pl". I didn't know if that was similar to "amamo-nos". "chamais-los">"chamai-los"?


10) Cândido Jucá Filho, nesse aspecto, lembra que Filinto Elísio, "censurando os críticos ignorantes, incide nesta erronia grosseira: 'E chamais-los puristas e censores?' (isto é, 'chamai-los')". (from https://www.migalhas.com.br/coluna/gramatigalhas/23518/verbo-seguido-de-pronome)

So, the rule is "chamais-los">"chamai-los". Is this correct? Is this "chamai" present indicative? Is this what you call "religious or poetic text"?

ricardojosehlima commented 2 years ago

In Portuguese it is "chamai-os"; "los" is used only if the verb is in the infinitive. This is one of the religious or poetic cases. It can be present, you are correct. I don't recognize "chamai-los" as correct; maybe it is an old form.

jaumeortola commented 2 years ago

Thank you for the answers. I think the solution for these modified verbal forms is to add a character in the POS tag.

These are some forms in our current tagger dictionary:

fi-la fi:la V:PP
fi-las fi:las V:PP
fi-lo fi:lo V:PP
fi-los fi:los V:PP
fá-la fá:la V:PP
fá-las fá:las V:PP
fá-lo fá:lo V:PP
fá-los fá:los V:PP
fê-la fê:la V:PP
fê-las fê:las V:PP
fê-lo fê:lo V:PP
fê-los fê:los V:PP
pu-la pus:la V:PP
pu-las pus:las V:PP
pu-lo pus:lo V:PP
pu-los pus:los V:PP
qui-la quis:la V:PP
qui-las quis:las V:PP
qui-lo quis:lo V:PP
qui-los quis:los V:PP

They come from different verbs (tense, person, number):

fi -> fiz fazer VMIS1S0
fá -> faz fazer VMIP3S0, faz fazer VMM02S0
fê -> fez fazer VMIS3S0
pu -> pus pôr VMIS1S0
qui -> quis querer VMIS1S0, quis querer VMIS3S0

Is that correct? That means that any verbal form ending with r, s, z has to be adapted when it is followed by pronouns. That only makes sense when the pronoun is appropriate.
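To make the question concrete, here is a small Python sketch of how such a token could be expanded back to the full verbal form plus pronoun. It only covers the five stems listed above, written as they appear in the hyphenated forms (fá-lo, fê-lo). The names `REDUCED_STEMS`, `L_PRONOUNS` and `analyse` are hypothetical, not part of the current scripts.

```python
import unicodedata

# Reduced stem -> possible (full form, lemma, POS tag) readings,
# taken from the list above.
REDUCED_STEMS = {
    "fi": [("fiz", "fazer", "VMIS1S0")],
    "fá": [("faz", "fazer", "VMIP3S0"), ("faz", "fazer", "VMM02S0")],
    "fê": [("fez", "fazer", "VMIS3S0")],
    "pu": [("pus", "pôr", "VMIS1S0")],
    "qui": [("quis", "querer", "VMIS1S0"), ("quis", "querer", "VMIS3S0")],
}

# The l- pronouns are the only ones that trigger the reduced stem here.
L_PRONOUNS = {"lo": "o", "la": "a", "los": "os", "las": "as"}


def analyse(token):
    """Return a list of (full form, lemma, tag, base pronoun) readings,
    or an empty list when the stem/pronoun combination is not licensed."""
    token = unicodedata.normalize("NFC", token.lower())
    stem, hyphen, pronoun = token.partition("-")
    if not hyphen or stem not in REDUCED_STEMS or pronoun not in L_PRONOUNS:
        return []
    return [
        (full, lemma, tag, L_PRONOUNS[pronoun])
        for full, lemma, tag in REDUCED_STEMS[stem]
    ]
```

A token such as "fi-te" gets no reading, which matches the point that the reduced form only makes sense when the pronoun is appropriate.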

Where can I find complete documentation about this? Verbs with enclitics and mesoclitics, correct combinations of verbs and pronouns, and changes in the verbal form. @ricardojosehlima

ricardojosehlima commented 2 years ago

Hi @jaumeortola you can find a large list of irregular verbs here: https://www.todamateria.com.br/verbos-irregulares-em-portugues/

I scanned it and found some verbs with the pattern you are looking for (there are others but they looked extremely rare):

dizer di-lo 
fazer fá-lo
trazer trá-lo
compor compu-lo
contrapor contrapu-lo
decompor decompu-lo
depor depu-lo
dispor dispu-lo
expor expu-lo
impor impu-lo
indispor indispu-lo
justapor justapu-lo
opor opu-lo
predispor predispu-lo
pressupor pressupu-lo
propor propu-lo
recompor recompu-lo
repor repu-lo
sobrepor sobrepu-lo
supor supu-lo

The first three verbs are in the present tense, 3rd person singular. The others are in the past tense, 1st person singular.

jaumeortola commented 2 years ago

Is there a reduced form for "ir-lo" -> "i-lo"?

ricardojosehlima commented 2 years ago

Hi, no: the verb ir in Portuguese does not take a direct object, so neither form is acceptable.

jaumeortola commented 2 years ago

Thanks. I could imagine that. But all forms are generated automatically by the scripts.

jaumeortola commented 2 years ago

@ricardojosehlima Now, where can I find documentation about all possible pronoun combinations (enclitics and mesoclitics)? I could buy some advanced grammar book if necessary. In these files, you can see the combinations accepted by our current PT and BR spellcheckers for the verb 'amar'. There are many more forms in pt-BR. Are they correct? Do they make sense? Do the differences between PT and BR make sense? amar-br.txt amar-pt.txt

ricardojosehlima commented 2 years ago

@jaumeortola you could look for Bechara's or Cunha's grammars; they are very complete. As for the files you sent:
1-) The rule for mesoclisis in Portuguese (both pt and br) is insertion of the pronoun between the root and the inflection of the verb, and only if the verb is in the future or the "futuro do pretérito".
2-) In the br file there is "amaríamos-te", which is in disagreement with (1).
3-) The br file has many prefixes added to 'amar'; the pt file has others. The only one in current use is 'des', and maybe 're' could be used. All the rest seem unattested to me.
4-) The pt file contains all the correct combinations according to (1); the br file contains some combinations that don't make sense.
5-) The br file has the combination 'se-lhe', which is extremely rare in br, and no one uses it anymore. In any case, 'se-lhe' doesn't apply to just any verb; it needs a suitable context of use, and the verb 'amar' does not provide one.
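Points (1) and (2) can be sketched in Python as follows. This is only an illustration under simplifying assumptions: it handles verbs whose future and "futuro do pretérito" are built on the plain infinitive (not fazer, dizer, trazer), and it ignores the l- pronouns, which change the stem. The function names are hypothetical.

```python
# Standard endings of the future and the "futuro do pretérito".
FUTURE_ENDINGS = ("ei", "ás", "á", "emos", "eis", "ão")
CONDITIONAL_ENDINGS = ("ia", "ias", "íamos", "íeis", "iam")
MESOCLISIS_ENDINGS = set(FUTURE_ENDINGS + CONDITIONAL_ENDINGS)

# Pronouns that attach without changing the verb stem (assumption:
# o/a/os/as and their lo/la forms are deliberately left out).
SIMPLE_PRONOUNS = {"me", "te", "se", "lhe", "nos", "vos", "lhes"}


def mesoclitic(infinitive, ending, pronoun):
    """Build the mesoclitic form: infinitive + pronoun + ending."""
    if ending not in MESOCLISIS_ENDINGS:
        raise ValueError("mesoclisis needs a future or conditional ending")
    if pronoun not in SIMPLE_PRONOUNS:
        raise ValueError("pronoun not covered by this sketch")
    return f"{infinitive}-{pronoun}-{ending}"


def needs_mesoclisis(token, infinitive):
    """True when a future/conditional form carries the pronoun at the end
    (as in 'amaríamos-te') instead of in mesoclitic position."""
    verb, hyphen, pronoun = token.partition("-")
    if not hyphen or "-" in pronoun:
        return False
    return (
        verb.startswith(infinitive)
        and verb[len(infinitive):] in MESOCLISIS_ENDINGS
    )
```

A check like `needs_mesoclisis` could be run over the generated br list to find entries of the "amaríamos-te" kind, and `mesoclitic("amar", "íamos", "te")` gives the form expected under rule (1).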

p-goulart commented 8 months ago

This issue will be fixed once we finish the transition to both the .aff and POS tagger scripts outputting all enclitic verb forms as single tokens. Closing.