Closed jaumeortola closed 8 months ago
I believe: passeamo passear VMIP1P0 isn't valid.
But there must be dozens or hundreds of entries like that.
Adding manually to removed.txt would take a very long time.
See https://portuguesaletra.com/duvidas/amamo-nos-ou-amamos-nos/
What about 'passeai'?
Don't worry about the number of verbs. I can adjust all of them quickly.
Yes, but how do we know what comes after the "passeamo" hyphen?
If we suggest a replacement using just the POS, generally it is an incorrect suggestion.
"passeai" sounds like a noble word or from the old people. "passeai o cão"
Okay. The full answer I needed is here: https://www.migalhas.com.br/coluna/gramatigalhas/23518/verbo-seguido-de-pronome
Perhaps we could make the tagger more "intelligent": do not tag these words unless they are followed by a hyphen and an appropriate pronoun.
Yes, but it will take a ton of hard work.
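The "intelligent" tagging idea can be sketched roughly as follows. This is only an illustration, not LanguageTool code: the function name `tag_token`, the table `TRUNCATED_FORMS`, and the pronoun set `ALLOWED_PRONOUNS` are all hypothetical, and the pronoun set is an assumption based on the forms mentioned in this thread (amamo-nos, passeamo-lo).

```python
# Hypothetical sketch; names and the pronoun set are assumptions.
ALLOWED_PRONOUNS = {"nos", "lo", "la", "los", "las"}

# Truncated form -> (full form, lemma, POS tag). The entry is the one
# discussed in this thread.
TRUNCATED_FORMS = {"passeamo": ("passeamos", "passear", "VMIP1P0")}


def tag_token(token):
    """Return (lemma, tag) for a truncated verb form only when it is
    hyphenated with an allowed pronoun; otherwise return None."""
    stem, hyphen, pronoun = token.lower().partition("-")
    if stem not in TRUNCATED_FORMS:
        return None
    if not hyphen or pronoun not in ALLOWED_PRONOUNS:
        # Bare "passeamo", or an unexpected pronoun: leave it untagged
        # so the speller can flag it.
        return None
    _full_form, lemma, tag = TRUNCATED_FORMS[stem]
    return lemma, tag
```

With this gate, "passeamo-lo" would be tagged while a bare "passeamo" would stay untagged and could be reported as a spelling error.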
I have also been slowly working on this (do you remember the ticket in February?):
SUGGEST THE FIXES BELOW AFTER THE RELEASE OF 5.7.
FIX THE DAMN NAS/NOS, ETC.:
keep old POSes for legacy/compatibility.
2022-02-25+:
destes + destas
esta
estas
este
estes
das
dos
mas
pela
pelas
pelo
pelos
à > a+a > SPS00+* > SPS00:DA0FS0
às > a+as > SPS00+* > SPS00:DA0FP0
ao > a+o > SPS00+* > SPS00:DA0MS0
aos > a+os > SPS00+* > SPS00:DA0MP0
dela > de+ela > SPS00+PP3FS00 > SPS00:PP3FS00
delas > de+elas > SPS00+PP3FP00 > SPS00:PP3FP00
dele > de+ele > SPS00+PP3MS00 > SPS00:PP3MS00
deles > de+eles > SPS00+PP3MP00 > SPS00:PP3MP00
dessa > de+essas > SPS00+* > SPS00:DD0FS0:PD0FS000
* shouldn't it be: de+essa?
dessas > de+estas > SPS00+* > SPS00:DD0FP0:PD0FP000
* shouldn't it be: de+essas?
disso > de+isso > SPS00+PD0NN00 > SPS00:PD0NN00
na > em+a > SPS00+DA > SPS00:DA0FS0
nas > em+as > SPS00+DA > SPS00:DA0FP0
no > em+o > SPS00+DA > SPS00:DA0MS0
nos > em+os > SPS00+DA > SPS00:DA0MP0
nela > em+ela > SPS00+PP3FS00 > SPS00:PP3FS00
nelas > em+elas > SPS00+PP3FP00 > SPS00:PP3FP00
nele > em+ele > SPS00+PP3MS00 > SPS00:PP3MS00
neles > em+eles > SPS00+PP3MP00 > SPS00:PP3MP00
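To illustrate how the combined tags in the list above could be consumed, here is a minimal sketch that stores a few of the entries and splits the colon-joined tag back into its parts. The dictionary `CONTRACTIONS` and the function `expand_contraction` are hypothetical names; only the words, components, and tags are copied from the list.

```python
# A few entries copied from the list above:
# contraction -> (preposition, second element, combined tag).
CONTRACTIONS = {
    "ao": ("a", "o", "SPS00:DA0MS0"),
    "dela": ("de", "ela", "SPS00:PP3FS00"),
    "disso": ("de", "isso", "SPS00:PD0NN00"),
    "na": ("em", "a", "SPS00:DA0FS0"),
    "nos": ("em", "os", "SPS00:DA0MP0"),
    "nele": ("em", "ele", "SPS00:PP3MS00"),
}


def expand_contraction(word):
    """Split a contraction into its parts and per-part tags.

    Returns None for words that are not in the table."""
    entry = CONTRACTIONS.get(word.lower())
    if entry is None:
        return None
    prep, second, combined = entry
    # The first tag belongs to the preposition; the rest to the second part.
    prep_tag, *second_tags = combined.split(":")
    return {
        "parts": (prep, second),
        "prep_tag": prep_tag,
        "second_tags": second_tags,
    }
```

A rule could then test the preposition tag and the article or pronoun tag separately while the token itself stays a single word.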
It is still WIP, it will take months to have it ready since I am writing down the entries as I use the words in the rules.
I still don't have POS tags for fê-lo and similar forms. It is the same issue as amamo-nos and chamai-los.
Hi @jaumeortola and @marcoagpinto, I could only join the discussion now. The situation for passeamo is as Jaume found: it can only appear before a hyphen, as in passeamo-lo. Currently, if I write "Nós passeamo", LT suggests, in this order: "passe amo, passemo, passamo, passeamos, passejamo". The first suggestion probably appears because "passe" and "amo" are both valid words, but it doesn't make sense. All the others without an 's' at the end look wrong to me. So the suggestion could be "Você quis escrever 'passeamos'? 'Passeamo' só é possível antes de um hífen (passeamo-lo)." (in English: "Did you mean 'passeamos'? 'Passeamo' is only possible before a hyphen (passeamo-lo).")
As for passeai, it is as Marco said: the 2nd person plural imperative form, very rarely used (in pt-br), but it must remain because of religious and poetic texts.
I see the imperative form "passeai" in online dictionaries. That is fine. But in our old tagger dictionary we have "passeai, chamai..." as "presente indicativo 2 pl". I didn't know if that was similar to "amamo-nos". "chamais-los">"chamai-los"?
10) Cândido Jucá Filho, in this regard, recalls that Filinto Elísio, "while censuring ignorant critics, commits this gross error: 'E chamais-los puristas e censores?' (that is, 'chamai-los')".
(from https://www.migalhas.com.br/coluna/gramatigalhas/23518/verbo-seguido-de-pronome)
So, the rule is "chamais-los">"chamai-los". Is this correct? Is this "chamai" present indicative? Is this what you call "religious or poetical text"?
In Portuguese it is "chamai-os"; "los" is used only if the verb is in the infinitive. This is one of the religious or poetic cases. It can be present, you are correct. I don't recognize "chamai-los" as correct; maybe it is an old form.
Thank you for the answers. I think the solution for these modified verbal forms is to add a character to the POS tag.
These are some forms in our current tagger dictionary:
fi-la fi:la V:PP
fi-las fi:las V:PP
fi-lo fi:lo V:PP
fi-los fi:los V:PP
fá-la fá:la V:PP
fá-las fá:las V:PP
fá-lo fá:lo V:PP
fá-los fá:los V:PP
fê-la fê:la V:PP
fê-las fê:las V:PP
fê-lo fê:lo V:PP
fê-los fê:los V:PP
pu-la pus:la V:PP
pu-las pus:las V:PP
pu-lo pus:lo V:PP
pu-los pus:los V:PP
qui-la quis:la V:PP
qui-las quis:las V:PP
qui-lo quis:lo V:PP
qui-los quis:los V:PP
They come from different verbs (tense, person, number):
fi -> fiz fazer VMIS1S0
fá -> faz fazer VMIP3S0, faz fazer VMM02S0
fê -> fez fazer VMIS3S0
pu -> pus pôr VMIS1S0
qui -> quis querer VMIS1S0, quis querer VMIS3S0
Is that correct? That means that any verbal form ending with r, s, z has to be adapted when it is followed by pronouns. That only makes sense when the pronoun is appropriate.
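As a rough illustration of the mapping above, the following sketch resolves a hyphenated form such as fê-lo back to its full verb form and tags, and rejects pronouns outside the lo/la/los/las set. The table contents are taken from the two lists above; the names `STEM_READINGS`, `L_PRONOUNS`, and `analyze` are hypothetical, and this is not how the current tagger is implemented.

```python
# Truncated stem -> list of (full form, lemma, POS tag), copied from the
# lists above.
STEM_READINGS = {
    "fi": [("fiz", "fazer", "VMIS1S0")],
    "fá": [("faz", "fazer", "VMIP3S0"), ("faz", "fazer", "VMM02S0")],
    "fê": [("fez", "fazer", "VMIS3S0")],
    "pu": [("pus", "pôr", "VMIS1S0")],
    "qui": [("quis", "querer", "VMIS1S0"), ("quis", "querer", "VMIS3S0")],
}

# Pronouns that appear after the truncated stems in the dictionary entries.
L_PRONOUNS = {"lo", "la", "los", "las"}


def analyze(token):
    """Return every (full form, lemma, tag, pronoun) reading of a token
    like 'fê-lo', or an empty list when the combination is not allowed."""
    stem, hyphen, pronoun = token.partition("-")
    if not hyphen or pronoun not in L_PRONOUNS:
        return []
    return [
        (full, lemma, tag, pronoun)
        for full, lemma, tag in STEM_READINGS.get(stem, [])
    ]
```

Keeping the full form in each reading means a later step could attach a marker to the POS tag for the modified form without losing the original tense, person, and number.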
Where can I find complete documentation about this? Verbs with enclitics and mesoclitics, correct combinations of verbs and pronouns, and changes in the verbal form. @ricardojosehlima
Hi @jaumeortola you can find a large list of irregular verbs here: https://www.todamateria.com.br/verbos-irregulares-em-portugues/
I scanned it and found some verbs with the pattern you are looking for (there are others but they looked extremely rare):
dizer di-lo
fazer fá-lo
trazer trá-lo
compor compu-lo
contrapor contrapu-lo
decompor decompu-lo
depor depu-lo
dispor dispu-lo
expor expu-lo
impor impu-lo
indispor indispu-lo
justapor justapu-lo
opor opu-lo
predispor predispu-lo
pressupor pressupu-lo
propor propu-lo
recompor recompu-lo
repor repu-lo
sobrepor sobrepu-lo
supor supu-lo
The first three verbs are in the present, 3rd person singular. The others are in the past, 1st person singular.
Is there a reduced form for "ir-lo" -> "i-lo"?
Hi, no. The verb ir in Portuguese does not take a direct object, so neither form is acceptable.
Thanks. I could imagine that. But all forms are generated automatically by the scripts.
@ricardojosehlima Now, where can I find documentation about all possible pronoun combinations (enclitics and mesoclitics)? I could buy some advanced grammar book if necessary. In these files, you can see the combinations accepted by our current PT and BR spellcheckers for the verb 'amar'. There are many more forms in pt-BR. Are they correct? Do they make sense? Do the differences between PT and BR make sense? amar-br.txt amar-pt.txt
@jaumeortola you could look for Bechara or Cunha's grammars, they are very complete. As for the files you sent:
1-) The rule for mesoclisis in Portuguese (both pt and br) is insertion of the pronoun between the root and the inflection of the verb only if the verb is in the future or "futuro do pretérito".
2-) In the br file there is "amaríamos-te" which is in disagreement with (1).
3-) The br file has many prefixes added to 'amar', the pt file has others. The only one which is current is 'des' and maybe 're' could be used. All the rest seems unattested to me.
4-) The pt file brings all the correct combinations according to (1); the br file brings some combinations that don't make sense.
5-) The br file has the combination 'se-lhe' which is extremely rare to find in br, and no one uses it anymore. Anyway, 'se-lhe' doesn't apply to any verb and must have a context of use and it is not the case of the verb 'amar'.
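Points (1) and (2) could be turned into a simple filter for generated forms, sketched below. The tense labels, the name `placement_allowed`, and the reading that enclisis is excluded for the future and "futuro do pretérito" (conditional) are assumptions drawn from the remark about "amaríamos-te"; a grammar such as Bechara or Cunha should be checked before relying on it.

```python
# Tenses in which the pronoun goes between root and inflection, per (1).
MESOCLISIS_TENSES = {"future", "conditional"}


def placement_allowed(tense, placement):
    """Check a clitic placement against rule (1).

    tense: a plain label such as "present", "future", "conditional".
    placement: "mesoclisis" or "enclisis".
    """
    if placement == "mesoclisis":
        return tense in MESOCLISIS_TENSES
    if placement == "enclisis":
        # Assumption based on point (2): "amaríamos-te" (conditional with
        # an enclitic) is treated as invalid.
        return tense not in MESOCLISIS_TENSES
    raise ValueError(f"unknown placement: {placement!r}")
```

A generator script could call this before emitting a form, so that combinations like the conditional plus an enclitic never reach the spellchecker word list.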
This issue will be fixed once we finish the transition to both .aff and POS tagger scripts outputting all enclitic verb forms as single tokens. Closing.
@ricardojosehlima @marcoagpinto See these verb forms:
The forms without 's' are not listed in dictionaries. I think they are used with enclitic pronouns. Is this correct? Are there other persons where the 's' is removed, apart from the first and second person plural?