languagetool-org / portuguese-pos-dict

Portuguese POS tagger
GNU Lesser General Public License v2.1

[pt-PT] Improved .AFF files for both AO45 and AO90 #15

Closed marcoagpinto closed 8 months ago

marcoagpinto commented 8 months ago

Heya, @susanaboatto and @p-goulart

I hope to have committed to the right branch this time.

This commit adds verbal forms such as:

ame-a
ame-as
ame-lhe
ame-lhes
ame-me
ame-nos
ame-o
ame-os
ame-se
ame-se-lhe
ame-se-lhes
ame-te
ame-vos
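
For readers unfamiliar with these forms, here is a minimal Python sketch of how the list above arises from a single verb form plus a set of enclitic pronouns. It is only an illustration: the actual forms come from affix rules in the .AFF files, and the names `CLITICS` and `attach_clitics` are hypothetical, not part of the dictionary tooling.

```python
# Hypothetical illustration only: the real dictionary uses affix rules
# in the .AFF files, not Python code.

# Enclitic pronouns (and pronoun combinations) taken from the list above.
CLITICS = [
    "a", "as", "lhe", "lhes", "me", "nos", "o", "os",
    "se", "se-lhe", "se-lhes", "te", "vos",
]


def attach_clitics(verb_form, clitics=CLITICS):
    """Return the hyphenated forms for one verb form, one per clitic."""
    return [f"{verb_form}-{clitic}" for clitic in clitics]


forms = attach_clitics("ame")
for form in forms:
    print(form)
```

The sketch only concatenates with a hyphen. It ignores the spelling changes that some verb forms undergo before a clitic (for example, amar + o gives amá-lo), which real affix rules have to handle.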

However, for AO90 it added 58465 words. Could you take a quick look to check that everything is fine? It seems crazy to get so many new words: the dictionaries have been maintained by Minho University for decades, and from one moment to the next the count goes up by 58 thousand words?

AO45: 3.PTPT_45_new_verbs.txt

AO90: 6.PTPT_90_new_verbs.txt

😛 😛 😛 😛 😛 😛 😛 😛 😛

Thanks!

marcoagpinto commented 8 months ago

Ahhhh… like the last one, these rules were copied from the pt-BR dictionary, so they work exactly like they do there.

p-goulart commented 8 months ago

The number rings true to me. Since we are adding pronouns in combination with endings, the increase is exponential. To me this looks okay.
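
As a rough sketch of why the total climbs so quickly, assume every eligible verb form can take every clitic combination; the number of new entries is then the product of the two counts. The verb forms and the `expand` helper below are hypothetical placeholders, and the real rules may apply only to a subset of forms, so this does not reproduce the 58465 figure.

```python
from itertools import product

# Hypothetical sample of verb forms; the real dictionary has far more.
verb_forms = ["ame", "cante", "fale"]

# Clitic combinations taken from the examples in this pull request.
clitics = [
    "a", "as", "lhe", "lhes", "me", "nos", "o", "os",
    "se", "se-lhe", "se-lhes", "te", "vos",
]


def expand(forms, pronouns):
    """Pair every verb form with every clitic combination."""
    return [f"{form}-{pronoun}" for form, pronoun in product(forms, pronouns)]


new_entries = expand(verb_forms, clitics)

# Each verb form contributes one new entry per clitic combination.
print(len(new_entries) == len(verb_forms) * len(clitics))
```

Under this assumption, even a few thousand eligible verb forms would be enough to produce tens of thousands of new entries.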

marcoagpinto commented 8 months ago

@p-goulart

Thanks!

I have now merged it, but notice that “all checks failed”.

p-goulart commented 8 months ago

These tests will be failing for a while on this branch, until we fix all the issues with the XML rules as well as pt-PT tagging.