languagetool-org / portuguese-pos-dict

Portuguese POS tagger
GNU Lesser General Public License v2.1

[pt-PT] Added rules to .AFFs taken from -BR to -PT 45+90 #23

Closed: marcoagpinto closed this 6 months ago

marcoagpinto commented 6 months ago

Heya, @susanaboatto and @p-goulart

Here is an update of the pt-PT .AFFs “stolen” from pt-BR.

They added around 8 000 conjugations.

6.PTPT_90_new_verbs.txt

marcoagpinto commented 6 months ago

@p-goulart

Ahhhh… thanks for the feedback.

Can the “ter” issues be solved if I add [aeiou]ter to ter?

Thanks!

p-goulart commented 6 months ago

> Can the “ter” issues be solved if I add [aeiou]ter to ter?

That won't account for all forms: conter, obter.

Trusting the Brazilian affix files too much is not wise since the flags are used in fundamentally different ways, with different distributions. It's best to just adapt the pt-PT flags.
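To illustrate the point about `[aeiou]ter`, here is a minimal Python sketch. It assumes the affix condition can be read as the regular expression "ends in a vowel followed by ter"; the two verb lists are illustrative examples chosen for this sketch, not taken from the actual .AFF files. It suggests the condition misses some verbs conjugated like ter and, at the same time, matches regular -er verbs, which may be one way forms built on the wrong paradigm arise.

```python
import re

# Assumption: the affix condition "[aeiou]ter" is read as
# "the verb ends in a vowel followed by 'ter'".
CONDITION = re.compile(r"[aeiou]ter$")

# Illustrative lists (not taken from the .AFF files):
# verbs conjugated like "ter" versus regular -er verbs.
TER_DERIVATIVES = ["conter", "obter", "manter", "deter", "reter"]
REGULAR_ER_VERBS = ["bater", "abater", "combater", "cometer"]


def matches(verb: str) -> bool:
    """Return True if the verb satisfies the vowel + 'ter' condition."""
    return CONDITION.search(verb) is not None


# Derivatives of "ter" that the condition fails to cover.
missed = [v for v in TER_DERIVATIVES if not matches(v)]

# Regular verbs that the condition would wrongly treat like "ter".
wrongly_matched = [v for v in REGULAR_ER_VERBS if matches(v)]

print("missed:", missed)
print("wrongly matched:", wrongly_matched)
```

Under these assumptions, conter, obter and manter fall outside the condition because a consonant precedes "ter", while bater, abater, combater and cometer fall inside it.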

marcoagpinto commented 6 months ago

Ahhhh… I will fix it tonight, or tomorrow at 5am. I have been up since 3am working on other projects, my brain is drained, and right now I can't focus on precision work 😄 😛

marcoagpinto commented 6 months ago

Heya, @p-goulart

I have made some enhancements to the .AFFs.

Here are the results of the new AO90 one (new words): 6.PTPT_90_new_verbs_20240216.txt

I have been unable to check every conjugation one by one, so if you find any wrong items, please tell me and suggest how to fix them.

Thanks!

marcoagpinto commented 6 months ago

@p-goulart

Heya, Pedro, any news regarding this?

Thanks!

p-goulart commented 6 months ago

We can still spot some verb forms that don't make a lot of sense:

* abatém;
* batenha;
* combativesse;
* cometeve.

We know it's an unexciting task, but do you think you could check your work in full yourself and spot the odd ones out?

The approach you're taking here is not working because the flags are used differently in the pt-PT and pt-BR files. Please bear that in mind.
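One possible way to automate part of that check is sketched below in Python. It is only a heuristic under stated assumptions: `TER_FAMILY` and `IRREGULAR_ENDINGS` are hypothetical, hand-written lists (a real check would need the full paradigm of ter), and the sample forms mix the odd ones reported in this thread (abatém, batenha, combativesse, cometeve) with a few regular derivative forms added for contrast. The idea is to rebuild the lemma from any form that ends like an irregular form of ter and to flag it when that lemma is not a known derivative of ter.

```python
# Hypothetical allow-list of verbs conjugated like "ter" (not exhaustive).
TER_FAMILY = {
    "ter", "ater", "abster", "conter", "deter",
    "entreter", "manter", "obter", "reter", "suster",
}

# A few irregular endings of "ter" and its derivatives (illustrative only).
IRREGULAR_ENDINGS = ("tém", "tenha", "tivesse", "teve")


def suspicious_forms(forms):
    """Return (form, rebuilt_lemma) pairs whose lemma is not in TER_FAMILY."""
    flagged = []
    for form in forms:
        for ending in IRREGULAR_ENDINGS:
            if form.endswith(ending):
                # Undo the suffix rule: strip the ending, put "ter" back.
                lemma = form[: -len(ending)] + "ter"
                if lemma not in TER_FAMILY:
                    flagged.append((form, lemma))
                break  # at most one ending applies per form
    return flagged


generated = [
    "abatém", "batenha", "combativesse", "cometeve",  # reported in the thread
    "contém", "obtenha", "mantivesse",                # added for contrast
]

for form, lemma in suspicious_forms(generated):
    print(f"{form}: looks like a 'ter' form, but '{lemma}' is not in TER_FAMILY")
```

With these lists, the four forms from the thread are flagged (rebuilt as abater, bater, combater and cometer), while the forms of conter, obter and manter pass. Any flagged form would still need a manual look, since the allow-list is an assumption.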

marcoagpinto commented 6 months ago

> We can still spot some verb forms that don't make a lot of sense:
>
> * abatém;
> * batenha;
> * combativesse;
> * cometeve.
>
> We know it's an unexciting task, but do you think you could check your work in full yourself and spot the odd ones out?
>
> The approach you're taking here is not working because the flags are used differently in the pt-PT and pt-BR files. Please bear that in mind.

Ahhhh… thanks for the feedback. I will close this pull request and start again with smaller rules. I was too greedy and wanted a silver bullet that cures everything.

I have always preferred to do things bit by bit, but this time I got greedy.

😋