languagetool-org / portuguese-pos-dict

Portuguese POS tagger
GNU Lesser General Public License v2.1

Improve prefixation of FPs #29

Closed: p-goulart closed this 4 months ago

p-goulart commented 4 months ago

Some of these might be iffy. For example, I'm pretty sure autoditada is a typo for autodidata. Autotune is also not a prefixation of tune; it is a full-on loanword.

Autodescritos
reimigrando
reindicados
reintensificar
remissionem
repasseado
repressurizar
requeixo
Autoditada
Recopa
autocomplementar
autoconcebida
autodesignar
autoencontro
autoengano
autoimolaram
autoimolou
autoisentam
autolesionam
automanufatura
autoprovocada
autopunindo
autotune
reajeitou
reassinalados
recoletou
recomissionado
recongelado
redesenvolvida
redirigido
reencomendado
reestreou
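To illustrate why some of these need a manual check, here is a minimal sketch of a naive prefix split. It assumes a hypothetical `split_prefix` helper, a tiny hand-made lexicon, and only the two prefixes discussed here (auto-, re-). It is not how portuguese-pos-dict builds its entries. A loanword such as autotune fails because its base is not a Portuguese entry. A typo, however, can still slip through if its remainder happens to be a real word (for example, if "ditada" were in the lexicon, autoditada would be accepted), so review is still needed.

```python
from typing import Optional, Set, Tuple

# Hypothetical prefix list, limited to the prefixes discussed in this issue.
PREFIXES = ("auto", "re")


def split_prefix(word: str, lexicon: Set[str]) -> Optional[Tuple[str, str]]:
    """Return (prefix, base) if the word is a known prefix plus a lexicon entry."""
    lowered = word.lower()
    for prefix in PREFIXES:
        base = lowered[len(prefix):]
        if lowered.startswith(prefix) and base in lexicon:
            return prefix, base
    return None


# Toy lexicon for illustration only; the real dictionary is far larger.
lexicon = {"engano", "congelado", "dirigido", "didata"}
for candidate in ["autoengano", "Recongelado", "redirigido", "autotune", "autoditada"]:
    print(candidate, split_prefix(candidate, lexicon))
```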

Also these. They are tokenisation issues that follow from the lack of productive affixation in the POS tagger as of dict v0.15. Once jogar, processar, etc. are available with these prefixes, the whole verb should be tokenised as expected.

rejogá
reprocessá
recertificá
reembalá
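These forms are presumably the verb part of enclitic constructions such as rejogá-lo, where an -ar infinitive drops its final r and takes an acute accent before the pronoun (an -er infinitive takes a circumflex, as in vendê-lo). The sketch below is a hypothetical illustration, not LanguageTool's tokeniser. It assumes a helper named `analyse_enclitic`, handles only -ar/-er verbs and the pronouns lo/la/los/las, and shows why the analysis fails until the prefixed infinitive (here rejogar) exists in the lexicon.

```python
from typing import Optional, Set, Tuple

# Assumed spelling rule: before an enclitic pronoun, "-ar" -> "-á" and "-er" -> "-ê".
# Other verb classes (e.g. "-ir", "pôr") are deliberately left out of this sketch.
PRE_CLITIC_ENDINGS = {"á": "ar", "ê": "er"}
ENCLITICS = {"lo", "la", "los", "las"}


def analyse_enclitic(token: str, lexicon: Set[str]) -> Optional[Tuple[str, str]]:
    """Return (infinitive, pronoun) if the token is verb-clitic with a known infinitive."""
    verb, sep, pronoun = token.partition("-")
    if not sep or not verb or pronoun not in ENCLITICS:
        return None
    replacement = PRE_CLITIC_ENDINGS.get(verb[-1])
    if replacement is None:
        return None
    # Restore the infinitive, e.g. "rejogá" -> "rejogar".
    infinitive = verb[:-1] + replacement
    return (infinitive, pronoun) if infinitive in lexicon else None


lexicon = {"jogar", "processar"}
print(analyse_enclitic("jogá-lo", lexicon))    # ('jogar', 'lo')
print(analyse_enclitic("rejogá-lo", lexicon))  # None: "rejogar" is not in the lexicon
lexicon.add("rejogar")
print(analyse_enclitic("rejogá-lo", lexicon))  # ('rejogar', 'lo')
```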

susanaboatto commented 4 months ago

We can probably leave autoditada and autotune out of the dictionary update.

p-goulart commented 4 months ago

Not added: