udapi / udapi-python

Python framework for processing Universal Dependencies data
GNU General Public License v3.0

Portuguese Clitic vs Contractions #85

Closed: arademaker closed this issue 3 years ago

arademaker commented 3 years ago

https://github.com/udapi/udapi-python/blob/af9801167fbf99364b0cd2acb0382137f9c0afc3/udapi/block/ud/pt/addmwt.py#L42-L43

The forms no and nos are not always the contraction em+o(s); they can also be the clitic pronoun (PRON) no/nos.

martinpopel commented 3 years ago

Thanks for reporting. I hope I fixed it. Feel free to make a pull request if you spot further problems.

arademaker commented 3 years ago

I am trying to handle additional cases, but how can I specify that "Além disso" must be split into além + de + isso, with the following relations?

de -case> isso
isso -obl> além

martinpopel commented 3 years ago

I think your solution in #85 is OK, i.e.

'disso': {'form': 'de isso', 'lemma': 'de isso', 'upos': 'ADP PRON', 'main': 1, 'shape': 'subtree', 'deprel': 'case *'}

The asterisk means that the deprel of "isso" will be whatever the original deprel of "disso" was. Let me know if there are any further problems.
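For readers following along, here is a minimal, self-contained sketch of how an entry like the one above could be read. It is not udapi's actual code: the function `expand_mwt` and the placeholder `'ORIGINAL_PARENT'` are hypothetical names made up for illustration. It assumes that `'main': 1` marks "isso" as the main word, and that with `'shape': 'subtree'` the other words attach to the main word while the main word keeps the original parent.

```python
# Hypothetical illustration only; this does not import or reproduce udapi.
MWTS = {
    'disso': {'form': 'de isso', 'lemma': 'de isso', 'upos': 'ADP PRON',
              'main': 1, 'shape': 'subtree', 'deprel': 'case *'},
}


def expand_mwt(token_form, original_deprel, table=MWTS):
    """Return one dict per syntactic word, or None if the form is not listed."""
    entry = table.get(token_form.lower())
    if entry is None:
        return None
    forms = entry['form'].split()
    lemmas = entry['lemma'].split()
    upos_tags = entry['upos'].split()
    deprels = entry['deprel'].split()
    main = entry['main']  # zero-based index of the main word
    words = []
    for i, form in enumerate(forms):
        # '*' stands for the deprel the unsplit token had before splitting.
        deprel = original_deprel if deprels[i] == '*' else deprels[i]
        # Assumed reading of 'subtree': the main word keeps the original
        # parent, and every other word hangs on the main word.
        head = 'ORIGINAL_PARENT' if i == main else forms[main]
        words.append({'form': form, 'lemma': lemmas[i], 'upos': upos_tags[i],
                      'deprel': deprel, 'head': head})
    return words


# "disso" was attached to "além" as obl before splitting.
for word in expand_mwt('disso', 'obl'):
    print(word)
```

Under these assumptions, "de" ends up as a case dependent of "isso", and "isso" inherits the obl relation that "disso" had, which matches the structure asked for above.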