Open dix83 opened 1 week ago
Hello, TEPROLIN does not populate the "_upos" field. The POS tagging fields are "_ctg" (which you can use instead of "_upos", although it does not contain the UPOS tags for Romanian) and "_msd", a more detailed version of "_ctg" that carries many more morphological attributes.
To see how "_ctg" and "_msd" map to UPOS, you can check out the Romanian UD corpus here: https://github.com/UniversalDependencies/UD_Romanian-RRT, file ro_rrt-ud-train.conllu.
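As a rough illustration of that lookup, here is a minimal Python sketch that derives a tag-to-UPOS table from a CoNLL-U file and uses it to fill in empty "_upos" values. It assumes the XPOS column (column 5) of ro_rrt-ud-train.conllu holds the same MSD-style tags that TEPROLIN puts in "_msd"; please verify that against the corpus before relying on it. The function names `build_xpos_to_upos` and `fill_upos` are hypothetical helpers, not part of TEPROLIN, and the two sample lines are hand-written for the demo rather than copied from the corpus.

```python
from collections import Counter, defaultdict


def build_xpos_to_upos(conllu_lines):
    """Map each XPOS tag to the UPOS tag it most often co-occurs with."""
    counts = defaultdict(Counter)
    for line in conllu_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        tok_id, upos, xpos = cols[0], cols[3], cols[4]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword-token ranges and empty nodes
        if upos != "_" and xpos != "_":
            counts[xpos][upos] += 1
    return {xpos: c.most_common(1)[0][0] for xpos, c in counts.items()}


def fill_upos(tokens, mapping):
    """Fill empty "_upos" values in TEPROLIN-style token dicts from "_msd"."""
    for tok in tokens:
        if not tok.get("_upos"):
            # Unknown MSD tags are left as an empty string.
            tok["_upos"] = mapping.get(tok.get("_msd", ""), "")
    return tokens


# Hand-written sample lines (assumption: real corpus lines have this shape).
sample = [
    "# text = demo\n",
    "1\tIon\tIon\tPROPN\tNp\t_\t4\tnsubj\t_\t_\n",
    "2\tși-\tsine\tPRON\tPx3--d--y-----w\t_\t4\texpl:poss\t_\t_\n",
]
mapping = build_xpos_to_upos(sample)
tokens = fill_upos([{"_wordform": "Ion", "_msd": "Np", "_upos": ""}], mapping)
print(tokens[0]["_upos"])
```

To build the table from the real corpus, pass an open file instead of the sample list, for example `build_xpos_to_upos(open("ro_rrt-ud-train.conllu", encoding="utf-8"))`. Picking the most frequent UPOS per tag is a simplification: a tag that maps to more than one UPOS in the corpus will always resolve to the majority one.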
If you use the UDPipe NLP app of TEPROLIN (TTL is the default one), I think the "_ctg" field contains the actual UPOS of the token, but you have to install UDPipe first, as instructed in the README.
The result is:

    {
        "_bner": "",
        "_chunk": "Np#1",
        "chunk_det": "Ion",
        "_ctg": "NP",
        "_deprel": "nsubj",
        "_expand": "",
        "_head": 4,
        "_id": 1,
        "_lemma": "Ion",
        "_msd": "Np",
        "_ner": "PER",
        "_ner_2": "",
        "_phon": "",
        "_syll": "",
        "_wordform": "Ion",
        "_upos": ""
    },
    {
        "_bner": "",
        "_chunk": "Vp#1",
        "chunk_det": "și- a cumpărat",
        "_ctg": "PXD",
        "_deprel": "expl:poss",
        "_expand": "",
        "_head": 4,
        "_id": 2,
        "_lemma": "sine",
        "_msd": "Px3--d--y-----w",
        "_ner": "",
        "_ner_2": "",
        "_phon": "",
        "_syll": "",
        "_wordform": "și-",
        "_upos": ""
    },
The "_upos" field is always empty. Could you suggest the reason?