vladob54 opened this issue 5 years ago
Indeed, nice suggestion.
Currently it is not straightforward to implement this, because the current UDPipe does not distinguish between the "real" morphological lexicon and the guesser rules derived from the training data. (Our MorphoDiTa tool can do it; there we keep this distinction.)
BTW, if you have a morphological dictionary, you can perform the required operation manually after running UDPipe.
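A minimal sketch of that manual post-processing, assuming the UDPipe output is in CoNLL-U and the dictionary is available as a set of lowercased word forms. It appends a hypothetical `Guessed=Yes` attribute to the MISC column of every word whose form is absent from the dictionary. The attribute name, the case-insensitive lookup, and the `mark_guessed` helper are illustrative assumptions, not part of UDPipe. Note that it only checks whether the form is present; it does not verify that UDPipe's lemma, PoS and features match the dictionary entry.

```python
from typing import Iterable, Iterator, Set


def mark_guessed(lines: Iterable[str], lexicon: Set[str],
                 attr: str = "Guessed=Yes") -> Iterator[str]:
    """Append `attr` (hypothetical attribute name) to the MISC column of every
    syntactic word whose FORM, lowercased, is not in `lexicon`."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Comments and blank sentence separators pass through unchanged.
        if not line or line.startswith("#"):
            yield line
            continue
        cols = line.split("\t")
        # Multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1")
        # carry no tagger analysis to flag, so leave them as they are.
        if len(cols) != 10 or "-" in cols[0] or "." in cols[0]:
            yield line
            continue
        if cols[1].lower() not in lexicon:
            # MISC is column 10; "_" means empty, otherwise attributes are "|"-separated.
            cols[9] = attr if cols[9] == "_" else cols[9] + "|" + attr
        yield "\t".join(cols)
```

To use it, read the UDPipe output file line by line, pass the lines together with the lowercased dictionary forms, and write the yielded lines to a new CoNLL-U file. A corpus indexer can then expose the flag as a token attribute for queries.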
Also, the future UDPipe 2.0 will allow explicitly passing a morphological dictionary (during inference, not just during training), so it will then be possible to indicate which words were processed only by a "guesser".
Leaving the issue open as a reminder.
This is relevant to #50 too.
I would quite appreciate it if UDPipe somehow indicated that a given word form was not present in the morphological lexicon, i.e., that its lemma, PoS and features had been guessed. This type of information is provided, e.g., by TreeTagger, and we make use of it when post-processing the tagger output. We also provide it to corpus users so that they can incorporate the respective attribute into their CQL queries.
Best, Vlado B.
http://unesco.uniba.sk/guest/