Closed by mosynaq 6 years ago
The syntax seems fine. How long was the tagger idle? What data are you training on? What is the size of the data? How many unique UPOS/XPOS/FEATS tags does it have?
Hey @foxik, thank you for answering! It sat idle for more than half a day! The data is about 35 MB; it is the Persian portion of Hamledt3.
Ok -- the problem is that there are 51105 unique XPOS tags in the data, which of course makes training extremely slow. The root cause is that sentence IDs are embedded in the XPOS tags.
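A quick way to check the tagset size yourself is to count the distinct values in the XPOS column (column 5 of CoNLL-U); this is a sketch, and `train.conllu` is a placeholder file name:

```shell
# Count unique XPOS values in a CoNLL-U file:
# drop comment lines, take column 5, drop blanks, count distinct values
grep -v '^#' train.conllu | cut -f5 | grep -v '^$' | sort -u | wc -l
```

If this prints tens of thousands for a 35 MB treebank, something like sentence IDs has leaked into the tags.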
For the training to progress, you should either not train on XPOS tags at all (by providing the `use_xpostag=0` option in the `tagger` argument), or remove the sentence IDs (`sed 's/|senID=[0-9]*//'`); in both cases the training progresses normally (an iteration per minute or so).
Closing.
Hi everyone. I'm trying to train a model using `udpipe`, and I issue the following command:

Tokenization finishes without any problem, but when it comes to the tagger, it just occupies a huge amount of RAM and does nothing, no matter what I do. What is my problem? Is it the syntax of the command? Should I be providing something else?
Thanks!
P.S. The word2vec embeddings were made using gensim, though I'm not sure that is the problem.