NickRuiz closed this issue 7 years ago
I noticed that I had some blank lines, so I removed them with:
~/mmt/vendor/moses/scripts/training/clean-corpus-n.perl TED.train en zh TED.train.clean 1 100000
However, the error persists.
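For anyone who wants to double-check the blank-line cleanup independently of the Moses script, here is a minimal Python sketch. It only covers the "drop pairs where either side is empty" part and is not a reimplementation of clean-corpus-n.perl. The function name drop_blank_pairs is hypothetical, and the inputs are assumed to be the two sides of the corpus already read into lists of lines.

```python
def drop_blank_pairs(src_lines, tgt_lines):
    """Keep only line pairs where both sides contain non-whitespace text.

    Hypothetical helper for illustration; not part of MMT or Moses.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("parallel files must have the same number of lines")
    # A pair survives only if neither side is empty or whitespace-only.
    kept = [(s, t) for s, t in zip(src_lines, tgt_lines) if s.strip() and t.strip()]
    return [s for s, _ in kept], [t for _, t in kept]


if __name__ == "__main__":
    # Tiny in-memory example; real data would be read from the .en and .zh files.
    src = ["hello\n", "\n", "world\n"]
    tgt = ["你好\n", "空\n", "  \n"]
    clean_src, clean_tgt = drop_blank_pairs(src, tgt)
    print(len(src) - len(clean_src), "pair(s) dropped")
```

If this reports zero dropped pairs on the cleaned files, blank lines are probably not the cause of the error.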
I also created a dummy project, where I copied one line from the English side and changed the extension. Something like this:
head -1 tmp.en tmp.es
This trains fine. But if I change the extension like so:
mv tmp.es tmp.zh
and try to train with the zh language, the problem occurs. It seems like there is a step in the pipeline that doesn't like zh.
Hi @NickRuiz
I've just published the fix in the master branch. Could you confirm that it also solves the issue in your installation?
Thanks!
I tried a simple en-zh (English-Mandarin) training example using a small snippet of TED talk training data. When I attempt to run
./mmt create
I get an error. It seems like the /home/interact/mmt/engines/enzh_tmp/models/vocabulary folder is never created, which might be causing the issue.
enzh_tmp.zip
I can train the en-it and it-en examples with no issues.