moses-smt / mgiza

A word alignment tool based on the famous GIZA++, extended to support multi-threading, resumed training, and incremental training.

Sentences lost when running force_align #23

Open duterscmy opened 3 years ago

duterscmy commented 3 years ago

I have trained a model on a big corpus, and now I want to obtain alignments for some new data. Following ./scripts/force-align-moses.sh, I regenerate the .vcb and .cooc files for the new data, reuse the existing .classes files, and then run mgiza to obtain the results. However, nearly half of the sentences are missing from en2cn.A3.final.part000-047, so I can't use ./scripts/merge_result.py to merge the results.
What could the problem be?

duterscmy commented 3 years ago

"Lost" means that the alignment results for these sentences do not appear in en2cn.A3.final.part000-047.

duterscmy commented 3 years ago

The error may be this warning: WARNING: Hill Climbing yielded a zero score viterbi alignment for the following pair: AL(l:7,m:3)(a: 5 6 7 )(fert: 0 0 0 0
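
To pin down exactly which sentence pairs are missing from the part files, a small check along these lines might help. It is only a sketch under stated assumptions: it assumes each aligned pair in the A3 output starts with a header line of the form "# Sentence pair (N) ...", that N counts from 1, and that the part files match a glob pattern such as en2cn.A3.final.part* (the pattern and the helper names sentence_ids, missing_ids, and scan_parts are made up for illustration and are not part of mgiza).

```python
import glob
import re

# Assumed header line that precedes each aligned pair in A3 output, e.g.
# "# Sentence pair (12) source length 7 target length 3 alignment score : 1e-05"
HEADER = re.compile(r"^# Sentence pair \((\d+)\)")


def sentence_ids(lines):
    """Return the set of sentence-pair IDs found in A3-format lines."""
    ids = set()
    for line in lines:
        match = HEADER.match(line)
        if match:
            ids.add(int(match.group(1)))
    return ids


def missing_ids(found, expected_count):
    """Return the sorted IDs in 1..expected_count that are absent from found."""
    return [i for i in range(1, expected_count + 1) if i not in found]


def scan_parts(pattern, expected_count):
    """Collect IDs from every part file matching pattern and report the gaps."""
    found = set()
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going if a file has stray bytes
        with open(path, encoding="utf-8", errors="replace") as handle:
            found |= sentence_ids(handle)
    return missing_ids(found, expected_count)
```

Calling scan_parts("en2cn.A3.final.part*", n), with n set to the number of sentence pairs in the new data, would list the IDs that never appear in any part file. Those IDs can then be compared against the pairs named in the warnings to see whether the two sets coincide.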