moses-smt / mosesdecoder

Moses, the machine translation system
http://www.statmt.org/moses
GNU Lesser General Public License v2.1

train-model.perl failed #230

Open superyyy1202 opened 2 years ago

superyyy1202 commented 2 years ago

When I run train-model.perl, it creates the model folder but none of its contents, as shown in the figure below. The error in the train.out log names a .vcb file under /home/yyy/working/train/corpus/ and ends with "died with signal 11, with coredump". I have set ulimit -c unlimited so that the core file size is not limited, but the error is still reported. Could you please help me figure out what the problem is? I've been stuck here for three days.

superyyy1202 commented 2 years ago

/home/yyy/working/train/corpus/zh.vcb ERROR: Execution of: /home/yyy/mosesdecoder-master/tools/GIZA++ -CoocurrenceFile /home/yyy/working/train/giza.zh-en/zh-en.cooc -c /home/yyy/working/train/corpus/zh-en-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 1 -nsmooth 4 -o /home/yyy/working/train/giza.zh-en/zh-en -onlyaldumps 1 -p0 0.999 -s /home/yyy/working/train/corpus/en.vcb -t /home/yyy/working/train/corpus/zh.vcb died with signal 11, with coredump

hieuhoang commented 2 years ago

It's usually an issue with the data, e.g. the data isn't encoded in UTF-8, there are non-printing characters, or there's a double space separating words.
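The three data problems named above can be screened for before training. The following is a minimal sketch in Python, not part of Moses or GIZA++. The function names `check_line` and `check_file` are hypothetical, and treating every Unicode "C" category character (control, format, and so on) as non-printing is an assumption about what counts as problematic.

```python
import sys
import unicodedata


def check_line(raw):
    """Return a list of problems found in one raw (bytes) corpus line."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Nothing else can be checked reliably if the bytes are not UTF-8.
        return ["not valid UTF-8"]

    # Strip only the newline, so a stray carriage return is still reported.
    text = text.rstrip("\n")

    problems = []
    for ch in text:
        # Assumption: any character in a Unicode "C*" category
        # (control, format, unassigned, ...) counts as non-printing.
        if unicodedata.category(ch).startswith("C"):
            problems.append("non-printing character U+%04X" % ord(ch))
            break  # report only the first one per line
    if "  " in text:
        problems.append("double space")
    return problems


def check_file(path):
    """Print every problematic line of a corpus file; return the count."""
    bad = 0
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            for problem in check_line(raw):
                print("%s:%d: %s" % (path, lineno, problem))
                bad += 1
    return bad


if __name__ == "__main__":
    # Usage: python check_corpus.py <file> [<file> ...]
    for corpus_path in sys.argv[1:]:
        check_file(corpus_path)
```

Run it on both sides of the parallel corpus. Reading the files in binary mode matters here: opening them as text would either raise on the first bad byte or silently replace it, depending on the error handler.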

superyyy1202 commented 2 years ago

I have checked that my parallel corpus is entirely UTF-8 encoded, but I still get the following error. Could you please take another look?

/home/yyy/working/train/corpus/zh-en-int-train.snt > /home/yyy/working/train/giza.zh-en/zh-en.cooc
Segmentation fault (core dumped)
Exit code: 139
ERROR at /home/yyy/mosesdecoder-master/scripts/training/train-model.perl line 1351.

superyyy1202 commented 2 years ago

I tried another data set. Building the language model with the KenLM that ships with Moses failed on it, but the model could be built successfully with IRSTLM. ./build_binary itself can be called successfully.

Reading /home/yyy/corpus/train.seq.mx.clean.zh
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
Segmentation fault (core dumped)