moses-smt / mosesdecoder

Moses, the machine translation system
http://www.statmt.org/moses
GNU Lesser General Public License v2.1
1.58k stars, 778 forks

normalize-punctuation.perl: Change Chinese punctuation marks in English sentences into English punctuation #226

Open QzzIsCoding opened 2 years ago

QzzIsCoding commented 2 years ago

Hi, thanks for sharing the tool. I have a question: I ran the command below, but it did not produce the result I wanted, so I would like to know how to use normalize-punctuation.perl.

NORM_PUNC=/mosesdecoder/scripts/tokenizer/normalize-punctuation.perl
perl ${NORM_PUNC} -l en < ${data_dir}/en.txt > ${data_dir}/norm_en.txt

The sentences in the input and output files are identical. For example, this sentence comes out unchanged:

However,the imaging process will in-evitably be affected。

The Chinese punctuation marks were not converted to English punctuation marks.

hieuhoang commented 2 years ago

Looking at the code, it doesn't look like the script converts Chinese characters. Maybe you can extend it to do so.
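
One way to get the requested behaviour without touching the Perl script is a small pre-processing step that maps full-width Chinese punctuation to ASCII before normalization. The following is only a minimal sketch of that idea, not code from normalize-punctuation.perl. The mapping table and the function name `normalize_cjk_punct` are hypothetical, the list of characters is not exhaustive, and spacing around the replaced marks is left as it was.

```python
# Hypothetical mapping from common Chinese (full-width) punctuation to ASCII.
# This table is an assumption for illustration; extend it as your data requires.
CJK_TO_ASCII = {
    "\uff0c": ",",  # full-width comma
    "\u3002": ".",  # ideographic full stop
    "\u3001": ",",  # ideographic (enumeration) comma
    "\uff01": "!",  # full-width exclamation mark
    "\uff1f": "?",  # full-width question mark
    "\uff1a": ":",  # full-width colon
    "\uff1b": ";",  # full-width semicolon
    "\uff08": "(",  # full-width left parenthesis
    "\uff09": ")",  # full-width right parenthesis
}

# str.maketrans accepts a dict whose keys are single characters.
_TABLE = str.maketrans(CJK_TO_ASCII)


def normalize_cjk_punct(line: str) -> str:
    """Replace each mapped punctuation mark with its ASCII counterpart.

    Characters not in the table are left unchanged, and no spaces are
    inserted or removed.
    """
    return line.translate(_TABLE)
```

Applied to the sentence from the question, the full-width comma and the ideographic full stop each become their ASCII equivalent, while the rest of the line is untouched. Each line of en.txt could be passed through such a function before the file is given to the Perl script.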