tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial
Apache License 2.0
6.38k stars 1.96k forks source link

why vocabs are the same between src and tgt in script "wmt16_en_de.sh" #421

Open kingkf opened 5 years ago

kingkf commented 5 years ago

Create vocabulary file for BPE

echo -e "\n\n" > "${OUTPUT_DIR}/vocab.bpe.${merge_ops}" cat "${OUTPUT_DIR}/train.tok.clean.bpe.${merge_ops}.en" "${OUTPUT_DIR}/train.tok.clean.bpe.${merge_ops}.de" | \ ${OUTPUT_DIR}/subword-nmt/get_vocab.py | cut -f1 -d ' ' >> "${OUTPUT_DIR}/vocab.bpe.${merge_ops}"

done

Duplicate vocab file with language suffix

cp "${OUTPUT_DIR}/vocab.bpe.32000" "${OUTPUT_DIR}/vocab.bpe.32000.en" cp "${OUTPUT_DIR}/vocab.bpe.32000" "${OUTPUT_DIR}/vocab.bpe.32000.de"