Hi,
I am trying to train an English-to-Spanish translation model. However, during training the BLEU score stays at zero even after 3000 steps, and when I run inference the output is just unknown tokens.
Here is how I am creating the vocabulary:
from nltk.corpus import stopwords
stoplist = stopwords.words('english')
file = open('/home/ubuntu/europarldata/europarl-v7.es-en.en', encoding='utf-8')  # English corpus
text = file.read()
clean = [word for word in text.split() if word not in stoplist]
from collections import Counter
count = Counter(clean)
frequency = count.most_common(17188)
l1, l2 = zip(*frequency)
with open('/home/ubuntu/mukund_nmt/spanishdata/vocab.en', 'w') as f:
    for item in l1:
        f.write("%s\n" % item)  # writing the vocab file as a string
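One check that might help narrow this down is measuring how many tokens of a corpus fall outside a vocabulary built this way. The sketch below is only an illustration under stated assumptions: `build_vocab` and `oov_rate` are hypothetical helper names, the corpus and stop-word list are toy stand-ins rather than the Europarl data or the NLTK list, and it assumes the training sentences still contain the words that were filtered out when the vocabulary was built.

```python
from collections import Counter


def build_vocab(tokens, size, stoplist=()):
    """Return up to `size` most frequent tokens that are not in `stoplist`."""
    stop = set(stoplist)
    counts = Counter(tok for tok in tokens if tok not in stop)
    return [tok for tok, _ in counts.most_common(size)]


def oov_rate(tokens, vocab):
    """Return the fraction of `tokens` that do not appear in `vocab`."""
    if not tokens:
        return 0.0
    known = set(vocab)
    missing = sum(1 for tok in tokens if tok not in known)
    return missing / len(tokens)


# Toy corpus standing in for the English side of the training data (assumption).
corpus = "the house is small and the garden is large".split()
# Toy stand-in for the NLTK English stop-word list (assumption).
toy_stoplist = ["the", "is", "and"]

vocab_filtered = build_vocab(corpus, size=10, stoplist=toy_stoplist)
vocab_full = build_vocab(corpus, size=10)

# Share of corpus tokens that each vocabulary cannot represent.
print("OOV rate, stop words removed:", oov_rate(corpus, vocab_filtered))
print("OOV rate, stop words kept:   ", oov_rate(corpus, vocab_full))
```

Running the same kind of count over the real training files and vocab files would show whether a large share of tokens is out of vocabulary, which would be consistent with output dominated by unknown tokens.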
Once the vocabulary is created, I run the training as follows:
python -m nmt.nmt --src=en --tgt=es --vocab_prefix=/home/ubuntu/mukund_nmt/spanish_data/vocab --train_prefix=/home/ubuntu/mukund_nmt/spanish_data/new_train --dev_prefix=/home/ubuntu/mukund_nmt/spanish_data/new_dev --test_prefix=/home/ubuntu/mukund_nmt/spanish_data/new_testing --out_dir=/home/ubuntu/mukund_nmt/spanish_data/model1 --num_train_steps=3000 --steps_per_stats=100 --num_layers=2 --num_units=128 --dropout=0.2 --metrics=bleu
It does give some output at the start, which looks normal to me since it's just the beginning:
However, subsequent training outputs are filled with unknown tokens, and the BLEU score stays at 0 until the end of training. As a result, the inference output is also garbage (shown below):
Can someone please help me with this? Thanks.