ictnlp / OR-NMT

Source code for the ACL 2019 paper "Bridging the Gap between Training and Inference for Neural Machine Translation"

grad NaN #4

Closed. haorannlp closed this issue 4 years ago.

haorannlp commented 4 years ago

Hi Zhang Wen,

After running the OR-RNN model for 20 hours with the default parameters in wargs.py (I only changed the data directory), the gradients became NaN. Do you have any ideas? Thanks.

My configuration: Python 2.7, torch 1.0.1

zhang-wen commented 4 years ago

@haorannlp Hi, the OR-RNN version of the model may not have been tested thoroughly, so there may be some problems. The gradients probably become NaN because somewhere a number is divided by zero or sqrt is applied to a negative number, so you should add an epsilon to the operands of those division and sqrt operations.
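For reference, a minimal sketch of the kind of epsilon guard described above; the tensor names and the epsilon value are illustrative and not taken from the OR-NMT code:

```python
import torch

eps = 1e-8  # small constant; the exact value is a tuning choice

# Toy stand-ins for whatever quantities the model divides or square-roots;
# the real culprits would have to be located in the OR-NMT code itself.
weights = torch.rand(4, 5, requires_grad=True)   # non-negative, a row may sum to ~0
x = torch.randn(4, 5, requires_grad=True)

# Guard a division: keep the denominator strictly positive.
probs = weights / (weights.sum(dim=-1, keepdim=True) + eps)

# Guard a square root: sqrt has an infinite gradient at exactly zero,
# so shift the argument by eps before taking the root.
var = (x - x.mean(dim=-1, keepdim=True)).pow(2).mean(dim=-1)
std = torch.sqrt(var + eps)

# torch.autograd.set_detect_anomaly(True), available in newer PyTorch releases,
# can also help pinpoint the backward op that first produces the NaN.
```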

haorannlp commented 4 years ago

@zhang-wen Thanks, bro. I will check the code later. Also, the translation task code in fairseq (translation.py line 43; indexed_dataset.py lines 102 and 106) seems to expect .idx and .bin files, whereas my data files are named train.BPE.en, train.BPE.de / newstest2013.en-de.en, newstest2013.en-de.de / newstest2014.en-de.en, newstest2014.en-de.de. Would you mind telling me how to name the train/val/test data files when running OR-Transformer, and whether the training command needs any modifications? Thanks.

zhang-wen commented 4 years ago

@haorannlp Yes, first you need to generate the data_bin directory by running preprocess.py in fairseq. Please refer to the generation scripts for wmt16_en_de_bpe32k linked at https://github.com/ictnlp/awesome-transformer. After that, you can start training with python train.py $data_dir, as described in the README.md of https://github.com/ictnlp/OR-NMT. Feel free to ask any questions, thanks.
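After preprocessing, the data_bin directory should contain fairseq's binarized files rather than the raw BPE text. A quick sanity check, assuming fairseq's standard {split}.{src}-{tgt}.{lang}.bin/.idx layout and that train/valid/test were all binarized (the path below is a placeholder for your own --destdir):

```python
import os

data_dir = "data_bin/wmt16_en_de_bpe32k"  # hypothetical path; point this at your preprocess.py --destdir
src, tgt = "en", "de"

# Dictionaries plus the binarized index/data pair for each split and language.
expected = ["dict.{}.txt".format(lang) for lang in (src, tgt)]
for split in ("train", "valid", "test"):
    for lang in (src, tgt):
        for ext in ("bin", "idx"):
            expected.append("{}.{}-{}.{}.{}".format(split, src, tgt, lang, ext))

missing = [name for name in expected
           if not os.path.exists(os.path.join(data_dir, name))]
print("missing: {}".format(missing) if missing else "all expected files are present")
```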

haorannlp commented 4 years ago

@zhang-wen Thanks, it worked! By the way, why is the dataset WMT'16 instead of WMT'14 as mentioned in the paper? When people talk about WMT'14, are they referring to WMT'14 Europarl-v7? I'm a little confused, since there are several datasets in WMT'XX. Lastly, the training command in README.md uses Transformer big instead of Transformer base as in the original paper; I guess this is a typo.

zhang-wen commented 4 years ago

@haorannlp Hi, the WMT'14 en-de training set we used was obtained with the shell script https://github.com/pytorch/fairseq/blob/master/examples/translation/prepare-wmt14en2de.sh provided by fairseq.