harvardnlp / seq2seq-attn

Sequence-to-sequence model with LSTM encoder/decoders and attention
http://nlp.seas.harvard.edu/code
MIT License

model's PPL #95

Closed zhang-jinyi closed 7 years ago

zhang-jinyi commented 7 years ago

Hi, I trained with 1 layer and a hidden/attention size of 500. But during training, |GParam| became very large, and the model's PPL etc. went to nan.

Here is the log below:

Epoch: 4, Batch: 40500/66554, Batch size: 10, LR: 0.5000, PPL: 4.31, |Param|: 879.76, |GParam|: 25.93, Training: 6803/3384/3418 total/source/target tokens/sec
Epoch: 4, Batch: 41000/66554, Batch size: 10, LR: 0.5000, PPL: 4.31, |Param|: 880.32, |GParam|: 22.29, Training: 6804/3385/3419 total/source/target tokens/sec
Epoch: 4, Batch: 41500/66554, Batch size: 10, LR: 0.5000, PPL: 4.30, |Param|: 880.91, |GParam|: 16.95, Training: 6803/3384/3418 total/source/target tokens/sec
Epoch: 4, Batch: 42000/66554, Batch size: 10, LR: 0.5000, PPL: 4.30, |Param|: 881.49, |GParam|: 22.24, Training: 6804/3385/3419 total/source/target tokens/sec
Epoch: 4, Batch: 42500/66554, Batch size: 10, LR: 0.5000, PPL: 4.30, |Param|: 882.08, |GParam|: 28.14, Training: 6803/3385/3418 total/source/target tokens/sec
Epoch: 4, Batch: 43000/66554, Batch size: 10, LR: 0.5000, PPL: 4.30, |Param|: 882.61, |GParam|: 24.55, Training: 6804/3385/3419 total/source/target tokens/sec
Epoch: 4, Batch: 43500/66554, Batch size: 10, LR: 0.5000, PPL: 4.30, |Param|: 883.21, |GParam|: 31.06, Training: 6803/3385/3418 total/source/target tokens/sec
Epoch: 4, Batch: 44000/66554, Batch size: 10, LR: 0.5000, PPL: 4.30, |Param|: 883.74, |GParam|: 18.45, Training: 6805/3385/3419 total/source/target tokens/sec
Epoch: 4, Batch: 44500/66554, Batch size: 10, LR: 0.5000, PPL: 4.34, |Param|: 884.53, |GParam|: 814886213757201792.00, Training: 6804/3385/3418 total/source/target tokens/sec
Epoch: 4, Batch: 45000/66554, Batch size: 10, LR: 0.5000, PPL: 5.05, |Param|: 886.00, |GParam|: 90889.03, Training: 6805/3386/3419 total/source/target tokens/sec
Epoch: 4, Batch: 45500/66554, Batch size: 10, LR: 0.5000, PPL: 5.98, |Param|: 887.05, |GParam|: 50798.75, Training: 6804/3385/3418 total/source/target tokens/sec
Epoch: 4, Batch: 46000/66554, Batch size: 10, LR: 0.5000, PPL: 7.15, |Param|: 888.39, |GParam|: 396.20, Training: 6805/3386/3419 total/source/target tokens/sec
Epoch: 4, Batch: 46500/66554, Batch size: 10, LR: 0.5000, PPL: 8.21, |Param|: 888.69, |GParam|: 275.85, Training: 6804/3385/3418 total/source/target tokens/sec
Epoch: 4, Batch: 47000/66554, Batch size: 10, LR: 0.5000, PPL: nan, |Param|: nan, |GParam|: nan, Training: 6805/3386/3419 total/source/target tokens/sec
Epoch: 4, Batch: 47500/66554, Batch size: 10, LR: 0.5000, PPL: nan, |Param|: nan, |GParam|: nan, Training: 6805/3386/3418 total/source/target tokens/sec
Epoch: 4, Batch: 48000/66554, Batch size: 10, LR: 0.5000, PPL: nan, |Param|: nan, |GParam|: nan, Training: 6806/3386/3419 total/source/target tokens/sec
Epoch: 4, Batch: 48500/66554, Batch size: 10, LR: 0.5000, PPL: nan, |Param|: nan, |GParam|: nan, Training: 6805/3386/3419 total/source/target tokens/sec
Epoch: 4, Batch: 49000/66554, Batch size: 10, LR: 0.5000, PPL: nan, |Param|: nan, |GParam|: nan, Training: 6806/3387/3419 total/source/target tokens/sec
Epoch: 4, Batch: 49500/66554, Batch size: 10, LR: 0.5000, PPL: nan, |Param|: nan, |GParam|: nan, Training: 6806/3386/3419 total/source/target tokens/sec

Can anyone tell me why? Thanks in advance.

zhang-jinyi commented 7 years ago

The learning rate didn't change either, which confused me.

yoonkim commented 7 years ago

I've also noticed this on a few tasks. Can you try running with max_grad_norm = 1?
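
For context, max_grad_norm controls gradient clipping by global norm: if the combined L2 norm of all gradients exceeds the threshold, every gradient is rescaled so the norm equals the threshold, which keeps one bad batch (like the |GParam| spike above) from blowing up the parameters. Below is a minimal illustrative sketch of that idea in Python with NumPy; it is not the repo's actual Torch code, and the function name clip_grad_norm is just for this example.

```python
import math
import numpy as np

def clip_grad_norm(grads, max_norm=1.0):
    """Rescale a list of gradient arrays in place so their global L2 norm
    is at most max_norm. Returns the norm measured before clipping."""
    total_norm = math.sqrt(sum(float((g ** 2).sum()) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        for g in grads:
            g *= scale
    return total_norm

# Toy usage: two gradient tensors, clipped to a global norm of 1.
grads = [np.random.randn(500, 500), np.random.randn(500)]
norm_before = clip_grad_norm(grads, max_norm=1.0)
print(norm_before)
```

A smaller max_grad_norm trades a little convergence speed for stability, so 1 is a reasonable setting when the default leads to exploding gradients like the ones logged above.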

zhang-jinyi commented 7 years ago

Thank you for your prompt reply. I'll give it a try tomorrow.

zhang-jinyi commented 7 years ago

Train 4.1078335923926 Valid 3.7427408103457 { 1 : 7.7501191315225 2 : 5.8271762714009 3 : 5.2173040900904 4 : 4.7859767762013 5 : 4.5947202873074 6 : 4.4156274915149 7 : 4.3070035668441 8 : 4.222295365506 9 : 4.1658840984745 10 : 3.9017234236271 11 : 3.7878230557402 12 : 3.7427408103457 } saving checkpoint to demo-6.5-model_epoch12.00_3.74.t7
Epoch: 13, Batch: 1000/66554, Batch size: 10, LR: 0.0625, PPL: 4.02, |Param|: 689.57, |GParam|: 47.77, Training: 7013/3483/3529 total/source/target tokens/sec
Epoch: 13, Batch: 2000/66554, Batch size: 10, LR: 0.0625, PPL: 4.04, |Param|: 689.58, |GParam|: 54.95, Training: 6971/3468/3503 total/source/target tokens/sec
Epoch: 13, Batch: 3000/66554, Batch size: 10, LR: 0.0625, PPL: 4.05, |Param|: 689.58, |GParam|: 27.69, Training: 6958/3459/3499 total/source/target tokens/sec

Thanks again for your advice. It works well for now, so I'll close this.