The learning rate didn't change, which confused me.
I've also noticed this on a few tasks. Can you try running with max_grad_norm = 1?
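For context (this is only an illustration, not this repo's actual code): clipping by `max_grad_norm` rescales the whole gradient whenever its global L2 norm exceeds the threshold, so a single huge |GParam| spike can't push the parameters into nan territory. A minimal sketch, assuming NumPy-style gradient arrays:

```python
import numpy as np

def clip_grad_norm(grads, max_norm):
    """Rescale gradients in place so their global L2 norm is at most max_norm."""
    total_norm = float(np.sqrt(sum((g ** 2).sum() for g in grads)))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        for g in grads:
            g *= scale  # keep the update direction, cap the step size
    return total_norm

# Hypothetical example: a gradient with global norm 13 gets capped at norm 1.
grads = [np.array([3.0, 4.0]), np.array([12.0])]
norm_before = clip_grad_norm(grads, max_norm=1.0)
print(norm_before)                                    # 13.0
print(np.sqrt(sum((g ** 2).sum() for g in grads)))    # 1.0
```

With the threshold at 1, even a spike like the |GParam| ≈ 8e17 one in the log below would be rescaled to a unit-norm step instead of destroying the parameters.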
Thank you for your prompt reply. I'll give it a try tomorrow.
Train 4.1078335923926
Valid 3.7427408103457
{
1 : 7.7501191315225
2 : 5.8271762714009
3 : 5.2173040900904
4 : 4.7859767762013
5 : 4.5947202873074
6 : 4.4156274915149
7 : 4.3070035668441
8 : 4.222295365506
9 : 4.1658840984745
10 : 3.9017234236271
11 : 3.7878230557402
12 : 3.7427408103457
}
saving checkpoint to demo-6.5-model_epoch12.00_3.74.t7
Epoch: 13, Batch: 1000/66554, Batch size: 10, LR: 0.0625, PPL: 4.02, |Param|: 689.57, |GParam|: 47.77, Training: 7013/3483/3529 total/source/target tokens/sec
Epoch: 13, Batch: 2000/66554, Batch size: 10, LR: 0.0625, PPL: 4.04, |Param|: 689.58, |GParam|: 54.95, Training: 6971/3468/3503 total/source/target tokens/sec
Epoch: 13, Batch: 3000/66554, Batch size: 10, LR: 0.0625, PPL: 4.05, |Param|: 689.58, |GParam|: 27.69, Training: 6958/3459/3499 total/source/target tokens/sec
Thanks again for your advice. It works well now, so I'll close this.
Hi, I trained with 1 layer and an attention size of 500. But during training, |GParam| became very large, and the model's PPL etc. went to nan.
Here is the log:
Epoch: 4, Batch: 40500/66554, Batch size: 10, LR: 0.5000, PPL: 4.31, |Param|: 879.76, |GParam|: 25.93, Training: 6803/3384/3418 total/source/target tokens/sec
Epoch: 4, Batch: 41000/66554, Batch size: 10, LR: 0.5000, PPL: 4.31, |Param|: 880.32, |GParam|: 22.29, Training: 6804/3385/3419 total/source/target tokens/sec
Epoch: 4, Batch: 41500/66554, Batch size: 10, LR: 0.5000, PPL: 4.30, |Param|: 880.91, |GParam|: 16.95, Training: 6803/3384/3418 total/source/target tokens/sec
Epoch: 4, Batch: 42000/66554, Batch size: 10, LR: 0.5000, PPL: 4.30, |Param|: 881.49, |GParam|: 22.24, Training: 6804/3385/3419 total/source/target tokens/sec
Epoch: 4, Batch: 42500/66554, Batch size: 10, LR: 0.5000, PPL: 4.30, |Param|: 882.08, |GParam|: 28.14, Training: 6803/3385/3418 total/source/target tokens/sec
Epoch: 4, Batch: 43000/66554, Batch size: 10, LR: 0.5000, PPL: 4.30, |Param|: 882.61, |GParam|: 24.55, Training: 6804/3385/3419 total/source/target tokens/sec
Epoch: 4, Batch: 43500/66554, Batch size: 10, LR: 0.5000, PPL: 4.30, |Param|: 883.21, |GParam|: 31.06, Training: 6803/3385/3418 total/source/target tokens/sec
Epoch: 4, Batch: 44000/66554, Batch size: 10, LR: 0.5000, PPL: 4.30, |Param|: 883.74, |GParam|: 18.45, Training: 6805/3385/3419 total/source/target tokens/sec
Epoch: 4, Batch: 44500/66554, Batch size: 10, LR: 0.5000, PPL: 4.34, |Param|: 884.53, |GParam|: 814886213757201792.00, Training: 6804/3385/3418 total/source/target tokens/sec
Epoch: 4, Batch: 45000/66554, Batch size: 10, LR: 0.5000, PPL: 5.05, |Param|: 886.00, |GParam|: 90889.03, Training: 6805/3386/3419 total/source/target tokens/sec
Epoch: 4, Batch: 45500/66554, Batch size: 10, LR: 0.5000, PPL: 5.98, |Param|: 887.05, |GParam|: 50798.75, Training: 6804/3385/3418 total/source/target tokens/sec
Epoch: 4, Batch: 46000/66554, Batch size: 10, LR: 0.5000, PPL: 7.15, |Param|: 888.39, |GParam|: 396.20, Training: 6805/3386/3419 total/source/target tokens/sec
Epoch: 4, Batch: 46500/66554, Batch size: 10, LR: 0.5000, PPL: 8.21, |Param|: 888.69, |GParam|: 275.85, Training: 6804/3385/3418 total/source/target tokens/sec
Epoch: 4, Batch: 47000/66554, Batch size: 10, LR: 0.5000, PPL: nan, |Param|: nan, |GParam|: nan, Training: 6805/3386/3419 total/source/target tokens/sec
Epoch: 4, Batch: 47500/66554, Batch size: 10, LR: 0.5000, PPL: nan, |Param|: nan, |GParam|: nan, Training: 6805/3386/3418 total/source/target tokens/sec
Epoch: 4, Batch: 48000/66554, Batch size: 10, LR: 0.5000, PPL: nan, |Param|: nan, |GParam|: nan, Training: 6806/3386/3419 total/source/target tokens/sec
Epoch: 4, Batch: 48500/66554, Batch size: 10, LR: 0.5000, PPL: nan, |Param|: nan, |GParam|: nan, Training: 6805/3386/3419 total/source/target tokens/sec
Epoch: 4, Batch: 49000/66554, Batch size: 10, LR: 0.5000, PPL: nan, |Param|: nan, |GParam|: nan, Training: 6806/3387/3419 total/source/target tokens/sec
Epoch: 4, Batch: 49500/66554, Batch size: 10, LR: 0.5000, PPL: nan, |Param|: nan, |GParam|: nan, Training: 6806/3386/3419 total/source/target tokens/sec
Can anyone tell me why? Thanks in advance.