salesforce / awd-lstm-lm

LSTM and QRNN Language Model Toolkit for PyTorch
BSD 3-Clause "New" or "Revised" License

Error while trying to reproduce results for PyTorch 0.3 #67

Open mricepops opened 6 years ago

mricepops commented 6 years ago

Hi, I am getting the following error while trying to reproduce the results with PyTorch 0.3.


| end of epoch 14 | time: 407.44s | valid loss 5.72 | valid ppl 305.88 | valid bpc 8.257
Saving Averaged!
| epoch 15 | 200/ 7209 batches | lr 30.00000 | ms/batch 53.71 | loss 5.57 | ppl 263.69 | bpc 8.043
| epoch 15 | 400/ 7209 batches | lr 30.00000 | ms/batch 53.14 | loss 5.53 | ppl 252.63 | bpc 7.981
| epoch 15 | 600/ 7209 batches | lr 30.00000 | ms/batch 53.08 | loss 5.52 | ppl 249.33 | bpc 7.962
| epoch 15 | 800/ 7209 batches | lr 30.00000 | ms/batch 53.54 | loss 5.54 | ppl 253.97 | bpc 7.989
| epoch 15 | 1000/ 7209 batches | lr 30.00000 | ms/batch 53.63 | loss 5.52 | ppl 249.55 | bpc 7.963
| epoch 15 | 1200/ 7209 batches | lr 30.00000 | ms/batch 53.02 | loss 5.54 | ppl 253.72 | bpc 7.987
| epoch 15 | 1400/ 7209 batches | lr 30.00000 | ms/batch 53.60 | loss 5.53 | ppl 251.89 | bpc 7.977
| epoch 15 | 1600/ 7209 batches | lr 30.00000 | ms/batch 53.29 | loss 5.51 | ppl 247.33 | bpc 7.950
| epoch 15 | 1800/ 7209 batches | lr 30.00000 | ms/batch 52.76 | loss 5.51 | ppl 247.76 | bpc 7.953
| epoch 15 | 2000/ 7209 batches | lr 30.00000 | ms/batch 52.73 | loss 5.54 | ppl 255.28 | bpc 7.996
| epoch 15 | 2200/ 7209 batches | lr 30.00000 | ms/batch 53.04 | loss 5.53 | ppl 251.35 | bpc 7.974
| epoch 15 | 2400/ 7209 batches | lr 30.00000 | ms/batch 53.48 | loss 5.54 | ppl 254.36 | bpc 7.991
| epoch 15 | 2600/ 7209 batches | lr 30.00000 | ms/batch 53.59 | loss 5.56 | ppl 258.62 | bpc 8.015
| epoch 15 | 2800/ 7209 batches | lr 30.00000 | ms/batch 53.38 | loss 5.56 | ppl 259.14 | bpc 8.018
| epoch 15 | 3000/ 7209 batches | lr 30.00000 | ms/batch 53.42 | loss 5.53 | ppl 253.05 | bpc 7.983
| epoch 15 | 3200/ 7209 batches | lr 30.00000 | ms/batch 53.43 | loss 5.52 | ppl 250.28 | bpc 7.967
| epoch 15 | 3400/ 7209 batches | lr 30.00000 | ms/batch 53.34 | loss 5.51 | ppl 246.68 | bpc 7.946
Traceback (most recent call last):
  File "main.py", line 244, in <module>
    train()
  File "main.py", line 208, in train
    loss.backward()
  File "/home/wasiahmad/software/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/wasiahmad/software/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: invalid argument 3: Index tensor must have same dimensions as input tensor at /opt/conda/conda-bld/pytorch_1518244507981/work/torch/lib/THC/generic/THCTensorScatterGather.cu:199

I cloned the repository at an older commit: https://github.com/salesforce/awd-lstm-lm/commit/9205e9bab49b3cdeb0591d1db0d28724d13dd595. The strange thing is that the error only shows up partway through epoch 15, after roughly 3400 of 7209 batches. Any help is greatly appreciated!