Closed cerisara closed 6 years ago
OK, I think I got it: it's in line 186, isn't it?
Nope, uncommenting this line does not help:
| epoch 1 | 600/ 3515 batches | lr 0.00100 | ms/batch 1721.19 | loss 1.57 | ppl 4.82 | bpc 2.270
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "main.py", line 241, in <module>
    train()
  File "main.py", line 198, in train
    output, hidden, rnn_hs, dropped_rnn_hs = model(data, hidden, return_h=True)
  File "/home/xtof/envs/pytorchnew/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xtof/git/awd-lstm-lm/model.py", line 82, in forward
    raw_output, new_h = rnn(raw_output, hidden[l])
  File "/home/xtof/envs/pytorchnew/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xtof/git/awd-lstm-lm/weight_drop.py", line 47, in forward
    return self.module.forward(*args)
  File "/home/xtof/envs/pytorchnew/lib/python3.5/site-packages/torch/nn/modules/rnn.py", line 204, in forward
    output, hidden = func(input, self.all_weights, hx)
  File "/home/xtof/envs/pytorchnew/lib/python3.5/site-packages/torch/nn/_functions/rnn.py", line 385, in forward
    return func(input, *fargs, **fkwargs)
  File "/home/xtof/envs/pytorchnew/lib/python3.5/site-packages/torch/autograd/function.py", line 328, in _do_forward
    flat_output = super(NestedIOFunction, self)._do_forward(*flat_input)
  File "/home/xtof/envs/pytorchnew/lib/python3.5/site-packages/torch/autograd/function.py", line 350, in forward
    result = self.forward_extended(*nested_tensors)
  File "/home/xtof/envs/pytorchnew/lib/python3.5/site-packages/torch/nn/_functions/rnn.py", line 294, in forward_extended
    cudnn.rnn.forward(self, input, hx, weight, output, hy)
  File "/home/xtof/envs/pytorchnew/lib/python3.5/site-packages/torch/backends/cudnn/rnn.py", line 281, in forward
    fn.reserve = torch.cuda.ByteTensor(reserve_size.value)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58
OK, I managed to make it fit within a 12GB GPU by reducing the bptt down to 100. I don't know what the impact on BPC will be; I'll check in... 50 hours ;-)
Looks like the impact of reducing bptt to 100 is not huge: I get BPC=1.17 on the dev set after 50 epochs. So it's a viable option when you hit out-of-memory errors!
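For anyone landing here later: training loops in this style sample a per-batch sequence length around the base bptt and can bound it with a hard cap, which is what keeps a rare long batch from blowing up GPU memory. A minimal sketch of that idea (function and parameter names here are illustrative, not the repository's actual code):

```python
import numpy as np

def sample_seq_len(base_bptt=100, std=5, hard_cap=None, rng=None):
    """Sample a sequence length for one training batch.

    Most batches use the base length, a few use half of it, and the
    result is jittered with Gaussian noise. `hard_cap` bounds the
    sampled length so no single batch exceeds the memory budget.
    This is a sketch of the variable-BPTT scheme, not the repo's API.
    """
    rng = rng or np.random
    bptt = base_bptt if rng.random() < 0.95 else base_bptt / 2.0
    seq_len = max(5, int(rng.normal(bptt, std)))
    if hard_cap is not None:
        seq_len = min(seq_len, hard_cap)
    return seq_len

# Example: cap batches at 110 tokens to fit a 12GB GPU (illustrative number).
lengths = [sample_seq_len(base_bptt=100, hard_cap=110) for _ in range(1000)]
```

Without the cap, a Gaussian sample a few standard deviations above the mean occasionally produces a much longer batch, which is exactly when the cuDNN workspace allocation above fails.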
Hi, training crashed with an out-of-memory error on a Titan X 12GB when running the char-LSTM on enwik8.
The trick about reducing the "cap" on sequence length links to a 404 URL: could you please let me know where I can do that?
Thanks a lot for the great code !