SeekPoint opened this issue 6 years ago
Does it occur every time? It seems there is a discussion of this problem at https://discuss.pytorch.org/t/error-at-backward-the-number-of-sizes-provided-must-be-greater-or-equal-to-the-number-of-dimensions-in-the-tensor/9203/9. The code worked fine months ago; I'll try running it again later.
Another try:

```
mldl@mldlUB1604:~/ub16_prj/AoAReader$ python3 train.py
Namespace(batch_size=32, dict='data/dict.pt', dropout=0.1, embed_size=384, epochs=13, gpu=0, gru_size=384, learning_rate=0.001, log_interval=50, save_model='model', start_epoch=1, train_from='', traindata='data/train.txt.pt', validdata='data/dev.txt.pt', weight_decay=0.0001)
Loading dictrionary from data/dict.pt
Loading train data from data/train.txt.pt
Loading valid data from data/dev.txt.pt
Epoch 1,     1/ 3775; avg loss: 5.69; acc: 15.62; 29 s elapsed
Segmentation fault (core dumped)
mldl@mldlUB1604:~/ub16_prj/AoAReader$
```
```
python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
```
```
mldl@mldlUB1604:~/ub16_prj/AoAReader$ python3 train.py 2>&1 | tee yknote---train---log
Namespace(batch_size=32, dict='data/dict.pt', dropout=0.1, embed_size=384, epochs=13, gpu=0, gru_size=384, learning_rate=0.001, log_interval=50, save_model='model', start_epoch=1, train_from='', traindata='data/train.txt.pt', validdata='data/dev.txt.pt', weight_decay=0.0001)
Loading dictrionary from data/dict.pt
Loading train data from data/train.txt.pt
Loading valid data from data/dev.txt.pt
Epoch 1,     1/ 3775; avg loss: 5.61; acc: 25.00; 16 s elapsed
Epoch 1,    51/ 3775; avg loss: 4.83; acc: 29.38; 879 s elapsed
Epoch 1,   101/ 3775; avg loss: 3.99; acc: 27.44; 1757 s elapsed
Traceback (most recent call last):
  File "train.py", line 224, in <module>
    main()
  File "train.py", line 221, in main
    trainModel(model, train_dataset, valid_dataset, optimizer)
  File "train.py", line 152, in trainModel
    train_loss, train_acc = trainEpoch(epoch)
  File "train.py", line 121, in trainEpoch
    loss.backward()
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/function.py", line 91, in apply
    return self._forward_cls.backward(self, *args)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/_functions/reduce.py", line 26, in backward
    return grad_output.expand(ctx.input_size), None, None
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/variable.py", line 722, in expand
    return Expand.apply(self, sizes)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/_functions/tensor.py", line 111, in forward
    result = i.expand(new_size)
RuntimeError: invalid argument 1: the number of sizes provided must be greater or equal to the number of dimensions in the tensor at /home/mldl/pytorch/torch/lib/THC/generic/THCTensor.c:309
mldl@mldlUB1604:~/ub16_prj/AoAReader$
```
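For context on what the traceback is complaining about: the failing call is `grad_output.expand(ctx.input_size)` inside the reduction backward, and `Tensor.expand` raises exactly this `RuntimeError` when it is given fewer sizes than the tensor has dimensions. Below is a minimal plain-Python sketch of the shape rule (the function name `can_expand` is made up for illustration; this is not PyTorch's or the repo's code):

```python
def can_expand(tensor_shape, target_sizes):
    """Sketch of PyTorch's expand rule: True if a tensor of shape
    `tensor_shape` can be expanded to `target_sizes`."""
    if len(target_sizes) < len(tensor_shape):
        # This is the failure in the traceback: "the number of sizes
        # provided must be greater or equal to the number of dimensions"
        return False
    # Extra leading dimensions are allowed; each trailing size must match
    # the tensor's size, be -1 (keep as-is), or expand a size-1 dimension.
    for dim, size in zip(reversed(tensor_shape), reversed(target_sizes)):
        if size != dim and dim != 1 and size != -1:
            return False
    return True

print(can_expand((2, 3), (3,)))       # fewer sizes than dims -> False
print(can_expand((2, 3), (4, 2, 3)))  # extra leading dim is fine -> True
print(can_expand((1, 3), (5, 3)))     # size-1 dim broadcasts -> True
```

So the gradient tensor reaching that `expand` apparently has more dimensions than `ctx.input_size` expects, which points at a shape mismatch introduced somewhere in the loss computation rather than in `expand` itself.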