Closed mhart closed 7 years ago
This is an issue I ran into yesterday. It lies in the underlying QRNN library, specifically in what happens when a batch has a sequence length of 1. I have fixed it in https://github.com/salesforce/pytorch-qrnn/commit/2ffbd32b2e50a73c8b581b00481ee6334b928b5c, and if you pip install that dependency again the error should be resolved.
Sorry for the issue!
Oh cool, thanks!
I just tried reinstalling and running generate.py again, and now I get:
Traceback (most recent call last):
File "generate.py", line 65, in <module>
output, hidden = model(input, hidden)
File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/awd-lstm-lm/model.py", line 82, in forward
raw_output, new_h = rnn(raw_output, hidden[l])
File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/miniconda3/lib/python3.6/site-packages/torchqrnn/qrnn.py", line 67, in forward
source = torch.cat([X, Xm1], 2)
File "/miniconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 897, in cat
return Concat.apply(dim, *iterable)
File "/miniconda3/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 317, in forward
return torch.cat(inputs, dim)
RuntimeError: inconsistent tensor sizes at /opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCTensorMath.cu:141
I'm not entirely sure about the above issue, especially as I don't know which tensors are inconsistent. Pulling the PyTorch QRNN and AWD-LSTM-LM code and running it appears to work for me, but that's little reassurance in your situation!
There may also be an issue in that if you are loading from a saved model then it may be using the old source code still. Did you see a PyTorch warning at the top when restarting the model noting the source code has changed?
Also, could you share the exact command line you ran, provided the dataset isn't proprietary and so on?
Well I've just tried to generate the model again from scratch – it actually outputs the same error at the python pointer.py stage:
$ python pointer.py --data data/mydata --save MYMODEL.pt --lambdasm 0.1 --theta 1.0 --window 500 --bptt 5000
RNNModel (
...
)
Traceback (most recent call last):
File "pointer.py", line 124, in <module>
val_loss = evaluate(val_data, test_batch_size)
File "pointer.py", line 71, in evaluate
output, hidden, rnn_outs, _ = model(data, hidden, return_h=True)
File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/awd-lstm-lm/model.py", line 82, in forward
raw_output, new_h = rnn(raw_output, hidden[l])
File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/miniconda3/lib/python3.6/site-packages/torchqrnn/qrnn.py", line 65, in forward
Xm1 = torch.cat(Xm1, 0)
File "/miniconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 897, in cat
return Concat.apply(dim, *iterable)
File "/miniconda3/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 317, in forward
return torch.cat(inputs, dim)
RuntimeError: inconsistent tensor sizes at /opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCTensorMath.cu:141
Hmmm, also same thing if I just use main.py, then exit early, then use generate.py.
My data is slightly different from Penn (and proprietary, unfortunately) – vocab is ~15k – but very similarly formatted.
$ python -u main.py --data ./data/mydata --model QRNN --batch_size 20 --clip 0.2 --wdrop 0.1 --nhid 1550 --nlayers 4 --emsize 400 --dropouth 0.3 --seed 9001 --dropouti 0.4 --epochs 550 --save MYMODEL.pt
Applying weight drop of 0.1 to weight
Applying weight drop of 0.1 to weight
Applying weight drop of 0.1 to weight
Applying weight drop of 0.1 to weight
[QRNNLayer (
(linear): WeightDrop (
(module): Linear (800 -> 4650)
)
), QRNNLayer (
(linear): WeightDrop (
(module): Linear (1550 -> 4650)
)
), QRNNLayer (
(linear): WeightDrop (
(module): Linear (1550 -> 4650)
)
), QRNNLayer (
(linear): WeightDrop (
(module): Linear (1550 -> 1200)
)
)]
Args: Namespace(alpha=2, batch_size=20, beta=1, bptt=70, clip=0.2, cuda=True, data='./data/mydata', dropout=0.4, dropoute=0.1, dropouth=0.3, dropouti=0.4, emsize=400, epochs=550, log_interval=200, lr=30, model='QRNN', nhid=1550, nlayers=4, nonmono=5, save='MYMODEL.pt', seed=9001, tied=True, wdecay=1.2e-06, wdrop=0.1)
Model total parameters: 26415724
| epoch 1 | 200/ 723 batches | lr 30.00 | ms/batch 47.29 | loss 7.48 | ppl 1780.16
| epoch 1 | 400/ 723 batches | lr 30.00 | ms/batch 44.35 | loss 6.80 | ppl 899.39
| epoch 1 | 600/ 723 batches | lr 30.00 | ms/batch 44.22 | loss 6.74 | ppl 848.33
-----------------------------------------------------------------------------------------
| end of epoch 1 | time: 34.52s | valid loss 6.48 | valid ppl 654.69
-----------------------------------------------------------------------------------------
^C-----------------------------------------------------------------------------------------
Exiting from training early
=========================================================================================
| End of training | test loss 6.43 | test ppl 618.73
=========================================================================================
$ python generate.py --data ./data/mydata --checkpoint MYMODEL.pt --cuda
Traceback (most recent call last):
File "generate.py", line 65, in <module>
output, hidden = model(input, hidden)
File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/awd-lstm-lm/model.py", line 82, in forward
raw_output, new_h = rnn(raw_output, hidden[l])
File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/miniconda3/lib/python3.6/site-packages/torchqrnn/qrnn.py", line 67, in forward
source = torch.cat([X, Xm1], 2)
File "/miniconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 897, in cat
return Concat.apply(dim, *iterable)
File "/miniconda3/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 317, in forward
return torch.cat(inputs, dim)
RuntimeError: inconsistent tensor sizes at /opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCTensorMath.cu:141
Do I need to reinstall something perhaps? Something more than just pytorch-qrnn?
Before reading, check bottom :)
Not really sure what's happening there. You could add a line that prints out the size() of the elements in Xm1.
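A minimal sketch of that kind of size check, assuming the elements expose size() the way PyTorch tensors do. The helper name describe_sizes and the FakeTensor stand-in are hypothetical, and the shapes are only illustrative; in practice the same call would go just before the failing torch.cat in qrnn.py.

```python
def describe_sizes(tensors):
    """Return the size of every element so a mismatch is easy to spot."""
    sizes = []
    for t in tensors:
        # PyTorch tensors expose size(); fall back to a .shape attribute otherwise.
        size = t.size() if callable(getattr(t, "size", None)) else t.shape
        sizes.append(tuple(size))
    return sizes


class FakeTensor:
    """Hypothetical stand-in so the sketch runs without PyTorch installed."""

    def __init__(self, *dims):
        self._dims = dims

    def size(self):
        return self._dims


# Illustrative shapes only: two elements whose batch dimensions disagree.
print(describe_sizes([FakeTensor(1, 10, 400), FakeTensor(1, 1, 400)]))
```

Printing the sizes right before the concatenation shows which dimension disagrees, which the RuntimeError message itself does not say.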
I just ran
python -u main.py --model QRNN --batch_size 20 --clip 0.2 --wdrop 0.1 --nhid 600 --nlayers 2 --emsize 400 --dropouth 0.3 --seed 9001 --dropouti 0.4 --epochs 550 --save PTB.pt
for one epoch followed by
python pointer.py --model QRNN --lambdasm 0.1 --theta 1.0 --window 500 --bptt 5000 --save PTB.pt
and it works (though with an obviously terrible perplexity).
Oh - did you run pip install -U git+https://github.com/salesforce/pytorch-qrnn or just pip install? I should have noted that the -U is needed as I've not incremented the version yet. I should do so.
Edit: I think we must have posted at the same time. Reading your message now, but I'd check that QRNN has been properly updated (potentially check /miniconda3/lib/python3.6/site-packages/torchqrnn/qrnn.py to see if Xm1 = [] is there and is then appended to, i.e. my recent change).
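One way to script that check is sketched below. It assumes the package is importable as torchqrnn and that the patched file contains the literal text Xm1 = [] with exactly that spacing. Both function names are hypothetical, and a plain text match is only a rough signal, not proof that the fix is installed.

```python
import importlib.util
from pathlib import Path


def source_contains(path, needle):
    """Return True if the file at `path` contains the text `needle`."""
    return needle in Path(path).read_text(encoding="utf-8")


def installed_qrnn_is_patched(needle="Xm1 = []"):
    """Locate the installed torchqrnn/qrnn.py and look for the patched line.

    Returns None when torchqrnn cannot be found in this environment.
    """
    try:
        spec = importlib.util.find_spec("torchqrnn")
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    qrnn_path = Path(list(spec.submodule_search_locations)[0]) / "qrnn.py"
    if not qrnn_path.exists():
        return None
    return source_contains(qrnn_path, needle)


if __name__ == "__main__":
    print(installed_qrnn_is_patched())
```

Running it with the same Python interpreter that runs generate.py matters, since a different environment may have a different copy of the package.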
Actually – just realized I ran pointer.py without --model QRNN 🙄
So, pointer.py does work – but generate.py still doesn't.
Is there a need to specify that it's a QRNN when invoking generate.py? And if so, how?
(I ran pip install -U before – it definitely updated BTW, because the error being generated is different from before)
Ah, I just realized the issue is in the generation script - I may have been misreading. The generation code hasn't been heavily tested against QRNN - I'll run it here and see. It may be something trivial that the generate.py code isn't doing, or it may be highlighting a bug due to small batch size or similar. Will report back.
Got it - the issue is that it doesn't know it's a QRNN model so doesn't reset the previous stored state. As it doesn't reset the previous X that it had stored for QRNN(window=2), the size of the last batch is used, resulting in a batch size 10 trying to concat with a batch size 1.
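The mismatch described above can be sketched with a toy layer. This is not the torchqrnn code: ToyWindowLayer, reset and forward are hypothetical names, and plain Python lists stand in for tensors, with one list entry per batch element.

```python
class ToyWindowLayer:
    """Toy stand-in for a window-2 layer that remembers the previous input.

    Inputs are plain lists with one entry per batch element; this is only
    an illustration of the stale cached state, not the real implementation.
    """

    def __init__(self):
        self.prev_x = None  # last input seen, kept between calls

    def reset(self):
        """Forget the cached input, e.g. before switching batch size."""
        self.prev_x = None

    def forward(self, x):
        # With no cached input, pair each element with a zero placeholder.
        prev = self.prev_x if self.prev_x is not None else [0] * len(x)
        if len(prev) != len(x):
            raise RuntimeError(
                f"inconsistent sizes: cached batch {len(prev)} vs current batch {len(x)}"
            )
        self.prev_x = x
        # "Concatenate" the current and previous input per batch element.
        return [(cur, old) for cur, old in zip(x, prev)]


layer = ToyWindowLayer()
layer.forward(list(range(10)))  # an earlier batch of 10 leaves its input cached

try:
    layer.forward([42])  # generation then feeds a batch of 1
except RuntimeError as err:
    print("without reset:", err)

layer.reset()
print("after reset:", layer.forward([42]))
```

The point of the sketch is that any layer caching its previous input needs that cache cleared whenever the batch size changes, which is what the generation script was not doing for QRNN.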
I've updated the code (fix in https://github.com/salesforce/awd-lstm-lm/commit/9c623587a9c565e43aea8064ac573ad06907b7ea) so you can now run with --model QRNN and it'll have the correct result.
Sorry for the bug but glad to see you're using QRNN! If you can ever tell me what you're up to with it I'd love to hear ^_^
(if there's an issue, feel free to re-open the issue - I accidentally hit Comment and Close :))
Awesome, works a treat – thanks mate! Will keep you updated on how I use this – love how quick it trains 👍
I've trained a QRNN, but when I try to use generate.py with it, I get the following: