salesforce / awd-lstm-lm

LSTM and QRNN Language Model Toolkit for PyTorch
BSD 3-Clause "New" or "Revised" License

`ValueError: result of slicing is an empty tensor` when trying to run generate.py on QRNN #14

Closed · mhart closed this issue 7 years ago

mhart commented 7 years ago

I've trained a QRNN, but when I try to use generate.py with it, I get the following:

  File "generate.py", line 68, in <module>
    output, hidden = model(input, hidden)
  File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/awd-lstm-lm/model.py", line 82, in forward
    raw_output, new_h = rnn(raw_output, hidden[l])
  File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/miniconda3/lib/python3.6/site-packages/torchqrnn/qrnn.py", line 60, in forward
    Xm1 = [self.prevX if self.prevX is not None else X[:1, :, :] * 0, X[:-1, :, :]]
  File "/miniconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 76, in __getitem__
    return Index.apply(self, key)
  File "/miniconda3/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 16, in forward
    result = i.index(ctx.index)
ValueError: result of slicing is an empty tensor
Smerity commented 7 years ago

This is an issue I ran into yesterday, and it concerns the underlying QRNN library: specifically, what happens when the input has a sequence length of 1. I have fixed the issue in https://github.com/salesforce/pytorch-qrnn/commit/2ffbd32b2e50a73c8b581b00481ee6334b928b5c and if you pip install that dependency again the error should be resolved.
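
For reference, the failing line builds Xm1 from X[:-1, :, :]; here's a minimal sketch of why that slice is empty when the sequence length is 1 (the tensor sizes here are hypothetical):

import torch

# With a sequence length of 1, X[:-1] selects zero timesteps along dim 0.
# Recent PyTorch just returns an empty tensor here; the 0.2-era Variable
# indexing in the traceback above raised the ValueError instead.
X = torch.zeros(1, 10, 400)   # (seq_len=1, batch=10, hidden=400), hypothetical sizes
print(X[:-1, :, :].size())    # torch.Size([0, 10, 400]), i.e. empty along dim 0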

Sorry for the issue!

mhart commented 7 years ago

Oh cool, thanks!

I just tried reinstalling and running generate.py again, and now I get:

Traceback (most recent call last):
  File "generate.py", line 65, in <module>
    output, hidden = model(input, hidden)
  File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/awd-lstm-lm/model.py", line 82, in forward
    raw_output, new_h = rnn(raw_output, hidden[l])
  File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/miniconda3/lib/python3.6/site-packages/torchqrnn/qrnn.py", line 67, in forward
    source = torch.cat([X, Xm1], 2)
  File "/miniconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 897, in cat
    return Concat.apply(dim, *iterable)
  File "/miniconda3/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 317, in forward
    return torch.cat(inputs, dim)
RuntimeError: inconsistent tensor sizes at /opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCTensorMath.cu:141
Smerity commented 7 years ago

I'm not entirely sure about the above issue, especially as I don't know which tensors are inconsistent. Pulling the PyTorch QRNN and AWD-LSTM-LM code and running it appears to work for me, but that's of little reassurance in your situation!

There may also be an issue if you are loading from a saved model: it may still be using the old source code. Did you see a PyTorch warning at the top when reloading the model, noting that the source code had changed?
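
For context, a minimal sketch of where that warning comes from (the checkpoint path is hypothetical):

import torch

# Saving a model whole with torch.save(model, path) pickles the module
# itself, so torch.load re-binds it to the current class definitions and
# emits a torch.serialization.SourceChangeWarning if a module's source
# has changed since it was saved.
model = torch.load('model.pt')  # hypothetical path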

Also, could you give the exact command-line arguments, if it's not a proprietary dataset and so on?

mhart commented 7 years ago

Well, I've just tried to regenerate the model from scratch; it actually produces the same error at the python pointer.py stage:

$ python pointer.py --data data/mydata --save MYMODEL.pt --lambdasm 0.1 --theta 1.0 --window 500 --bptt 5000
RNNModel (
...
)
Traceback (most recent call last):
  File "pointer.py", line 124, in <module>
    val_loss = evaluate(val_data, test_batch_size)
  File "pointer.py", line 71, in evaluate
    output, hidden, rnn_outs, _ = model(data, hidden, return_h=True)
  File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/awd-lstm-lm/model.py", line 82, in forward
    raw_output, new_h = rnn(raw_output, hidden[l])
  File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/miniconda3/lib/python3.6/site-packages/torchqrnn/qrnn.py", line 65, in forward
    Xm1 = torch.cat(Xm1, 0)
  File "/miniconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 897, in cat
    return Concat.apply(dim, *iterable)
  File "/miniconda3/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 317, in forward
    return torch.cat(inputs, dim)
RuntimeError: inconsistent tensor sizes at /opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCTensorMath.cu:141
mhart commented 7 years ago

Hmmm, it's also the same thing if I just use main.py, exit early, then use generate.py.

My data is slightly different from Penn (and proprietary, unfortunately); the vocab is around 15k, but it's very similarly formatted.

$ python -u main.py --data ./data/mydata --model QRNN --batch_size 20 --clip 0.2 --wdrop 0.1 --nhid 1550 --nlayers 4 --emsize 400 --dropouth 0.3 --seed 9001 --dropouti 0.4 --epochs 550 --save MYMODEL.pt
Applying weight drop of 0.1 to weight
Applying weight drop of 0.1 to weight
Applying weight drop of 0.1 to weight
Applying weight drop of 0.1 to weight
[QRNNLayer (
  (linear): WeightDrop (
    (module): Linear (800 -> 4650)
  )
), QRNNLayer (
  (linear): WeightDrop (
    (module): Linear (1550 -> 4650)
  )
), QRNNLayer (
  (linear): WeightDrop (
    (module): Linear (1550 -> 4650)
  )
), QRNNLayer (
  (linear): WeightDrop (
    (module): Linear (1550 -> 1200)
  )
)]
Args: Namespace(alpha=2, batch_size=20, beta=1, bptt=70, clip=0.2, cuda=True, data='./data/mydata', dropout=0.4, dropoute=0.1, dropouth=0.3, dropouti=0.4, emsize=400, epochs=550, log_interval=200, lr=30, model='QRNN', nhid=1550, nlayers=4, nonmono=5, save='MYMODEL.pt', seed=9001, tied=True, wdecay=1.2e-06, wdrop=0.1)
Model total parameters: 26415724
| epoch   1 |   200/  723 batches | lr 30.00 | ms/batch 47.29 | loss  7.48 | ppl  1780.16
| epoch   1 |   400/  723 batches | lr 30.00 | ms/batch 44.35 | loss  6.80 | ppl   899.39
| epoch   1 |   600/  723 batches | lr 30.00 | ms/batch 44.22 | loss  6.74 | ppl   848.33
-----------------------------------------------------------------------------------------
| end of epoch   1 | time: 34.52s | valid loss  6.48 | valid ppl   654.69
-----------------------------------------------------------------------------------------
^C-----------------------------------------------------------------------------------------
Exiting from training early
=========================================================================================
| End of training | test loss  6.43 | test ppl   618.73
=========================================================================================

$ python generate.py --data ./data/mydata --checkpoint MYMODEL.pt --cuda
Traceback (most recent call last):
  File "generate.py", line 65, in <module>
    output, hidden = model(input, hidden)
  File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/awd-lstm-lm/model.py", line 82, in forward
    raw_output, new_h = rnn(raw_output, hidden[l])
  File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/miniconda3/lib/python3.6/site-packages/torchqrnn/qrnn.py", line 67, in forward
    source = torch.cat([X, Xm1], 2)
  File "/miniconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 897, in cat
    return Concat.apply(dim, *iterable)
  File "/miniconda3/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 317, in forward
    return torch.cat(inputs, dim)
RuntimeError: inconsistent tensor sizes at /opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCTensorMath.cu:141

Do I need to reinstall something perhaps? Something more than just pytorch-qrnn?

Smerity commented 7 years ago

Before reading, check the edit at the bottom :)

Not really sure what's happening there. You could add a line that prints out the size() of the elements in Xm1.
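
Something along these lines, inserted in torchqrnn/qrnn.py just before the failing torch.cat(Xm1, 0), would show which sizes disagree (a debugging sketch, not part of the library):

# Hypothetical debug lines for torchqrnn/qrnn.py, placed just before the
# torch.cat(Xm1, 0) call from the traceback above:
print('X size:', X.size())
for i, x in enumerate(Xm1):
    print('Xm1[%d] size:' % i, x.size())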

I just ran python -u main.py --model QRNN --batch_size 20 --clip 0.2 --wdrop 0.1 --nhid 600 --nlayers 2 --emsize 400 --dropouth 0.3 --seed 9001 --dropouti 0.4 --epochs 550 --save PTB.pt for one epoch followed by python pointer.py --model QRNN --lambdasm 0.1 --theta 1.0 --window 500 --bptt 5000 --save PTB.pt and it works (though with an obviously terrible perplexity).

Oh - did you run pip install -U git+https://github.com/salesforce/pytorch-qrnn or just pip install? I should have noted that the -U is needed as I've not incremented the version yet. I should do so.

Edit: I think we must have posted at the same time. Reading your message now, but I'd check that QRNN has been properly updated (potentially check /miniconda3/lib/python3.6/site-packages/torchqrnn/qrnn.py to see whether Xm1 = [] is initialized and then appended to, i.e. my recent change).

mhart commented 7 years ago

Actually – just realized I ran pointer.py without --model QRNN 🙄

So, pointer.py does work – but generate.py still doesn't.

Is there a need to specify that it's a QRNN when invoking generate.py? And if so, how?

mhart commented 7 years ago

(I ran pip install -U before – it definitely updated BTW, because the error being generated is different from before)

Smerity commented 7 years ago

Ah, I just realized the issue is in the generation script - I may have been misreading. The generation code hasn't been heavily tested against QRNN, so I'll run it here and see. It may be something trivial that the generate.py code isn't doing, or it may be highlighting a bug due to a small batch size or similar. Will report back.

Smerity commented 7 years ago

Got it - the issue is that generate.py doesn't know it's a QRNN model, so it doesn't reset the previously stored state. As the previous X stored for QRNN(window=2) is never reset, the batch size of the last training batch is carried over, resulting in a batch of size 10 trying to concat with a batch of size 1.

I've updated the code (fix in https://github.com/salesforce/awd-lstm-lm/commit/9c623587a9c565e43aea8064ac573ad06907b7ea), so you can now run with --model QRNN and it'll produce the correct result.
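
In generate.py terms, the fix boils down to something like the following (a sketch based on the commit description, not its exact diff):

# Sketch: tell the generation script the checkpoint is a QRNN so it can
# clear the previous input cached by each QRNN layer during training,
# whose batch size no longer matches generation's batch size of 1.
if args.model == 'QRNN':
    model.reset()  # assumed helper that resets each QRNN layer's stored state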

Sorry for the bug but glad to see you're using QRNN! If you can ever tell me what you're up to with it I'd love to hear ^_^

(If there's still a problem, feel free to re-open the issue - I accidentally hit Comment and Close :))

mhart commented 7 years ago

Awesome, works a treat – thanks mate! Will keep you updated on how I use this – love how quickly it trains 👍