salesforce / awd-lstm-lm

LSTM and QRNN Language Model Toolkit for PyTorch
BSD 3-Clause "New" or "Revised" License

Model crashes under pytorch 0.4 #39

Closed. zou3519 closed this issue 6 years ago

zou3519 commented 6 years ago

Hi, the folks over at PyTorch are working on cutting a new 0.4 release. We'd like to make the transition as smooth as possible (if you were planning on upgrading), so we've been testing a number of community repos.

I ran a model and it errored out due to a change in PyTorch. Minimal repro:

# Install pytorch-nightly (currently our pre-release branch)
conda install pytorch-nightly -c pytorch

# Get data
./getdata.sh

# Run model
python main.py --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 1 && \
python -u main.py --model QRNN --batch_size 20 --clip 0.2 --wdrop 0.1 --nhid 1550 --nlayers 4 --emsize 400 --dropouth 0.3 --seed 9001 --dropouti 0.4 --epochs 1

Stack trace: https://gist.github.com/zou3519/142d48df1c03db9fe9c11717ad9a59f2

Pytorch 0.4 adds zero-dimensional tensors that cannot be iterated over, which seems to be what the error is complaining about. Changing https://github.com/salesforce/awd-lstm-lm/blob/f2e88672bdaf93ee709390fda2a24abb6db77989/utils.py#L8 in particular to handle this case should fix it.
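
As a standalone illustration of the change (an example made up for clarity, not code from the repo), a scalar now comes back as a 0-dimensional tensor, and iterating over one raises a TypeError:

# Illustration only: 0-dim tensors are new in PyTorch 0.4 and cannot be iterated over.
import torch

t = torch.tensor(3.0)          # 0-dimensional tensor, size torch.Size([])
print(t.dim())                 # 0
try:
    for _ in t:                # iteration over a 0-d tensor is not allowed
        pass
except TypeError as err:
    print(err)                 # "iteration over a 0-d tensor"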

cc @soumith

guillaume-chevalier commented 6 years ago

@zou3519 @Smerity I get this error when I attempt to correct the bug:

ubuntu@ip-172-31-72-29:~/awd-lstm-lm$ python3 -u main.py --epochs 500 --data data/wikitext-2 --clip 0.25 --dropouti 0.4 --dropouth 0.2 --nhid 1550 --nlayers 4 --seed 4002 --model QRNN --wdrop 0.1 --batch_size 40 --save WT2.pt
Loading cached dataset...
Applying weight drop of 0.1 to weight
Applying weight drop of 0.1 to weight
Applying weight drop of 0.1 to weight
Applying weight drop of 0.1 to weight
[QRNNLayer(
  (linear): WeightDrop(
    (module): Linear(in_features=800, out_features=4650, bias=True)
  )
), QRNNLayer(
  (linear): WeightDrop(
    (module): Linear(in_features=1550, out_features=4650, bias=True)
  )
), QRNNLayer(
  (linear): WeightDrop(
    (module): Linear(in_features=1550, out_features=4650, bias=True)
  )
), QRNNLayer(
  (linear): WeightDrop(
    (module): Linear(in_features=1550, out_features=1200, bias=True)
  )
)]
Using []
Args: Namespace(alpha=2, batch_size=40, beta=1, bptt=70, clip=0.25, cuda=True, data='data/wikitext-2', dropout=0.4, dropoute=0.1, dropouth=0.2, dropouti=0.4, emsize=400, epochs=500, log_interval=200, lr=30, model='QRNN', nhid=1550, nlayers=4, nonmono=5, optimizer='sgd', resume='', save='WT2.pt', seed=4002, tied=True, wdecay=1.2e-06, wdrop=0.1, when=[-1])
Model total parameters: 33354628
Traceback (most recent call last):
  File "main.py", line 241, in <module>
    train()
  File "main.py", line 197, in train
    output, hidden, rnn_hs, dropped_rnn_hs = model(data, hidden, return_h=True)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/awd-lstm-lm/model.py", line 70, in forward
    emb = embedded_dropout(self.encoder, input, dropout=self.dropoute if self.training else 0)
  File "/home/ubuntu/awd-lstm-lm/embed_regularize.py", line 19, in embedded_dropout
    X = embed._backend.Embedding.apply(words, masked_embed_weight,
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/backends/backend.py", line 10, in __getattr__
    raise NotImplementedError
NotImplementedError

Here is how I attempted to fix the broken method:

from torch import Tensor
from torch.autograd import Variable

def repackage_hidden(h):
    """Wraps hidden states in new Variables, to detach them from their history."""
    if type(h) == Variable or (type(h) == Tensor and len(h.size()) == 0):
        return Variable(h.data)
    else:
        return tuple(repackage_hidden(v) for v in h)

Note that I tried every dimension size: len(h.size()) == 0, len(h.size()) == 1, and len(h.size()) == 2. Any idea what's going on?

P.S. You can reuse/modify/license my pasted code above without any restrictions whatsoever.

shawntan commented 6 years ago

Some fixes are in #43.

This is how I fixed the issue with repackage_hidden:

import torch

def repackage_hidden(h):
    """Wraps hidden states in new Tensors,
    to detach them from their history."""
    if isinstance(h, torch.Tensor):
        return h.detach()
    else:
        return tuple(repackage_hidden(v) for v in h)
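
As a quick sanity check (a hypothetical snippet assuming the function above is defined; the state shapes are made up):

# Detach an (h, c) LSTM state tuple between BPTT windows.
h = (torch.zeros(4, 20, 1550, requires_grad=True),
     torch.zeros(4, 20, 1550, requires_grad=True))
h = repackage_hidden(h)                      # tuple of detached tensors
print(all(not t.requires_grad for t in h))   # True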

For the issue in embed_regularize.py, replace:

    X = embed._backend.Embedding.apply(words, masked_embed_weight,
                                       padding_idx, embed.max_norm, embed.norm_type,
                                       embed.scale_grad_by_freq, embed.sparse
                                       )

with (importing torch.nn.functional as F if it isn't already):

    X = F.embedding(
        words, masked_embed_weight,
        padding_idx,
        embed.max_norm, embed.norm_type,
        embed.scale_grad_by_freq, embed.sparse
    )
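
For context, a rough self-contained sketch of embedded_dropout with that change applied (my own paraphrase, not the repo file verbatim; names and defaults are approximate):

    import torch.nn as nn
    import torch.nn.functional as F

    def embedded_dropout_sketch(embed: nn.Embedding, words, dropout=0.1):
        # Drop whole rows (word types) of the embedding matrix, then look the
        # words up against the masked weight via the functional API instead of
        # going through embed._backend.
        if dropout and embed.training:
            mask = embed.weight.new_empty((embed.weight.size(0), 1)) \
                       .bernoulli_(1 - dropout) / (1 - dropout)
            masked_weight = mask * embed.weight
        else:
            masked_weight = embed.weight
        return F.embedding(words, masked_weight, embed.padding_idx,
                           embed.max_norm, embed.norm_type,
                           embed.scale_grad_by_freq, embed.sparse)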

puttkraidej commented 6 years ago

It works, thanks @shawntan

keskarnitish commented 6 years ago

Thanks for this! Let me look at this carefully and merge it once I run some tests.

keskarnitish commented 6 years ago

Thanks everyone for bringing this to our attention, and to @shawntan for proposing fixes. This should be fixed in https://github.com/salesforce/awd-lstm-lm/commit/441e122c221390a260358837df8dd6cc5fc22e82 . Closing this issue now, please feel free to reopen as necessary.