spro / practical-pytorch

Go to https://github.com/pytorch/tutorials - this repo is deprecated and no longer maintained

Batch support in seq2seq tutorial #27

Open · spro opened 7 years ago

ehsanasgari commented 7 years ago

Hi, thank you for the great work! Would you please add batching to the tutorial as well?

vijendra-rana commented 7 years ago

Hello @spro, I have been working on extending the tutorial with batching. My code is here: https://github.com/vijendra-rana/Random/blob/master/translation_with_batch.py (I created some fake data for it). The problem is that I am getting an error from the loss saying:

RuntimeError: Trying to backward through the graph second time, but the buffers have already been freed. Please specify retain_variables=True when calling backward for the first time.
I understand that the loss cannot be backpropagated through the same graph twice, but I don't see anywhere that I am doing it twice. I also have a question about masking: how would you mask the loss on the encoder side? I am not sure how to implement it, given that the encoder output has size (seq_len, batch, hidden_size) while the mask is (batch_size, seq_len).
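
For reference, here is a minimal standalone sketch (unrelated to my script, written only to show the pattern) of what normally triggers this error and the two usual ways around it. Recent PyTorch calls the flag retain_graph=True; older versions called it retain_variables.

import torch
import torch.nn as nn

rnn = nn.GRU(input_size=8, hidden_size=8)
x = torch.randn(5, 3, 8)               # (seq_len, batch, input_size)
out, hidden = rnn(x)

out.sum().backward()                   # the first backward frees the graph's intermediate buffers
# hidden.sum().backward()              # a second backward through the same graph -> this RuntimeError

out, hidden = rnn(x)                   # fix 1: rebuild the graph with a fresh forward pass, or
out.sum().backward(retain_graph=True)  # fix 2: keep the buffers around on the first backward
hidden.sum().backward()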

Thanks in advance for any help :)

spro commented 7 years ago

I put a first version of the batched model at https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/seq2seq-translation-batched.ipynb (commit 31fdb61387e62948f6a24dc9a2dadd6d3221a73c).

The biggest changes are using pack_padded_sequence before the encoder RNN and pad_packed_sequence after it, plus the masked cross entropy loss from @jihunchoi after decoding. For the decoder itself the changes are minor, because it only runs one time step at a time.
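
In rough outline, the encoder side looks like this (a sketch with made-up sizes, not the notebook verbatim): the padded batch is packed before the GRU so padding doesn't pollute the hidden states, then unpacked right afterwards.

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

embedding = nn.Embedding(num_embeddings=100, embedding_dim=16, padding_idx=0)
encoder_gru = nn.GRU(input_size=16, hidden_size=32)

input_seqs = torch.randint(1, 100, (7, 4))   # (max_len, batch) fake token ids, sorted longest-first
input_lengths = [7, 6, 4, 2]                 # true lengths per batch element, descending

embedded = embedding(input_seqs)             # (max_len, batch, embedding_dim)
packed = pack_padded_sequence(embedded, input_lengths)
packed_outputs, encoder_hidden = encoder_gru(packed)
encoder_outputs, _ = pad_packed_sequence(packed_outputs)   # back to (max_len, batch, hidden)

The masked cross entropy loss then builds a 0/1 mask from target_lengths so that padded target positions don't contribute to the loss.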

vijendra-rana commented 7 years ago

Thanks, @spro for your effort in putting these together. Your tutorials are really nice.

howardyclo commented 6 years ago

Hi guys, I implemented more features based on this tutorial (e.g. batched computation for attention) and added some notes. Check out my repo here: https://github.com/howardyclo/pytorch-seq2seq-example/blob/master/seq2seq.ipynb
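
As a rough illustration of the batched attention part (a sketch with made-up sizes, not my notebook verbatim), the per-batch, per-timestep Python loop over attention energies can be replaced by a single bmm:

import torch
import torch.nn.functional as F

batch_size, max_len, hidden_size = 4, 7, 32
decoder_hidden = torch.randn(batch_size, 1, hidden_size)        # current decoder state, (B, 1, H)
encoder_outputs = torch.randn(batch_size, max_len, hidden_size) # (B, S, H), batch-first here

scores = decoder_hidden.bmm(encoder_outputs.transpose(1, 2))    # (B, 1, S) dot-product energies
attn_weights = F.softmax(scores, dim=-1)                        # normalize over source positions
context = attn_weights.bmm(encoder_outputs)                     # (B, 1, H) weighted sum of encoder states

Padded source positions would normally be masked out before the softmax, e.g. with scores.masked_fill(pad_mask, float('-inf')).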

physicsman commented 6 years ago

I noticed some implementations of batched seq2seq with attention allow for an embedding size that is different from the hidden size. Is there a reason to match the two sizes?
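
For concreteness, this is the decoupled setup I mean (a minimal sketch with made-up sizes, not taken from any of the linked notebooks); the RNN's input_size just has to match the embedding dimension, while hidden_size can be chosen independently:

import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=100, embedding_dim=64)
gru = nn.GRU(input_size=64, hidden_size=256)     # input_size matches the embedding, hidden_size is separate

tokens = torch.randint(0, 100, (10, 4))          # (seq_len, batch)
outputs, hidden = gru(embedding(tokens))         # outputs: (10, 4, 256), hidden: (1, 4, 256)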

suwangcompling commented 6 years ago

@spro Thanks for the nice code sample. I had some trouble and am looking for some help: I tried to run it out of the box and hit an error in this block:

max_target_length = max(target_lengths)

decoder_input = Variable(torch.LongTensor([SOS_token] * small_batch_size))
decoder_hidden = encoder_hidden[:decoder_test.n_layers] # Use last (forward) hidden state from encoder
all_decoder_outputs = Variable(torch.zeros(max_target_length, small_batch_size, decoder_test.output_size))

if USE_CUDA:
    all_decoder_outputs = all_decoder_outputs.cuda()
    decoder_input = decoder_input.cuda()

# Run through decoder one time step at a time
for t in range(max_target_length):
    decoder_output, decoder_hidden, decoder_attn = decoder_test(
        decoder_input, decoder_hidden, encoder_outputs
    )
    all_decoder_outputs[t] = decoder_output # Store this step's outputs
    decoder_input = target_batches[t] # Next input is current target

# Test masked cross entropy loss
loss = masked_cross_entropy(
    all_decoder_outputs.transpose(0, 1).contiguous(),
    target_batches.transpose(0, 1).contiguous(),
    target_lengths
)
print('loss', loss.data[0])

The error reads as follows:


---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-28-babf231e41ef> in <module>()
     13 for t in range(max_target_length):
     14     decoder_output, decoder_hidden, decoder_attn = decoder_test(
---> 15         decoder_input, decoder_hidden, encoder_outputs
     16     )
     17     all_decoder_outputs[t] = decoder_output # Store this step's outputs

/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
    489             result = self._slow_forward(*input, **kwargs)
    490         else:
--> 491             result = self.forward(*input, **kwargs)
    492         for hook in self._forward_hooks.values():
    493             hook_result = hook(self, input, result)

<ipython-input-24-43d7954b3ba4> in forward(self, input_seq, last_hidden, encoder_outputs)
     35         # Calculate attention from current RNN state and all encoder outputs;
     36         # apply to encoder outputs to get weighted average
---> 37         attn_weights = self.attn(rnn_output, encoder_outputs)
     38         context = attn_weights.bmm(encoder_outputs.transpose(0, 1)) # B x S=1 x N
     39 

/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
    489             result = self._slow_forward(*input, **kwargs)
    490         else:
--> 491             result = self.forward(*input, **kwargs)
    492         for hook in self._forward_hooks.values():
    493             hook_result = hook(self, input, result)

<ipython-input-22-61485b548d0f> in forward(self, hidden, encoder_outputs)
     27             # Calculate energy for each encoder output
     28             for i in range(max_len):
---> 29                 attn_energies[b, i] = self.score(hidden[:, b], encoder_outputs[i, b].unsqueeze(0))
     30 
     31         # Normalize energies to weights in range 0 to 1, resize to 1 x B x S

<ipython-input-22-61485b548d0f> in score(self, hidden, encoder_output)
     40         elif self.method == 'general':
     41             energy = self.attn(encoder_output)
---> 42             energy = hidden.dot(energy)
     43             return energy
     44 

RuntimeError: Expected argument self to have 1 dimension, but has 2

NLPScott commented 6 years ago

@suwangcompling You can try hidden = hidden.squeeze() and encoder_output = encoder_output.squeeze()!
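
For example (shapes taken from the traceback above, sizes made up): torch.dot only accepts 1-D tensors, and both hidden[:, b] and the projected encoder output are (1, hidden_size) here, so squeezing them down to one dimension makes the dot product work.

import torch

hidden_size = 256
hidden = torch.randn(1, hidden_size)           # hidden[:, b] from the traceback
energy = torch.randn(1, hidden_size)           # self.attn(encoder_output), also (1, hidden_size)

# torch.dot expects 1-D arguments, hence "Expected argument self to have 1 dimension";
# squeezing both operands down to (hidden_size,) fixes it:
score = hidden.squeeze().dot(energy.squeeze())
print(score.item())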