quanpn90 opened this issue 8 years ago
cudnn RNN/LSTM accepts inputs with different sequence lengths and thus does not require padding. The requirement is that the inputs be sorted in descending order of sequence length. This capability is not yet supported by the torch bindings, though.
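Conceptually the layout looks like this (plain Lua, just to illustrate what cuDNN expects; this is not the torch binding API):

    -- pack three sequences of lengths 7, 5 and 4 without padding; cuDNN
    -- requires them sorted by decreasing length, so the per-time-step
    -- batch size can only shrink, never grow
    local lengths = {7, 5, 4}          -- already sorted, descending
    local maxLen  = lengths[1]

    local batchSizes = {}              -- active sequences at each time step
    for t = 1, maxLen do
       local n = 0
       for _, len in ipairs(lengths) do
          if len >= t then n = n + 1 end
       end
       batchSizes[t] = n               -- here: 3, 3, 3, 3, 2, 1, 1
    end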
@ngimel just out of curiosity, where did you get that information about the sequence length sorting? I'm desperately trying to get the LSTM layer working.
From the manual :-) In the cudnnRNNForwardTraining entry: "The first dimension of the tensors may decrease from element n to element n+1 but may not increase."
I guess my question is, what manual? :) I can't find anything except the cudnn.h header file, and that information is not in it. I also can't find it in the CUDA manuals. Maybe I'm just being dumb here.
Never mind, I found it. I think when I downloaded v5 they didn't have a link to the user guide yet. Thank you, sorry!
@ngimel Hi,
I want to group sequences with different lengths into a batch (sentences, for example), so padding is necessary. By the way, I will try disabling the bias during training and see whether any problems arise. Thank you.
@soumith It would be a nice feature request for the next NVIDIA cudnn release. The lack of zero-masking is the only reason I am still not using cudnn LSTMs.
Sequences with different lengths can already be grouped into a batch without padding; cudnn supports that. The Torch bindings don't, at the moment.
@ngimel I am guessing that each row of the batch has exactly one sequence? If so, this is not the same as zero-masking.
I, too, was wondering if feeding in zero-padded variable-length sequences would significantly affect learning a good final hidden state. I wrote a quick script to convince myself that it doesn't if the RNN dimension is high enough.
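The script itself isn't posted here, but a rough sketch of a similar sanity check would be the following (cudnn.torch assumed to be installed, sizes arbitrary). It only probes how much a zero prefix perturbs the final hidden output of a randomly initialized net, not the training dynamics themselves:

    require 'cudnn'

    local inputSize, hiddenSize, seqLen, pad = 32, 256, 10, 5
    local lstm = cudnn.LSTM(inputSize, hiddenSize, 1):cuda()

    local x = torch.CudaTensor(seqLen, 1, inputSize):uniform(-1, 1)
    local xpad = torch.CudaTensor(seqLen + pad, 1, inputSize):zero()
    xpad:narrow(1, pad + 1, seqLen):copy(x)   -- zeros in front, the real sequence after

    local h1 = lstm:forward(x)[seqLen]:clone()           -- last-step output, no padding
    local h2 = lstm:forward(xpad)[seqLen + pad]:clone()  -- last-step output, zero-padded
    print((h1 - h2):norm() / h1:norm())                  -- relative difference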
I've found that it's (at least conceptually) simpler to just group sequences of the same length.
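For example (plain Lua; sequences here is a hypothetical table of 1-D torch tensors):

    -- bucket the sequences by length
    local buckets = {}
    for _, seq in ipairs(sequences) do
       local len = seq:size(1)
       buckets[len] = buckets[len] or {}
       table.insert(buckets[len], seq)
    end

    -- each bucket stacks into a len x batchSize tensor with no padding at all
    local batches = {}
    for len, group in pairs(buckets) do
       local batch = torch.Tensor(len, #group)
       for i, seq in ipairs(group) do
          batch:select(2, i):copy(seq)
       end
       batches[len] = batch
    end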
@ngimel Hi, could you please show me a demo of "sequences with different lengths can already be grouped into a batch without padding"? Thank you!
Look at the variable-length sequences test for an example of how it can be done: https://github.com/soumith/cudnn.torch/blob/master/test/test_rnn.lua#L324
@ngimel getting back to your comment from last year on variable-length sequence support in cudnn: I think it doesn't refer to the sequence length but to the batch size. It seems some clarifications were added in the current manual (cudnnRNNForwardTraining):
The first dimension (batch size) of the tensors may decrease from element n to element n+1 but may not increase. Each tensor descriptor must have the same second dimension (vector length).
Or are you referring to something different? Please let me know if there is a misunderstanding.
OK, never mind. I had an unrolled version of the operator in mind. When iteratively calling cudnnRNNForwardTraining for each time step, reducing the batch size does of course work, as you mentioned.
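In case it helps anyone else, that unrolled view looks roughly like this (a hypothetical sketch: step stands in for a single cudnnRNNForwardTraining call, input is maxLen x batchSize x inputSize with sequences sorted by decreasing length, and batchSizes[t] is the number of sequences of length >= t):

    local h = torch.CudaTensor(batchSize, hiddenSize):zero()
    for t = 1, maxLen do
       local n  = batchSizes[t]             -- active sequences at step t; never grows
       local xt = input[t]:narrow(1, 1, n)  -- only the still-active rows
       local ht = h:narrow(1, 1, n)
       ht:copy(step(xt, ht))                -- finished sequences simply drop out
    end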
Thanks, everyone, for the wonderful cudnn bindings.
I would like to ask whether NVIDIA provides any interface for masking the hiddenOutputs at each step, given that the input sequence is padded (as in neural machine translation, for example).
Concretely, the input sequence is padded with 0s, such as:

    seq = torch.Tensor({{0,0,0,0,1,2,3},{0,0,4,5,6,7,8},{0,0,0,1,2,3,4}}):t():cuda()

Thanks to LookupTableMaskZero from the rnn package, we can proceed to the LSTM with zero embeddings at the "0" indices. I wonder if the cudnn LSTM can mask the hiddenOutput based on the input? My current solution is to zero out the LSTM biases, so that the hidden states at the padded positions are always zero, but I am not sure whether this affects the learning process.
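For concreteness, the kind of masking I have in mind, done outside cudnn with plain torch ops (a rough sketch: embeddings is the LookupTableMaskZero output, lstm the cudnn.LSTM; this only zeroes the outputs after the fact and does not change the recurrence inside cudnn):

    local output = lstm:forward(embeddings)   -- seqLen x batchSize x hiddenSize
    local mask = seq:ne(0):typeAs(output)     -- 1 at real tokens, 0 at padding
    output:cmul(mask:view(seq:size(1), seq:size(2), 1):expandAs(output))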
Thank you,