torch / torch7

http://torch.ch

How to convert a cudnn.BLSTM model to nn.LSTM bidirectional model #1199

Closed rafikg closed 5 years ago


@ngimel I have a *.t7 model that consists in few convolution layers and 1 block of cudnn.BLSTM(). To convert the model to pytorch, I create the same architecture with pytorch and try to get the weights from the t7 file. I think the convolution layers were correct but I have a doubt about the cudnn.BLSTM. When I extract the BLSTM weighs, I got one dimentional list of millions of parameters which corresponds to the same numbers of parameters in pytorch LSTM. However, in pytorch the weights and biases are with well know structure and weight_ih_l0, weight_hh_l0,... bias_ih_l_0, bias_hh_l0, ... weight_ih_l0_reverse, ... but in the cuddnn.BLSTM(), all parameters are set in one flattened list, so how to know the order and the shape of weights and biases ?? I debug the cudnn.BLSTM structure on th terminal and I get some idea about the concatenation orders and the shape: Exemple

-- Torch
rnn = cudnn.BLSTM(1, 1, 2, false, 0.5)
-- get the weights
weights = rnn:weights()
th> rnn:weights()
{
  1 : 
    {
      1 : CudaTensor - size: 1
      2 : CudaTensor - size: 1
      3 : CudaTensor - size: 1
      4 : CudaTensor - size: 1
      5 : CudaTensor - size: 1
      6 : CudaTensor - size: 1
      7 : CudaTensor - size: 1
      8 : CudaTensor - size: 1
    }
  2 : 
    {
      1 : CudaTensor - size: 1
      2 : CudaTensor - size: 1
      3 : CudaTensor - size: 1
      4 : CudaTensor - size: 1
      5 : CudaTensor - size: 1
      6 : CudaTensor - size: 1
      7 : CudaTensor - size: 1
      8 : CudaTensor - size: 1
    }
  3 : 
    {
      1 : CudaTensor - size: 2
      2 : CudaTensor - size: 2
      3 : CudaTensor - size: 2
      4 : CudaTensor - size: 2
      5 : CudaTensor - size: 1
      6 : CudaTensor - size: 1
      7 : CudaTensor - size: 1
      8 : CudaTensor - size: 1
    }
  4 : 
    {
      1 : CudaTensor - size: 2
      2 : CudaTensor - size: 2
      3 : CudaTensor - size: 2
      4 : CudaTensor - size: 2
      5 : CudaTensor - size: 1
      6 : CudaTensor - size: 1
      7 : CudaTensor - size: 1
      8 : CudaTensor - size: 1
    }
}

biases = rnn:biases()

th> rnn:biases()
{
  1 : 
    {
      1 : CudaTensor - size: 1
      2 : CudaTensor - size: 1
      3 : CudaTensor - size: 1
      4 : CudaTensor - size: 1
      5 : CudaTensor - size: 1
      6 : CudaTensor - size: 1
      7 : CudaTensor - size: 1
      8 : CudaTensor - size: 1
    }
  2 : 
    {
      1 : CudaTensor - size: 1
      2 : CudaTensor - size: 1
      3 : CudaTensor - size: 1
      4 : CudaTensor - size: 1
      5 : CudaTensor - size: 1
      6 : CudaTensor - size: 1
      7 : CudaTensor - size: 1
      8 : CudaTensor - size: 1
    }
  3 : 
    {
      1 : CudaTensor - size: 1
      2 : CudaTensor - size: 1
      3 : CudaTensor - size: 1
      4 : CudaTensor - size: 1
      5 : CudaTensor - size: 1
      6 : CudaTensor - size: 1
      7 : CudaTensor - size: 1
      8 : CudaTensor - size: 1
    }
  4 : 
    {
      1 : CudaTensor - size: 1
      2 : CudaTensor - size: 1
      3 : CudaTensor - size: 1
      4 : CudaTensor - size: 1
      5 : CudaTensor - size: 1
      6 : CudaTensor - size: 1
      7 : CudaTensor - size: 1
      8 : CudaTensor - size: 1
    }
}

all_flattened_params = rnn:parameters()

From this small example I see that the rnn:parameters() function puts the weights first and then the biases, in the order shown above. So:

weights = all_flattened_params[:-32]
biases = all_flattened_params[-32:]

Now, how can I know the order of the weights and biases with respect to the PyTorch nn.LSTM()? I assumed the following order: weight_ih_l0, weight_hh_l0, weight_ih_l0_reverse, weight_hh_l0_reverse, weight_ih_l1, ..., bias_ih_l0, bias_hh_l0, bias_ih_l0_reverse, bias_hh_l0_reverse, ...

but my model does not give the right output!
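
On the PyTorch side, this is what I am comparing against, just a quick sanity-check script that builds an nn.LSTM with the same sizes as the Torch model above and prints its parameter names and shapes:

import torch.nn as nn

# Same sizes as the Torch example above: input_size=1, hidden_size=1, 2 layers, bidirectional
lstm = nn.LSTM(1, 1, num_layers=2, bidirectional=True)

# Print the parameter order and shapes PyTorch expects, to compare against the
# 32 weight chunks and 32 bias chunks extracted from the t7 file
for name, param in lstm.named_parameters():
    print(name, tuple(param.shape))

The printed order groups weight_ih, weight_hh, bias_ih, bias_hh per layer and direction, which is not the same grouping as rnn:parameters() above, so maybe my assumed order is already wrong there.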

ngimel commented 5 years ago

Here's how PyTorch converts the separate weights into one flattened buffer: https://github.com/pytorch/pytorch/blob/fbd690c1fec0651ee9e6cc07ddcb12217ffb31bc/aten/src/ATen/native/cudnn/RNN.cpp#L622-L674, so you'd need to reverse this process to go from the single buffer back to the separate weights.
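
For illustration only, a rough sketch of that reverse direction (untested; the helper name load_blstm, the layer/direction indexing, and the assumption that each 8-chunk entry from rnn:weights() holds the 4 input-to-hidden gate matrices followed by the 4 hidden-to-hidden ones, with the gates in the same order PyTorch uses, are all mine and should be checked against the RNN.cpp code linked above):

import torch
import torch.nn as nn

def load_blstm(torch_weights, torch_biases, input_size=1, hidden_size=1, num_layers=2):
    # torch_weights / torch_biases: nested lists mirroring rnn:weights() / rnn:biases(),
    # with each chunk already converted to a torch tensor
    lstm = nn.LSTM(input_size, hidden_size, num_layers, bidirectional=True)
    with torch.no_grad():
        for layer in range(num_layers):
            for direction in range(2):
                k = layer * 2 + direction          # assumed entry order: l0 fwd, l0 rev, l1 fwd, l1 rev
                suffix = '_reverse' if direction == 1 else ''
                in_features = input_size if layer == 0 else 2 * hidden_size
                w, b = torch_weights[k], torch_biases[k]
                # chunks 1-4: input-to-hidden gate matrices, chunks 5-8: hidden-to-hidden (assumed)
                w_ih = torch.cat([c.view(hidden_size, in_features) for c in w[:4]])
                w_hh = torch.cat([c.view(hidden_size, hidden_size) for c in w[4:]])
                getattr(lstm, f'weight_ih_l{layer}{suffix}').copy_(w_ih)
                getattr(lstm, f'weight_hh_l{layer}{suffix}').copy_(w_hh)
                getattr(lstm, f'bias_ih_l{layer}{suffix}').copy_(torch.cat(b[:4]))
                getattr(lstm, f'bias_hh_l{layer}{suffix}').copy_(torch.cat(b[4:]))
    return lstm

If the loaded model still does not reproduce the Torch output, the per-gate order and a possible transposition of the matrices are the first things to compare against the copy logic in RNN.cpp.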