Open Nash2325138 opened 5 years ago
https://github.com/xiadingZ/video-caption.pytorch/blob/9e4759d9a6b48a72c005bba7c3bb9c53065f1f28/models/EncoderRNN.py#L25
As the title says: why do we need another linear transform layer for the video features when the RNN already applies a linear transform to its input inside the cell?
If the goal is to reduce the number of parameters, would it be better to specify the RNN input dimension with a separate variable? For instance:
    self.vid2hid = nn.Linear(dim_vid, dim_rnn_input)
    ...
    self.rnn = self.rnn_cell(dim_rnn_input, dim_hidden, n_layers, batch_first=True,
                             bidirectional=bidirectional, dropout=self.rnn_dropout_p)
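
To make the parameter argument concrete, here is a rough count in plain Python. It is only a sketch: it assumes a single-layer, unidirectional GRU with the usual PyTorch parameter layout (three gates, two bias vectors), and the sizes `2048`, `512` and `256` are hypothetical values picked for illustration, not taken from the repository.

```python
def linear_params(in_dim, out_dim):
    # nn.Linear-style layer: weight matrix plus bias vector.
    return in_dim * out_dim + out_dim


def gru_params(input_dim, hidden_dim):
    # Single-layer, unidirectional GRU (assumed PyTorch layout):
    # three gates, each with input-to-hidden and hidden-to-hidden
    # weights, plus two bias vectors (b_ih and b_hh) per gate.
    weights = 3 * hidden_dim * (input_dim + hidden_dim)
    biases = 2 * 3 * hidden_dim
    return weights + biases


# Hypothetical sizes, for illustration only.
dim_vid, dim_hidden, dim_rnn_input = 2048, 512, 256

# (a) Feed the video features straight into the RNN.
direct = gru_params(dim_vid, dim_hidden)

# (b) Project to dim_hidden first, as the linked line does.
via_hidden = linear_params(dim_vid, dim_hidden) + gru_params(dim_hidden, dim_hidden)

# (c) Project to a separate, smaller dim_rnn_input, as proposed above.
via_rnn_input = linear_params(dim_vid, dim_rnn_input) + gru_params(dim_rnn_input, dim_hidden)

print("direct:           ", direct)
print("via dim_hidden:   ", via_hidden)
print("via dim_rnn_input:", via_rnn_input)
```

With these sizes the extra projection does reduce the total count, because the GRU multiplies its input width by three gates while the linear layer pays for it only once. A separate `dim_rnn_input` would let that trade-off be tuned independently of `dim_hidden`.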