Closed. leesunfreshing closed this issue 5 years ago.
Would like to have a solid implementation as part of the library too. I started drafting peephole LSTMs and convolutional LSTMs for some people here, but while the convolutional LSTM works with basic options, I haven't a) correctly followed the equations from the original paper or b) adapted the hidden state to work with all kinds of conv options (stride, dilation, etc.).
On that note, implementing this would be easier if peephole connections were added to the main code too (https://github.com/pytorch/pytorch/issues/630).
I also drafted a ConvLSTM version here.
I'm still working on it, but maybe someone can find it useful nonetheless.
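For reference, a minimal sketch of what such a cell could look like, following the peephole ConvLSTM equations from Shi et al. (2015, the paper linked below). The class name, the single fused gate convolution, and the per-channel peephole weights are my own choices for brevity, not an existing PyTorch API; the paper itself uses full spatial-size peephole tensors W_ci, W_cf, W_co.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Sketch of a peephole ConvLSTM cell (Shi et al., 2015).

    Hypothetical class, not part of torch.nn. Peephole weights are
    simplified to one scalar per channel so they broadcast over any
    spatial resolution; the paper uses full spatial-size tensors.
    """
    def __init__(self, in_channels, hidden_channels, kernel_size):
        super().__init__()
        padding = kernel_size // 2  # "same" padding for odd kernel sizes
        # One convolution produces all four gate pre-activations from
        # the concatenated input and previous hidden state.
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels, kernel_size,
                               padding=padding)
        # Peephole weights enter via the Hadamard product.
        self.w_ci = nn.Parameter(torch.zeros(1, hidden_channels, 1, 1))
        self.w_cf = nn.Parameter(torch.zeros(1, hidden_channels, 1, 1))
        self.w_co = nn.Parameter(torch.zeros(1, hidden_channels, 1, 1))

    def forward(self, x, state):
        h, c = state
        gi, gf, gc, go = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        i = torch.sigmoid(gi + self.w_ci * c)   # input gate peeks at old c
        f = torch.sigmoid(gf + self.w_cf * c)   # forget gate peeks at old c
        c = f * c + i * torch.tanh(gc)          # new cell state
        o = torch.sigmoid(go + self.w_co * c)   # output gate peeks at new c
        h = o * torch.tanh(c)
        return h, c
```

Usage would mirror nn.LSTMCell, e.g. `h, c = cell(x, (h, c))` with x of shape (batch, in_channels, H, W) and h, c of shape (batch, hidden_channels, H, W).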
Any plans to officially support it?
Convolutional LSTM is great for video prediction; I can't wait for official support, which would make writing video-prediction code much easier.
Because there is no official implementation of this module, I have to fall back to Keras.
It would be nice to add a conv-LSTM layer, which is much better for training and detection on video (faster, with higher mAP@0.5): https://github.com/AlexeyAB/darknet/issues/3114#issuecomment-494148968
Implementation: https://github.com/AlexeyAB/darknet/blob/24f7d94ab4214dd43805270056effd95bfd9a5d9/src/conv_lstm_layer.c#L809-L1185
You can use the same conv-LSTM approach that is used in Keras: https://github.com/keras-team/keras/blob/master/keras/layers/convolutional_recurrent.py#L665-L737 which (see https://github.com/keras-team/keras/blob/master/keras/layers/convolutional_recurrent.py#L902-L906) refers to this paper: http://arxiv.org/abs/1506.04214v1
It should have a parameter that enables/disables the peephole connection (the red rectangles in the paper's figure), since there are variants without the peephole connection for when it isn't required: https://en.wikipedia.org/wiki/Long_short-term_memory#Variants
Also, it would be nice to add a parameter that switches the peephole operation from
- element-wise product (Hadamard product, ∘) to
- convolution (*),

which makes it possible to resize the convLSTM layer (input and output) regardless of the size of the weights Wci, Wcf, Wco. The whole network then becomes resizable: i.e. we can train the model at input resolution 416x416, then change it to 608x608 and use it for detection, as is done in Yolo v2/v3.
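A small sketch of why the convolutional peephole allows resizing, under the assumption (as in the original paper) that a Hadamard peephole weight has full spatial extent. The channel count and resolutions below are illustrative only:

```python
import torch
import torch.nn as nn

hidden = 8  # illustrative channel count

# Hadamard peephole with full spatial extent: the weight is tied to one
# fixed resolution (here 32x32), so the layer cannot change input size.
w_ci = nn.Parameter(torch.zeros(1, hidden, 32, 32))

# Convolutional peephole: weight shape depends only on channels and
# kernel size, so the same layer runs at any spatial resolution.
peep_i = nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, bias=False)

c_small = torch.randn(1, hidden, 32, 32)
c_large = torch.randn(1, hidden, 76, 76)

out_small = peep_i(c_small)  # works at the training resolution
out_large = peep_i(c_large)  # also works at a larger resolution,
                             # no weight resizing needed
# By contrast, w_ci * c_large would fail: (32, 32) weights do not
# broadcast against a (76, 76) cell state.
```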
Given the even greater ubiquity of this module and the current level of the JIT, it would definitely be worth getting an official implementation.
We'll be happy to accept a PR from the community if someone wants to take it.
Before we see a PR, I'd want to see what the proposed API is; it would be a shame to go all the way to making a PR and then iterate on the API a lot.
Might make sense to do nn.Conv2dLSTMCell first (plus a functional wrapper) and then nn.Conv2dLSTM. Going by Keras (linked above) and Sonnet, both just assume the same convolution hyperparameters (e.g. kernel size) for the input and the hidden state, so doing this would mean the same API as nn.Conv2d (though it seems useful to also have a peephole bool flag at the end to add a fully-convolutional cell state). Changing hyperparameters for input and hidden separately would be more flexible, but I don't think we need to support this at this point. So basically this is the __init__ API that I've done in my gist (I haven't made the peephole optional there, but would make that a bool option).
Is this still on the way?
Without ConvLSTM, multivariate forecasting with sliding windows is limited to a single 2D dataset from a single site: [windows (batch/samples), timesteps (rows), features (cols)]
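In tensor terms, the difference between the two setups can be sketched as follows (shapes are illustrative; the 8x8 grid of sites is a made-up example):

```python
import torch

# Plain LSTM: one 2D dataset per sliding window, single site.
x_lstm = torch.randn(64, 24, 10)  # (windows, timesteps, features)

# ConvLSTM: adds spatial axes, so one batch can cover a grid of sites,
# e.g. a hypothetical 8x8 grid with 10 measurements per site.
x_convlstm = torch.randn(64, 24, 10, 8, 8)  # (windows, timesteps, channels, H, W)
```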
I am still hoping to have a fast/native ConvLSTM/GRU layer in 2021
still hoping in 2022
Any plans to have this implemented as a basic module? Moreover, it would be really nice if PyTorch also wrapped the rnn package from torch.