pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

Convlstm module? #1706

Closed leesunfreshing closed 5 years ago

leesunfreshing commented 7 years ago

Any plans to have this implemented as a basic module? Moreover, it would be really nice if PyTorch wrapped the rnn package from (Lua) Torch.

Kaixhin commented 7 years ago

Would like to have a solid implementation as part of the library too. Started drafting up peephole LSTMs and convolutional LSTMs for some people here, but while the convolutional LSTM works on basic options I haven't a) correctly followed the equations from the original paper and b) adapted the hidden state to work with all kinds of conv options (like stride, dilation etc.).

On that note, implementing this would be easier if peephole connections were added to the main code too (https://github.com/pytorch/pytorch/issues/630).
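For a sense of scale, the core recurrence (without peepholes, and ignoring stride/dilation, the hard cases mentioned above) fits in a few lines. A minimal sketch, with a hypothetical class name and "same" padding assumed:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Hypothetical minimal ConvLSTM cell: LSTM gates computed by a single
    Conv2d over the concatenated input and hidden state, no peephole."""

    def __init__(self, in_channels, hidden_channels, kernel_size):
        super().__init__()
        padding = kernel_size // 2  # "same" padding keeps H x W fixed
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels, kernel_size,
                               padding=padding)

    def forward(self, x, state):
        h, c = state
        # one convolution produces all four gate pre-activations
        i, f, g, o = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c
```

The difficulty is not these equations but, as noted, supporting arbitrary stride/dilation (which changes the hidden state's spatial size) and the peephole terms.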

ndrplz commented 7 years ago

I also drafted a ConvLSTM version here.

I'm still working on it, but maybe someone can find it useful nonetheless.

yiminglin-ai commented 6 years ago

Any plans to officially support it?

guanfuchen commented 5 years ago

Convolutional LSTM is great for video prediction; I can't wait for official support to simplify writing video-prediction code.

geolvr commented 5 years ago

Because there is no official implementation of this module, I have to fall back to Keras.

AlexeyAB commented 5 years ago

It would be nice to add a conv-LSTM layer, which is much better for training and detection on video (faster, and higher mAP@0.5): https://github.com/AlexeyAB/darknet/issues/3114#issuecomment-494148968

Implementation: https://github.com/AlexeyAB/darknet/blob/24f7d94ab4214dd43805270056effd95bfd9a5d9/src/conv_lstm_layer.c#L809-L1185

You can use the same conv-LSTM approach that is used in Keras: https://github.com/keras-team/keras/blob/master/keras/layers/convolutional_recurrent.py#L665-L737 which refers (see https://github.com/keras-team/keras/blob/master/keras/layers/convolutional_recurrent.py#L902-L906) to this paper: http://arxiv.org/abs/1506.04214v1

With a parameter that enables/disables the peephole connection (red rectangles):

[image: ConvLSTM gate equations with the peephole terms highlighted in red rectangles]

since there are variants without the peephole connection for cases where it isn't required: https://en.wikipedia.org/wiki/Long_short-term_memory#Variants


Also, it would be nice to add a parameter that switches the peephole from ∘ (element-wise / Hadamard product) to * (convolution). That would make it possible to resize the convLSTM layer's input and output regardless of the size of the weights Wci, Wcf, Wco, so the whole network becomes resizable: i.e. we can train the model with input resolution 416x416, then change it to 608x608 and use it for detection, as is done in Yolo v2/v3.

[image: the same equations with the peephole's Hadamard product replaced by a convolution]
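A small sketch of the shape argument above (hypothetical variable names; PyTorch used only to show tensor shapes): Hadamard peephole weights have a fixed spatial size, while a convolutional peephole is resolution-agnostic.

```python
import torch
import torch.nn as nn

C, H, W = 16, 8, 8  # cell-state channels and training resolution

# Hadamard peephole: one weight per channel *and* spatial location,
# so the parameter shape is tied to a fixed H x W.
w_ci = torch.randn(1, C, H, W)
peep_hadamard = w_ci * torch.randn(1, C, H, W)  # breaks if H/W change

# Convolutional peephole: weights depend only on channels and kernel
# size, so the same module accepts any resolution at test time
# (e.g. train at 416x416, detect at 608x608, as in Yolo v2/v3).
conv_ci = nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False)
peep_conv = conv_ci(torch.randn(1, C, 2 * H, 2 * W))  # works at 16x16 too
```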

Kaixhin commented 5 years ago

Given the even greater ubiquity of this module and the current level of the JIT, it would definitely be worth getting an official implementation.

ailzhang commented 5 years ago

We'll be happy to accept a PR from the community if someone wants to take it.

soumith commented 5 years ago

Before we see a PR, I'd want to see what the proposed API is. Wouldn't want to go all the way to making a PR and then iterating on the API a lot.

Kaixhin commented 5 years ago

Might make sense to do nn.Conv2dLSTMCell first (functional wrapper) and then nn.Conv2dLSTM. Going by Keras (linked above) and Sonnet, they assume the same convolution hyperparameters (e.g. kernel size) for both the input and the hidden state, so doing this would mean the same API as nn.Conv2d (though it seems useful to also have a peephole bool flag at the end to add a fully-convolutional cell state). Allowing different hyperparameters for the input and hidden state would be more flexible, but I don't think we need to support that at this point. So basically this is the __init__ API that I've done in my gist (which doesn't make the peephole optional, but that would become a bool option).
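A hedged sketch of what that __init__ could look like: nn.Conv2d-style arguments plus a trailing peephole flag. The class name, the two-convolution gate layout, and the per-channel peephole weights are assumptions for illustration, not an agreed design, and nothing here exists in torch.nn:

```python
import torch
import torch.nn as nn

class Conv2dLSTMCell(nn.Module):
    """Hypothetical cell mirroring the nn.Conv2d argument list,
    with an optional Hadamard-style peephole on the cell state."""

    def __init__(self, in_channels, hidden_channels, kernel_size,
                 stride=1, padding=0, dilation=1, bias=True, peephole=False):
        super().__init__()
        self.peephole = peephole

        def gates(cin):  # all four gate pre-activations from one conv
            return nn.Conv2d(cin, 4 * hidden_channels, kernel_size,
                             stride=stride, padding=padding,
                             dilation=dilation, bias=bias)

        self.ih = gates(in_channels)      # input -> gates
        self.hh = gates(hidden_channels)  # hidden -> gates
        if peephole:
            # per-channel peephole weights for the i, f, o gates
            self.w_c = nn.Parameter(torch.zeros(3, 1, hidden_channels, 1, 1))

    def forward(self, x, state):
        h, c = state
        i, f, g, o = (self.ih(x) + self.hh(h)).chunk(4, dim=1)
        if self.peephole:
            i = i + self.w_c[0] * c
            f = f + self.w_c[1] * c
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        if self.peephole:
            o = o + self.w_c[2] * c  # output peephole sees the new cell state
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c
```

With defaults (stride=1, padding=kernel_size//2) the hidden state keeps the input's spatial size; other stride/dilation settings would need the state-shape handling discussed earlier in the thread.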

tcapelle commented 4 years ago

Is this still on the way?

aiqc commented 3 years ago

Without ConvLSTM, multivariate forecasting with sliding windows is limited to a single 2D dataset from a single site.

[windows (batch/ samples), timesteps (rows), features (cols)]
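As a shape illustration (hypothetical sizes), a ConvLSTM input adds channel and spatial axes to that layout, so one batch can cover gridded data from many sites rather than one 2D table:

```python
import torch

# plain LSTM windows: (batch, timesteps, features)
windows, timesteps, features = 32, 10, 7
x_lstm = torch.randn(windows, timesteps, features)

# ConvLSTM windows: (batch, timesteps, channels, height, width)
channels, H, W = 3, 64, 64
x_convlstm = torch.randn(windows, timesteps, channels, H, W)
```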

tcapelle commented 3 years ago

I am still hoping to have a fast/native ConvLSTM/GRU layer in 2021

characat0 commented 2 years ago

still hoping in 2022