Closed kiranshahi closed 2 years ago
Thanks for your interest in our work. The objective of the Conv-LSTM here is not to learn spatial-temporal information. It acts as a mechanism to learn non-linear representation shared among decoding and encoding path. More specifically, we consider that the feature set coming from the encoder path contains more localization information while the decoder path brings semantic information, however, both tensors see the same object. To effectively map these two tensors into a combined version we use ConvLSTM.
As you have mentioned ConvLSTM2D is used for Spatio-temporal information.
But I am confused, is this model really going to learn the temporal information? There are two reasons for it.
Can you share your findings? Does this really work?
Reference: Temporal Information Stackoverflow answer