Hi, I have a question about the temporal encoding part of the spatiotemporal autoencoder. According to the documentation, ConvLSTM2D expects input with dimensions (batch_size, time, x, y, channels). In the code, however, it receives input shaped as (batch_size, x, y, time, channels). Shouldn't the time axis be the first dimension after the batch, i.e., shouldn't the input shape be (10, 227, 227)?
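
To make the question concrete, here is a minimal NumPy sketch of the axis reordering I have in mind. The batch size of 4 and the single channel are placeholder assumptions of mine, not values taken from the code; the target layout is the channels-last one given in the ConvLSTM2D documentation.

```python
import numpy as np

# Hypothetical batch in the layout the code appears to use:
# (batch_size, x, y, time, channels). Batch size 4 and 1 channel are assumed.
clips_xy_first = np.zeros((4, 227, 227, 10, 1), dtype=np.float32)

# Move the time axis to position 1 to match the documented layout:
# (batch_size, time, x, y, channels).
clips_time_first = np.transpose(clips_xy_first, (0, 3, 1, 2, 4))

print(clips_time_first.shape)  # (4, 10, 227, 227, 1)
```

If the documentation is right, I would expect the per-sample shape passed to the layer to start with 10, as in the second array, rather than with 227.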