iamrakesh28 / Video-Prediction

Implementation of Transformer Encoder Decoder Architecture for Video Predictions

Conv2D input shape is 4D (batch_size, rows, cols, depth) while this work's input shape is 5D (batch_size, target_seq_len, rows, cols, depth) #4

Open katieliao opened 2 years ago

katieliao commented 2 years ago

Hello, I was trying to run this code. However, when I tried to train the model, an error occurred:

Input 0 of layer conv2d is incompatible with the layer: expected ndim=4, found ndim=5. Full shape received: [8, 5, 40, 40, 1]

Here 8 is the batch size, 5 is the target sequence length, 40x40 is rows x cols, and 1 is the depth.

I checked the source code and found that the "encoding" and "decoding" steps both call a conv2d function, which requires a 4D input [batch size, rows, cols, channels].

How can I tackle this problem?
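For context, one common way to handle this kind of mismatch is to fold the sequence axis into the batch axis, run the 4D-only operation on every frame, and then unfold the result. The sketch below shows only that shape bookkeeping in NumPy and is not taken from this repository: `conv2d_1x1` is a hypothetical stand-in for the real Conv2D layer, and `apply_per_frame` is an illustrative helper name. In Keras the same idea is usually expressed with `tf.reshape` around the layer, or by wrapping the layer in `tf.keras.layers.TimeDistributed`.

```python
import numpy as np


def conv2d_1x1(x, kernel):
    """Hypothetical stand-in for a Conv2D layer (a 1x1 convolution).

    Like the layer in the error above, it accepts only 4D input
    shaped (batch, rows, cols, channels).
    """
    if x.ndim != 4:
        raise ValueError(
            f"expected ndim=4, found ndim={x.ndim}. "
            f"Full shape received: {list(x.shape)}"
        )
    # (batch, rows, cols, in_ch) @ (in_ch, out_ch) -> (batch, rows, cols, out_ch)
    return x @ kernel


def apply_per_frame(fn, frames):
    """Apply a 4D-only function to 5D data (batch, seq_len, rows, cols, ch)."""
    batch, seq_len = frames.shape[:2]
    # Fold the sequence axis into the batch axis: (batch * seq_len, rows, cols, ch)
    merged = frames.reshape((batch * seq_len,) + frames.shape[2:])
    out = fn(merged)
    # Unfold back to (batch, seq_len, ...) using the function's output shape
    return out.reshape((batch, seq_len) + out.shape[1:])


rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 5, 40, 40, 1))  # same shape as in the error
kernel = rng.standard_normal((1, 16))            # 1 input channel, 16 output channels

features = apply_per_frame(lambda x: conv2d_1x1(x, kernel), frames)
print(features.shape)  # (8, 5, 40, 40, 16)
```

Because the reshape only regroups the two leading axes, every frame is processed independently and with the same weights, which matches what a per-frame convolution does. Whether this is the right fix here depends on which script produced the error and how it builds its input tensors.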

iamrakesh28 commented 2 years ago

Hi, can you tell me which file you were trying to run? I have defined some main functions inside datasets/.