Open 54wb opened 9 months ago
Hey @54wb, thank you for your question! I normally use videos with 16 frames, so a temporal downsampling factor of 4 gives 4 frames in the latent. For MMNIST, you could probably try 2 8 8, which gives a similar number of frames (5) in the latent.
Hi, thanks for your great work. I am training on the MMNIST dataset with downsampling parameters of 4, 8, 8. The MMNIST input and output both have 10 frames in the T dimension, so the encoder reduces T to 2, and the decoder then cannot restore it to 10. What should I do when the T dimension of the dataset is not a power of 2?
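For anyone hitting the same mismatch, the frame arithmetic in this thread can be checked with a few lines of plain Python. This is only a sketch: it assumes the encoder floors the temporal length when dividing by the downsampling factor (consistent with the 10 → 2 reported here) and that the decoder multiplies it back by the same factor. The function name `roundtrip_frames` is hypothetical and not part of the repository.

```python
def roundtrip_frames(num_frames, factor):
    """Return (latent_frames, restored_frames) for one temporal factor.

    Assumption: the encoder floors num_frames / factor and the decoder
    upsamples by the same factor. The real model may behave differently.
    """
    latent = num_frames // factor   # frames left after temporal downsampling
    restored = latent * factor      # frames produced by the decoder
    return latent, restored


# Compare a few temporal factors for a 10-frame MMNIST clip.
for factor in (1, 2, 4, 5):
    latent, restored = roundtrip_frames(10, factor)
    status = "ok" if restored == 10 else "mismatch"
    print(f"factor={factor}: latent={latent}, restored={restored} ({status})")
```

Under these assumptions, a 10-frame clip only round-trips when the temporal factor divides 10 evenly, which is why a factor of 2 works here and 4 does not.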