songweige / TATS

Official PyTorch implementation of TATS: A Long Video Generation Framework with Time-Agnostic VQGAN and Time-Sensitive Transformer (ECCV 2022)
MIT License

Downsample and Upsample ratio for T dimension #27

Open 54wb opened 9 months ago

54wb commented 9 months ago

Hi, thanks for your great work. When I train on the MMNIST dataset with a downsample parameter of 4, 8, 8, the input and output both have 10 frames in the T dimension. The encoder reduces T to 2, so the decoder cannot bring it back to 10. What should I do when the T dimension of the dataset is not a power of 2?
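The mismatch described above can be reproduced with plain arithmetic. The sketch below assumes each temporal downsampling stage halves the length with floor division (consistent with the 10 → 2 the question reports) and each upsampling stage doubles it. The helper names are hypothetical and are not part of the TATS code.

```python
import math


def latent_frames(t: int, factor: int) -> int:
    """Temporal length after downsampling by `factor` (assumed to be a power of 2).

    Hypothetical model: each stride-2 stage maps t -> t // 2.
    """
    stages = int(math.log2(factor))
    for _ in range(stages):
        t //= 2
    return t


def reconstructed_frames(t_latent: int, factor: int) -> int:
    """Temporal length after upsampling by `factor` (each stage doubles t)."""
    return t_latent * factor


t_in, factor = 10, 4
t_lat = latent_frames(t_in, factor)        # 10 -> 5 -> 2
t_out = reconstructed_frames(t_lat, factor)  # 2 -> 4 -> 8
print(t_in, t_lat, t_out)  # 10 2 8
```

Because 10 is not divisible by 4, the floor division drops frames on the way down and the decoder can only recover 8 of the original 10.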

songweige commented 9 months ago

Hey @54wb, thank you for your question! I normally use videos with 16 frames, where a temporal downsampling factor of 4 gives 4 frames in the latent. For MMNIST you could try 2, 8, 8, which gives a similar number of latent frames (5).
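One way to check the suggestion is to list the power-of-2 temporal factors that divide the clip length exactly, since those are the ones that round-trip without losing frames under the floor-halving assumption used above. This is a hypothetical helper for illustration, not a TATS utility, and the `max_factor` cap of 16 is an arbitrary choice.

```python
def valid_temporal_factors(t: int, max_factor: int = 16) -> list[tuple[int, int]]:
    """Return (factor, latent_frames) for power-of-2 factors that divide t exactly.

    Hypothetical helper: assumes each downsampling stage halves the temporal
    length, so only factors dividing t reconstruct all t frames.
    """
    factors = []
    f = 1
    while f <= max_factor:
        if t % f == 0:
            factors.append((f, t // f))
        f *= 2
    return factors


print(valid_temporal_factors(16))  # [(1, 16), (2, 8), (4, 4), (8, 2), (16, 1)]
print(valid_temporal_factors(10))  # [(1, 10), (2, 5)]
```

For 16-frame clips a factor of 4 gives 4 latent frames; for 10-frame MMNIST clips the largest exact factor is 2, which gives the 5 latent frames mentioned in the reply.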