Hello!
Thanks for your great implementation. I was wondering how to use this code for actual future-frame video prediction. Say I have pretrained the VQ-VAE to compress videos of shape 16×3×256×256 and trained a pixel_snail model on the resulting compressed latents. Now, if I have a video of shape 4×3×256×256, what am I supposed to do at inference time to predict the following frames? I am still a bit confused even after reading the paper.
Thanks.