rosinality / vq-vae-2-pytorch

Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch
Other
1.6k stars 270 forks source link

Video Prediction Results #59

Open pravn opened 3 years ago

pravn commented 3 years ago

@rosinality I am wondering if we could generate the results in section 4.4. We should create sequential context from latent frames, so we need a scheme to process latent frames - basically something like a recurrent seq2seq model.

https://arxiv.org/pdf/1711.00937.pdf

1) Store discrete latent space. 2) Create a pixelcnn/snail encoder (can do it with same setup as pixelcnn prior in code). 3) Process each frame with pixelsnail and use last frame's output as context. 4) Use an autoregressive or recurrent scheme to process context for each frame. 5) Decode new frames after creating context from input frames.

rosinality commented 3 years ago

Yes, I think you can do like the way you specified.