Video Prediction Results

@rosinality I am wondering if we could generate the results in section 4.4. We should create sequential context from latent frames, so we need a scheme to process latent frames - basically something like a recurrent seq2seq model.

https://arxiv.org/pdf/1711.00937.pdf

1) Store discrete latent space. 2) Create a pixelcnn/snail encoder (can do it with same setup as pixelcnn prior in code). 3) Process each frame with pixelsnail and use last frame's output as context. 4) Use an autoregressive or recurrent scheme to process context for each frame. 5) Decode new frames after creating context from input frames.

rosinality / vq-vae-2-pytorch

Video Prediction Results #59