wilson1yan / VideoGPT

MIT License
962 stars 115 forks

Next frame predictor #37

Open Piyush-555 opened 1 year ago

Piyush-555 commented 1 year ago

Hi, great work! I have a somewhat naive question.

Just curious: why not limit the VQ-VAE to modeling space only (i.e., encoding each frame separately) instead of space-time? That way, the autoregressive nature of the transformer could be exploited to generate videos with a varying number of frames.
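To make the idea concrete, here is a rough sketch of what per-frame tokenization could look like. It is not VideoGPT's code: `quantize_frame` and `tokenize_video` are hypothetical names, and a nearest-neighbour lookup over raw 4x4 patches stands in for a learned 2D VQ-VAE encoder. The point is only that the token sequence grows by a fixed amount per frame, so an autoregressive model over it would not be tied to a fixed clip length.

```python
import numpy as np


def quantize_frame(frame, codebook, patch=4):
    """Map one (H, W) frame to a 1-D array of codebook indices, one per patch.

    Assumption: nearest-neighbour matching on raw patches replaces the
    learned encoder of a real VQ-VAE.
    """
    h, w = frame.shape
    # Split the frame into non-overlapping patch x patch blocks, one row per block.
    blocks = (
        frame.reshape(h // patch, patch, w // patch, patch)
        .transpose(0, 2, 1, 3)
        .reshape(-1, patch * patch)
    )
    # Squared Euclidean distance from every block to every codebook entry.
    dists = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def tokenize_video(video, codebook, patch=4):
    """Tokenize each frame independently and concatenate in time order."""
    return np.concatenate([quantize_frame(f, codebook, patch) for f in video])


rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 16))    # 16 codes, each a flattened 4x4 patch
short_clip = rng.normal(size=(3, 8, 8))  # 3 frames of 8x8
long_clip = rng.normal(size=(5, 8, 8))   # 5 frames of 8x8

tokens_short = tokenize_video(short_clip, codebook)
tokens_long = tokenize_video(long_clip, codebook)
print(tokens_short.shape, tokens_long.shape)
```

Because each frame is quantized on its own, the tokens of the first k frames are a prefix of the tokens of the full clip, which is what would let a transformer keep sampling frame after frame. The trade-off is that nothing is compressed along the time axis, so sequences get long quickly.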