zhaoyue-zephyrus / bsq-vit

[BSQ-ViT] Image and Video Tokenization with Binary Spherical Quantization
https://arxiv.org/abs/2406.07548
MIT License
63 stars 0 forks

Support for joint training on images and videos? #2

Open Epiphqny opened 3 weeks ago

Epiphqny commented 3 weeks ago

Dear authors, thanks for your awesome work. Does the model support joint training on images and videos? This is very important for improving the quality of video generation. Also, does the model support compression along the temporal dimension?