zhaoyue-zephyrus / bsq-vit

[BSQ-ViT] Image and Video Tokenization with Binary Spherical Quantization

https://arxiv.org/abs/2406.07548

MIT License

Support for joint training on images and videos? #2

Open Epiphqny opened 3 weeks ago

Epiphqny commented 3 weeks ago

Dear authors, thanks for your awesome work. I'd like to ask whether the model supports joint training on images and videos, since this is very important for improving the quality of video generation. Also, does the model support compression in the temporal dimension?