hotshotco / Hotshot-XL

✨ Hotshot-XL: State-of-the-art AI text-to-GIF model trained to work alongside Stable Diffusion XL
Apache License 2.0

About the training dataset #34

Closed Kevin-1342 closed 7 months ago

Kevin-1342 commented 7 months ago

Regarding the training dataset, would you mind my asking how you collected tens of millions of clips? My initial understanding was that the labels for long videos may not be suitable for short video clips. Many thanks.

aakashs commented 7 months ago

Yes, creating a text-to-video generator is a challenge: public text-video datasets are few and far between, and their clips typically have non-uniform lengths, low resolution, encoding artifacts, and motion blur.
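As an illustration of the non-uniform length problem only, here is a minimal sketch of one common way to map clips of any length onto a fixed number of frames. It is not described in this thread and is not claimed to be how Hotshot-XL's data was prepared; the function name `sample_frame_indices` and the default of 8 frames are assumptions made for the example.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_frames: int = 8) -> List[int]:
    """Pick num_frames frame indices from a clip that has total_frames frames.

    Hypothetical helper for illustration; not part of the Hotshot-XL codebase.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if total_frames <= num_frames:
        # Short clip: keep every frame, then repeat the last one as padding.
        padding = [total_frames - 1] * (num_frames - total_frames)
        return list(range(total_frames)) + padding
    # Long clip: take the centre frame of each of num_frames equal-width segments.
    step = total_frames / num_frames
    return [int(step * i + step / 2) for i in range(num_frames)]


# An 80-frame clip and a 3-frame clip both end up with a fixed frame count.
long_clip = sample_frame_indices(80, 8)
short_clip = sample_frame_indices(3, 5)
```

Sampling like this fixes only the length mismatch. It does nothing about resolution, encoding artifacts, motion blur, or whether a caption written for the long video still describes the sampled frames.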

Bootstrapping off a text-to-image foundation model takes advantage of the knowledge already learned from the more widely available text-image datasets and reframes text-to-video generation more narrowly as a problem of temporal understanding.
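To make "temporal understanding" concrete, here is a toy sketch in plain Python of how such a split is often structured: an existing per-image function runs on each frame independently, and a separate temporal step mixes information across frames. This is an assumed, simplified picture, not Hotshot-XL's actual architecture; `apply_spatial`, `apply_temporal`, and `identity` are hypothetical names, and frames are flat lists of numbers rather than real image tensors.

```python
def apply_spatial(video, image_fn):
    """Run a per-image function on every frame independently.

    image_fn stands in for the pretrained text-to-image layers (assumption).
    """
    return [image_fn(frame) for frame in video]


def apply_temporal(video, mix):
    """Mix each pixel position across frames with a frames x frames matrix.

    mix stands in for the new weights that would be learned from video data.
    """
    num_frames = len(video)
    num_pixels = len(video[0])
    return [
        [sum(mix[t][s] * video[s][p] for s in range(num_frames))
         for p in range(num_pixels)]
        for t in range(num_frames)
    ]


def identity(n):
    """Identity mixing matrix: the temporal step leaves every frame unchanged."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Three frames of two "pixels" each.
video = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]


def brighten(frame):
    # Placeholder for the image model's per-frame computation.
    return [p + 1.0 for p in frame]


features = apply_spatial(video, brighten)
unchanged = apply_temporal(features, identity(3))

# A non-identity matrix lets frame 0 borrow information from frame 1.
blend = [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
blended = apply_temporal(features, blend)
```

With an identity matrix the temporal step is a no-op, so the pipeline behaves exactly like the image model applied frame by frame. In this picture, only the mixing weights would need to be learned from video data.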