hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0
21.53k stars 2.06k forks

training data #613

Closed HaoZhang990127 closed 5 days ago

HaoZhang990127 commented 1 month ago

Hi,

Thank you for your nice work.

You used both video data and image data during training. What was the ratio of video to images at each stage? I am adding a lot of image data: specifically, 80k high-quality videos and 1.2 million images, both filtered to aesthetic scores above 5.6. However, introducing the images did not improve generation quality. How did you use image data to improve quality during training, and do you have any tips for introducing images into training?

Thank you so much.

JThh commented 1 month ago

Can @zhengzangw please take a look?

zhengzangw commented 1 month ago

In total, we use approximately 30M video data and 3M image data.
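
For comparison with the mix described in the question, the two sets of counts can be turned into ratios. This is only a minimal arithmetic sketch: the function name is illustrative, the counts are the approximate totals quoted in this thread, and nothing here reflects the per-stage sampling ratios, which the thread does not state.

```python
from fractions import Fraction


def video_to_image_ratio(num_videos: int, num_images: int) -> Fraction:
    """Return the video:image count ratio as an exact fraction."""
    return Fraction(num_videos, num_images)


# Approximate totals quoted in this thread (not per-stage figures).
reported = video_to_image_ratio(30_000_000, 3_000_000)  # totals from the reply above
question = video_to_image_ratio(80_000, 1_200_000)      # mix described in the question

print(f"reported totals: {float(reported):.0f} videos per image")
print(f"question mix:    {float(1 / question):.0f} images per video")
```

By raw counts, the reported totals are video-heavy (about 10 videos per image), while the mix in the question is image-heavy (15 images per video).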

leonardodora commented 1 month ago

> In total, we use approximately 30M video data and 3M image data.

Your report says that the official Panda 30M subset was used in stage 2, but we couldn't find any 30M subset in the Panda project. Could you please share the link to Panda 30M? Thank you so much.

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 5 days ago

This issue was closed because it has been inactive for 7 days since being marked as stale.