nateraw / stable-diffusion-videos

Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts
Apache License 2.0

how long does it take to make a video? #147

Open nasser135 opened 1 year ago

nasser135 commented 1 year ago

Hey, I'm still new here. I don't understand why it takes so much time and the video is still loading. I didn't write a complex prompt. How long does it take to create a video on average?

nateraw commented 1 year ago

It completely depends on your # of prompts/batch size/num_interpolation_steps. Can you provide what you ran/where you ran it?

nasser135 commented 1 year ago

school class in the old days of the Phoenicians | school class in the 1500 | school class in the 1700 | school class in the 1800 | school class in the 1900 | school class in the 2000 | school class in the 2005 | school class in the 2010 | school class in the 2022 | school class with students wearing VRs and futuristic world

Seed: empty

scheduler: klms

num_inference_steps: 50

guidance_scale: 7.5

num_steps: 60

fps: 15

It's taking too long, and then I get a "Prediction timed out" error.

nasser135 commented 1 year ago

stable-diffusion-videos

nateraw commented 1 year ago

Ah so you're using the gradio interface, right?

Looks like 10 prompts with 60 frames in between each. So that would be (len(prompts) - 1) * 60 total images generated (in your case 9 * 60, so 540 frames). At a batch size of 1 in a standard colab runtime, this is going to take quite a while.
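As a rough sketch of that arithmetic: the helper name `total_frames` is made up for illustration and is not part of the library, and returning 0 for fewer than two prompts is an assumption.

```python
def total_frames(prompts, num_interpolation_steps):
    """Count frames when interpolating between each pair of consecutive prompts.

    Hypothetical helper for illustration only; it mirrors the
    (len(prompts) - 1) * steps formula from this thread.
    """
    if len(prompts) < 2:
        # Assumption: with fewer than two prompts there is nothing to interpolate.
        return 0
    return (len(prompts) - 1) * num_interpolation_steps


# The ten prompts from this issue, split on "|" as in the Gradio text box.
prompt_text = (
    "school class in the old days of the Phoenicians | school class in the 1500 | "
    "school class in the 1700 | school class in the 1800 | school class in the 1900 | "
    "school class in the 2000 | school class in the 2005 | school class in the 2010 | "
    "school class in the 2022 | "
    "school class with students wearing VRs and futuristic world"
)
prompts = [p.strip() for p in prompt_text.split("|")]

print(len(prompts))                # 10
print(total_frames(prompts, 60))   # 540
```

Halving `num_steps` to 30 would halve the frame count to 270, which is the most direct lever on total run time.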

Some tips:

I'll see if I can report back with more info from a standard runtime. Are you using the free version of Colab?

nateraw commented 1 year ago

Looks like 13 seconds per frame on standard runtime with batch size of 1. That's 7020 seconds for all 540 frames, or 117 mins (almost 2 hours).

Will see if some tricks lead to speedup (which would be significant in your case).

nateraw commented 1 year ago

The default GPU I got on free Colab was a Tesla T4, and I was able to set batch_size=4 on it. That brings inference to ~10.5 s per image, or roughly 95 min for that video (540 × 10.5 s = 5670 s), a saving of a little over 20 min.
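For anyone who wants to plug in their own numbers, here is a minimal sketch of the estimate using the per-image timings quoted in this thread. `estimate_minutes` is a hypothetical helper, not a library function, and it ignores model loading and video encoding time.

```python
def estimate_minutes(num_frames, seconds_per_frame):
    """Rough wall-clock estimate in minutes, assuming a constant per-frame cost."""
    return num_frames * seconds_per_frame / 60


frames = 540  # (10 prompts - 1) * 60 interpolation steps

baseline = estimate_minutes(frames, 13.0)  # batch size 1 on a standard runtime
batched = estimate_minutes(frames, 10.5)   # batch_size=4 on a Tesla T4

print(f"batch size 1: {baseline:.1f} min")            # 117.0 min
print(f"batch size 4: {batched:.1f} min")             # 94.5 min
print(f"saved:        {baseline - batched:.1f} min")  # 22.5 min
```

Actual timings will vary with the GPU Colab assigns, so treat these as ballpark figures.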

There are other things to try as well...again, will report back.