nateraw / stable-diffusion-videos

Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts
Apache License 2.0
4.4k stars 421 forks

audio alignment is off in later videos when calling pipeline.walk #75

Closed nateraw closed 1 year ago

nateraw commented 1 year ago

When you use multiple prompts/seeds (4 in my case), the audio for the later clips drifts away from where it should be. This is because the audio offset is calculated incorrectly when computing T:

https://github.com/nateraw/stable-diffusion-videos/blob/90039cf2448691d5ce3185ac33ddd0886606f713/stable_diffusion_videos/stable_diffusion_pipeline.py#L739

The offset should instead be based on a running sum of num_interpolation_steps, so that each clip starts where the previous clips' audio ended.

nateraw commented 1 year ago

Same issue on this line:

https://github.com/nateraw/stable-diffusion-videos/blob/90039cf2448691d5ce3185ac33ddd0886606f713/stable_diffusion_videos/stable_diffusion_pipeline.py#L753

nateraw commented 1 year ago

This logic should work as a fix:

num_interpolation_steps = [120, 150, 60]  # frames per interpolation segment
fps = 30
audio_start_sec = 2.0
for i, num_step in enumerate(num_interpolation_steps):
    # Offset = start time plus the duration of all preceding segments
    audio_offset = audio_start_sec + sum(num_interpolation_steps[:i]) / fps
    duration = num_step / fps
    print(f"Audio Offset: {audio_offset} | duration: {duration}")
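For reference, the same running-sum idea can be wrapped in a small helper. This is only a sketch, not the code that landed in the repo: the function name `audio_segments` and its return shape are hypothetical, and it assumes every segment shares the same `fps`. It uses `itertools.accumulate` so the preceding frame count is carried forward instead of re-summing the slice on each iteration.

```python
from itertools import accumulate


def audio_segments(num_interpolation_steps, fps, audio_start_sec=0.0):
    """Return a list of (offset_sec, duration_sec), one per segment.

    Hypothetical helper for illustration; not part of stable-diffusion-videos.
    """
    # Frames elapsed before each segment starts: 0, n0, n0 + n1, ...
    frames_before = accumulate([0] + list(num_interpolation_steps[:-1]))
    return [
        (audio_start_sec + start / fps, steps / fps)
        for start, steps in zip(frames_before, num_interpolation_steps)
    ]


segments = audio_segments([120, 150, 60], fps=30, audio_start_sec=2.0)
for offset, duration in segments:
    print(f"Audio Offset: {offset} | duration: {duration}")
```

With the values from the snippet above, the offsets come out as 2.0, 6.0 and 11.0 seconds with durations of 4.0, 5.0 and 2.0 seconds, so each clip's audio begins exactly where the previous one ended.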