Closed by nateraw 1 year ago
This logic should work as a fix:
num_interpolation_steps = [120, 150, 60]
fps = 30
audio_start_sec = 2.0
for i, num_step in enumerate(num_interpolation_steps):
    # Offset = start time plus the duration of all preceding clips.
    audio_offset = audio_start_sec + sum(num_interpolation_steps[:i]) / fps
    duration = num_step / fps
    print(f"Audio Offset: {audio_offset} | duration: {duration}")
When you use multiple prompts/seeds (4 in my case), the audio for the later clips begins to drift from where it should be. This is because, when calculating T, the audio offset is computed incorrectly: https://github.com/nateraw/stable-diffusion-videos/blob/90039cf2448691d5ce3185ac33ddd0886606f713/stable_diffusion_videos/stable_diffusion_pipeline.py#L739
This should instead be a running sum of num_interpolation_steps.
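For illustration, here is a minimal sketch of the running-sum idea using itertools.accumulate. The helper name compute_audio_offsets is hypothetical and not part of stable-diffusion-videos; it only assumes that each clip's audio should start after the combined length of all earlier clips, as described in this issue.

```python
from itertools import accumulate


def compute_audio_offsets(num_interpolation_steps, fps, audio_start_sec=0.0):
    """Return one (offset_sec, duration_sec) pair per clip.

    Hypothetical helper for illustration only; not the library's API.
    """
    # Frames elapsed before each clip starts: 0, n0, n0 + n1, ...
    frames_before = [0] + list(accumulate(num_interpolation_steps))[:-1]
    return [
        (audio_start_sec + start / fps, steps / fps)
        for start, steps in zip(frames_before, num_interpolation_steps)
    ]


segments = compute_audio_offsets([120, 150, 60], fps=30, audio_start_sec=2.0)
for offset, duration in segments:
    print(f"Audio Offset: {offset} | duration: {duration}")
```

With these example values, each clip's offset equals the previous clip's offset plus the previous clip's duration, so the audio segments line up back to back with no gaps or overlaps.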