pixeli99 / SVD_Xtend

Stable Video Diffusion Training Code and Extensions.
481 stars, 45 forks

Questions about the noise sampling. #18

Closed: Pandint closed this issue 6 months ago

Pandint commented 6 months ago

Thank you for sharing! This code helps me a lot.

While using this code to fine-tune SVD, I have a question about the noise sampling. The code currently samples noise as follows:

    sigmas = rand_cosine_interpolated(shape=[bsz,], image_d=image_d, noise_d_low=noise_d_low, noise_d_high=noise_d_high, sigma_data=sigma_data, min_value=min_value, max_value=max_value).to(latents.device)
    sigmas = sigmas[:, None, None, None, None]
    noisy_latents = latents + noise * sigmas

Can I simply replace this with a diffusers noise scheduler such as DDPMScheduler?

Hope to get your help!

pixeli99 commented 6 months ago

It is feasible, but it will cause a lot of unnecessary trouble: you would need to invest more compute in the fine-tuning. See the description in the SVD paper, where they use the EDM framework.
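To illustrate why the swap is not a drop-in change, here is a minimal scalar sketch (not code from this repo) comparing the two noising rules. It assumes a linear beta schedule with values chosen to mirror common DDPM defaults (1000 steps, betas from 1e-4 to 0.02). The helper names `ddpm_alphas_cumprod`, `ddpm_add_noise`, `edm_add_noise`, and `sigma_from_alpha_bar` are hypothetical and exist only for this example. It shows that a DDPM-style noisy sample is a rescaled version of the `latents + noise * sigma` form, so the input scaling and the distribution of noise levels seen in training both change.

```python
import math

def ddpm_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed values)."""
    out, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def ddpm_add_noise(x0, eps, alpha_bar):
    """DDPM-style (variance-preserving) noising: the signal is scaled down."""
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps

def edm_add_noise(x0, eps, sigma):
    """EDM-style noising as in the thread: latents + noise * sigma."""
    return x0 + sigma * eps

def sigma_from_alpha_bar(alpha_bar):
    """Noise level sigma that corresponds to a given DDPM alpha_bar."""
    return math.sqrt((1.0 - alpha_bar) / alpha_bar)

alphas_cumprod = ddpm_alphas_cumprod()
x0, eps = 0.7, -1.3            # toy scalar "latent" and noise draw
alpha_bar = alphas_cumprod[500]
sigma = sigma_from_alpha_bar(alpha_bar)

vp_sample = ddpm_add_noise(x0, eps, alpha_bar)
ve_sample = edm_add_noise(x0, eps, sigma)

# Dividing the DDPM sample by sqrt(alpha_bar) recovers the EDM-style sample.
rescaled = vp_sample / math.sqrt(alpha_bar)
print(abs(rescaled - ve_sample) < 1e-9)
```

Under these assumptions, uniformly sampled DDPM timesteps imply a different distribution over sigma than the sampler used in this code, and the model would see differently scaled inputs, which is consistent with the extra fine-tuning cost mentioned above.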

Pandint commented 6 months ago

Thanks a lot!