pixeli99 / SVD_Xtend

Stable Video Diffusion Training Code and Extensions.
481 stars 45 forks

Questions about the noise sampling. #21

Open Muon2 opened 6 months ago

Muon2 commented 6 months ago

Great work! I am wondering:

  1. Why not use EDM noise sampling instead of the strategy from simple diffusion?
  2. Why use a fixed noise strength (0) on the condition image? I think the sampling expression is given in the SVD paper.

m-muaz commented 6 months ago

@pixeli99 Thanks for your work. I have a similar question: why did you choose the rand_cosine_interpolated noise scheduler instead of the one described in the EDM (Karras et al.) paper, i.e. the one highlighted in the following image? Correct me if my understanding is wrong!

(Edit: And what do the variables noise_d_low and noise_d_high correspond to?)

Thanks in advance.

image

pixeli99 commented 6 months ago

Thank you very much for raising this question. This was an oversight on my part: I originally thought that sigma in the code followed a simple log-normal distribution, but I uploaded an incorrect version.

The use of simple diffusion was merely an immature attempt of mine, because my intent was to try mixing videos of different resolutions for training. I wanted to use the strategy of simple diffusion to apply noise drawn from varying distributions (in fact, I am not sure my understanding is correct). So I am actually very eager to hear everyone's understanding of p_train(σ).
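For reference, the log-normal p_train(σ) described in the EDM paper can be sketched in a few lines. This is a minimal illustration in plain Python, not this repository's code: the helper names are hypothetical, and the defaults (`p_mean=-1.2`, `p_std=1.2`, `sigma_data=0.5`) are the EDM paper's image settings used as placeholders. The SVD paper may use different values, so check it before copying these.

```python
import math
import random


def sample_sigma_lognormal(rng, p_mean=-1.2, p_std=1.2):
    """Draw one sigma with ln(sigma) ~ N(p_mean, p_std^2)."""
    # Defaults are the EDM paper's image settings, used here as placeholders.
    return math.exp(rng.gauss(p_mean, p_std))


def edm_loss_weight(sigma, sigma_data=0.5):
    """EDM weight lambda(sigma) = (sigma^2 + sigma_data^2) / (sigma * sigma_data)^2."""
    return (sigma ** 2 + sigma_data ** 2) / (sigma * sigma_data) ** 2


def c_noise(sigma):
    """Noise-level input fed to the network in EDM: 0.25 * ln(sigma)."""
    return 0.25 * math.log(sigma)


rng = random.Random(0)
sigmas = [sample_sigma_lognormal(rng) for _ in range(10_000)]
mean_log_sigma = sum(math.log(s) for s in sigmas) / len(sigmas)
print(f"mean ln(sigma) = {mean_log_sigma:.2f}")  # should land close to p_mean
```

In a training loop each sample in the batch would get its own σ, the clean latents would be noised as `x0 + sigma * eps`, and the per-sample loss would be scaled by `edm_loss_weight(sigma)`.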

pixeli99 commented 6 months ago

Regarding the second question, that is solely because I have been lazy; I will complete this section of the code.
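To make the second question concrete, here is a hedged sketch of the difference between a fixed noise strength of 0 and a randomly sampled strength on the conditioning image. The helper names are hypothetical, and the log-normal parameters (`loc=-3.0`, `scale=0.5`) are placeholder values that this thread does not confirm; the SVD paper is the authority on the actual expression.

```python
import math
import random


def add_noise(image, sigma, rng):
    """Return image + sigma * eps with eps ~ N(0, 1) per element."""
    return [px + sigma * rng.gauss(0.0, 1.0) for px in image]


def sample_cond_sigma(rng, loc=-3.0, scale=0.5):
    """Hypothetical log-normal draw for the conditioning noise strength."""
    # loc and scale are placeholder values, not confirmed by this thread.
    return math.exp(rng.gauss(loc, scale))


rng = random.Random(0)
cond_image = [0.1, -0.4, 0.8]  # stand-in for a flattened conditioning frame

fixed = add_noise(cond_image, 0.0, rng)  # strength 0: the image is unchanged
cond_sigma = sample_cond_sigma(rng)      # random strength, one draw per sample
augmented = add_noise(cond_image, cond_sigma, rng)
```

For comparison, the diffusers `StableVideoDiffusionPipeline` exposes a `noise_aug_strength` argument at inference time, so a training script would presumably want to cover the strengths used there.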

pixeli99 commented 6 months ago

Hello, as @Muon2 mentioned, this is the strategy from simple diffusion. You can look at section 3.1 of the paper for more detailed information (since I don't fully understand it either, it might be more reliable to read the original paper directly 😢).
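One reading of the shifted and interpolated cosine schedule from the simple diffusion paper is sketched here. It is an interpretation and not this repository's implementation. All function names are hypothetical, and the resolutions in the usage lines are arbitrary. Under this reading, `noise_d_low` and `noise_d_high` are the two reference resolutions whose shifted schedules are blended: the `noise_d_low` schedule dominates at t = 0 and the `noise_d_high` schedule at t = 1.

```python
import math
import random


def logsnr_cosine(t, logsnr_min, logsnr_max):
    """Cosine schedule truncated so logSNR runs from logsnr_max (t=0) to logsnr_min (t=1)."""
    t_min = math.atan(math.exp(-0.5 * logsnr_max))
    t_max = math.atan(math.exp(-0.5 * logsnr_min))
    return -2.0 * math.log(math.tan(t_min + t * (t_max - t_min)))


def logsnr_cosine_shifted(t, image_d, noise_d, logsnr_min, logsnr_max):
    """Shift logSNR by 2 * ln(noise_d / image_d) while keeping the same endpoints."""
    shift = 2.0 * math.log(noise_d / image_d)
    return logsnr_cosine(t, logsnr_min - shift, logsnr_max - shift) + shift


def sigma_cosine_interpolated(t, image_d, noise_d_low, noise_d_high,
                              sigma_data=1.0, sigma_min=1e-3, sigma_max=1e3):
    """Blend the two shifted schedules linearly in t and convert logSNR to sigma."""
    logsnr_max = -2.0 * math.log(sigma_min / sigma_data)  # least noise
    logsnr_min = -2.0 * math.log(sigma_max / sigma_data)  # most noise
    low = logsnr_cosine_shifted(t, image_d, noise_d_low, logsnr_min, logsnr_max)
    high = logsnr_cosine_shifted(t, image_d, noise_d_high, logsnr_min, logsnr_max)
    logsnr = (1.0 - t) * low + t * high
    return sigma_data * math.exp(-0.5 * logsnr)


rng = random.Random(0)
# Arbitrary example resolutions; t ~ U[0, 1) gives one training sigma per draw.
samples = [sigma_cosine_interpolated(rng.random(), image_d=64,
                                     noise_d_low=32, noise_d_high=64)
           for _ in range(1000)]
```

The distribution of σ comes entirely from pushing a uniform t through the schedule, which is why it differs from the log-normal p_train(σ) in EDM.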

m-muaz commented 6 months ago

I see. I'll look into the Simple Diffusion paper.

BTW, I am curious: have you tried any noise scheduling techniques other than the one from the Simple Diffusion paper?

Muon2 commented 6 months ago

I think you may be wrong about the simple diffusion sigma sampling. If you are using simple diffusion, you should probably change the sampling scheduler as well, since the Euler sampler uses the original EDM timestep formula instead of the one from simple diffusion.
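The EDM timestep formula referred to here can be sketched as follows. This is an illustration with hypothetical helper names, and the defaults (`sigma_min=0.002`, `sigma_max=80.0`, `rho=7.0`) are the EDM paper's values; the SVD scheduler config may set a different range, so treat them as placeholders.

```python
import math


def karras_sigmas(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """EDM sampling schedule: n noise levels from sigma_max down to sigma_min (n >= 2)."""
    max_inv = sigma_max ** (1.0 / rho)
    min_inv = sigma_min ** (1.0 / rho)
    return [(max_inv + i / (n - 1) * (min_inv - max_inv)) ** rho for i in range(n)]


def euler_step(x, denoised, sigma, sigma_next):
    """One Euler step of the EDM ODE dx/dsigma = (x - D(x; sigma)) / sigma."""
    d = (x - denoised) / sigma
    return x + (sigma_next - sigma) * d


sigmas = karras_sigmas(25)
# Noise-level input passed to the network at each step: 0.25 * ln(sigma).
timesteps = [0.25 * math.log(s) for s in sigmas]
```

The sampler only needs the network to have seen the σ values it visits, noised in the same way as during training (`x0 + sigma * eps`).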

pixeli99 commented 6 months ago

I understand what you're saying, but I think that different sigma distributions correspond to different diffusion paths. In theory, would it be possible to use the same sampler for sampling? I suspect there might be a flaw in my understanding, but I'm not sure where I've gone wrong. During training, is our definition of the timestep the same, i.e. 0.25 ln(σ)?

pixeli99 commented 6 months ago

@m-muaz I haven't tried it yet, but if I make any progress, I will update here.

Muon2 commented 6 months ago

Sigma distributions would NOT actually affect the diffusion paths. The key in simple diffusion is the change to the alpha and beta schedule; just changing the training sigma distribution would not help, in my opinion.
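Written in the EDM formulation, which is assumed here together with the noising rule x = x_0 + σε, the argument looks like this. The training distribution p_train appears only as the outer expectation of the loss, so it decides how often each noise level is trained on, while the sampling ODE does not reference it at all:

$$
\mathcal{L} = \mathbb{E}_{\sigma \sim p_{\text{train}}}\, \mathbb{E}_{x_0, \epsilon}\left[ \lambda(\sigma)\, \lVert D_\theta(x_0 + \sigma \epsilon;\, \sigma) - x_0 \rVert_2^2 \right],
\qquad
\frac{\mathrm{d}x}{\mathrm{d}\sigma} = \frac{x - D_\theta(x;\, \sigma)}{\sigma}
$$

Changing the alpha and sigma schedule, by contrast, changes how x is constructed from x_0 and ε, and therefore changes the path itself.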

pixeli99 commented 6 months ago

I roughly understand what you mean. I might still need to read more carefully to grasp the principle here. Thank you very much for your reply.