caoandong closed this issue 10 months ago
Cc: @patrickvonplaten @patil-suraj @DN6
Think I ran into the same issues here: https://github.com/huggingface/diffusers/pull/6003 :sweat_smile:
Do you think the issue has something to do with the noise scheduler? If so, then training the SVD model with the Euler discrete scheduler won't work either. I'll dig into the issue more when I'm free.
P.S. The SVD paper's description of the modification to the scheduler:
@patrickvonplaten Oh actually just saw this implementation.
nvm, the pipeline suffers from the same issue. The example worked because the number of inference steps is set to 100 and the inpainting is run the whole way through; but inspecting the intermediate steps shows that the model fails to denoise the latent. Something is clearly wrong with the added noise.
Yes I'm also quite sure that the added noise is incorrect here - @patil-suraj @DN6 can you check?
@caoandong what do you mean by "inspecting the intermediate steps shows that the model fails to denoise the latent"?
Replace

```python
if use_overlay and i >= overlay_start_index and i < overlay_end_index:
    init_latents_proper = self.scheduler.add_noise(
        init_latents_orig, noise, torch.tensor([t])
    )
    latents = (init_latents_proper * overlay_mask) + (latents * (1 - overlay_mask))
```
with
```python
if use_overlay and i >= overlay_start_index and i < overlay_end_index:
    noise_timestep = timesteps[i + 1]
    init_latents_proper = self.scheduler.add_noise(
        init_latents_orig, noise, torch.tensor([noise_timestep])
    )
    latents = init_latents_proper * overlay_mask + latents * (1 - overlay_mask)
```
https://github.com/huggingface/diffusers/assets/13116982/c025deaf-a187-42b2-b6cb-ebfbd0aef03d
@CiaraStrawberry won't this index exceed the length of the timesteps on the last step?
```python
noise_timestep = timesteps[i + 1]
```
I just avoided the inpainting on the last step and it does seem to work!
Without:
With @CiaraStrawberry's suggested change:
Mine was with the overlay window before, so it never hit the last timestep. Thumbs up for just skipping it when it does; that's what the SD inpainting pipelines do.
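Putting the two points together, here's a minimal sketch of the combined fix (hypothetical function name; the scheduler argument is only assumed to expose an `add_noise(sample, noise, timesteps)` method like diffusers' schedulers do): use the *next* timestep to re-noise the original latents, and skip the blend entirely on the final step so `timesteps[i + 1]` never runs off the end of the schedule.

```python
import torch

def blend_overlay(scheduler, init_latents_orig, noise, latents,
                  overlay_mask, timesteps, i):
    """Re-noise the original latents to the *next* timestep and paste them
    into the masked region; skip the blend entirely on the final step,
    where timesteps[i + 1] would run past the end of the schedule."""
    if i < len(timesteps) - 1:
        noise_timestep = timesteps[i + 1]
        init_latents_proper = scheduler.add_noise(
            init_latents_orig, noise, torch.tensor([noise_timestep])
        )
        latents = init_latents_proper * overlay_mask + latents * (1 - overlay_mask)
    return latents
```

This mirrors how the SD inpainting pipelines handle the boundary: on the last step the free-running latents are already fully denoised, so no re-noised original is pasted in.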
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Model/Pipeline/Scheduler description
Hi, thank you for integrating the stable video diffusion pipeline. I tried to implement a simple inpainting pipeline, inspired by the legacy inpainting pipeline, but ran into an issue when adding noise to the inpainting latents.

In particular, the `scheduler.add_noise` function seems to add incorrect noise to the latents. I suspect the issue has something to do with the special noise scheduler proposed in the SVD paper; in Section D.2, the authors modify the preconditioning functions and the distribution over training noise levels. Do we need to modify the `EulerDiscreteScheduler` to accommodate these modifications? The following is my sketchy implementation of the inpainting pipeline:
Btw, setting `overlay_end` to 1.0 still produces slightly noisy output, which probably means the latent input to the unet is incorrect. Thank you again for your help! This new model is very exciting to play with!
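For intuition on why the timestep passed to `add_noise` matters so much here: Euler/EDM-style schedulers inject noise scaled by the sigma at the given timestep, roughly `noisy = clean + sigma * noise`. A toy sketch with a made-up sigma schedule (illustrative values only, not diffusers' actual schedule) shows that re-noising the masked region with the *current* step's sigma, instead of the next one, leaves it one full noise level above the free-running latents:

```python
# Toy sigma schedule, largest noise first, as in Euler-style schedulers
# (made-up values for illustration, not diffusers' actual schedule).
sigmas = [10.0, 5.0, 1.0, 0.0]

def add_noise(sample, noise, sigma):
    # Simplified Euler/EDM-style injection: noisy = clean + sigma * noise.
    return sample + sigma * noise

# After step i, the free-running latents sit at noise level sigmas[i + 1],
# so the pasted-in original must be re-noised with sigmas[i + 1] as well.
i = 1
matched = add_noise(0.0, 1.0, sigmas[i + 1])   # noise level 1.0, in sync
mismatched = add_noise(0.0, 1.0, sigmas[i])    # noise level 5.0, too noisy
```

The `mismatched` case is what the original `torch.tensor([t])` call produces: the masked region keeps getting noise the surrounding latents have already shed, so the output never fully denoises.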
Open source status
Provide useful links for the implementation
No response