Closed by nhnt11 9 months ago
By the way, I've tried prompt-to-image with very low step count, and it works "fine" - the images aren't great but they don't look like intermediates.
Here's a p2i result with num_inference_steps=5:
By the way, this issue also impacts prompt-to-image gens when using the refiner, since the refiner uses the img2img pipeline.
OK, this is based on incomplete understanding, but after a lot of reading I am suspicious that LMS is simply a very bad choice for img2img when few steps are performed, and that there might not be any particular implementation bug.
Consider a generation with 100 steps where only the final 5 steps are performed (per the 1 - strength mix-up noted further down, the "strength 95" here corresponds to diffusers strength=0.05). Essentially, the generation will skip the first 95 timesteps and run only the last 5 steps of the schedule.
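For concreteness, here is a rough sketch of how a diffusers-style img2img pipeline turns strength into a number of performed steps (loosely modeled on the pipeline's get_timesteps logic; simplified, not the exact implementation):

```python
# Sketch of how an img2img pipeline truncates the schedule based on strength.
# Loosely modeled on diffusers' get_timesteps; simplified, not the exact code.
def performed_steps(num_inference_steps: int, strength: float) -> int:
    # strength=1.0 runs the full schedule; low strength keeps only the tail.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start  # number of steps actually executed

print(performed_steps(100, 0.05))  # 5: only the final 5 of 100 steps run
print(performed_steps(50, 0.1))    # 5: only the last 5 steps run
```

So every low-strength run spends all of its steps at the very end of the sigma schedule, which is the regime examined below.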
If you look at the last 5 sigmas when running LMS for 100 steps, this is what they look like:

Default sigmas: 0.20005333 0.17116994 0.13862235 0.09870332 0.02916753
Karras sigmas: 0.04322357 0.03925607 0.0356049 0.03224877 0.02916753

Look at how small the Karras sigma values are relative to the default ones.
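For reference, the Karras schedule is easy to reproduce in a few lines. In this sketch, rho=7 is the default from the Karras et al. paper, and the sigma_min/sigma_max values are illustrative numbers in the right ballpark for SDXL, not pulled from the scheduler config:

```python
import numpy as np

# Karras et al. (2022) sigma spacing, as used with use_karras_sigmas=True.
# rho=7 is the paper's default; sigma_min/sigma_max are illustrative values.
def karras_sigmas(n, sigma_min=0.0292, sigma_max=14.6146, rho=7.0):
    ramp = np.linspace(0, 1, n)
    min_inv_rho = sigma_min ** (1 / rho)
    max_inv_rho = sigma_max ** (1 / rho)
    return (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho

sigmas = karras_sigmas(100)
print(sigmas[-5:])  # the tail is tightly bunched just above sigma_min
```

The rho exponent is what packs so many sigmas into the low-noise end - good for detail in full text-to-image runs, but it means a low-strength img2img run executes entirely on these tiny, closely spaced sigmas.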
My wild guess is that this is related to what @LuChengTHU says about numerical stability close to t=0 for second-order solvers. LMS is a fourth-order sampler by default, which could explain why the effect is so exaggerated here.
If I force the order to 1, 2, 3, and 4 (by passing it in as a param to step()), here is how the result varies:
Very naively, I would suggest that we do something similar for LMS as we do for DPM++ 2M with the lower_order_final flag - i.e. if we are approaching the last few timesteps, we reduce the "derivative depth", so to speak.
I am a huge noob here so just thinking out loud and learning 😄 🙏
Here is a simple change which forces order = 1 for the last 15 timesteps https://github.com/playgroundai/diffusers/commit/c3a629155853591953b3830c1d87468b50956ccb
This is a proof-of-concept change, and I don't know, for example, whether the order should drop off gradually instead of jumping from 4 to 1 once 15 steps remain.
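Sketched in plain Python, the idea amounts to an order schedule like the following (a hedged reconstruction of the commit's behavior from its description, not the actual diff; function and parameter names are made up):

```python
# Hypothetical order schedule mirroring the proof-of-concept commit:
# warm up from order 1 as history accumulates, then clamp back to 1
# (Euler-like) for the last `tail` steps, in the spirit of DPMSolver's
# lower_order_final.
def lms_order(step_index, num_steps, default_order=4, tail=15):
    if num_steps - step_index <= tail:
        return 1  # final steps: drop to first order
    return min(step_index + 1, default_order)  # limited by available history

orders = [lms_order(i, 50) for i in range(50)]
print(orders[:5])   # ramps up: 1, 2, 3, 4, 4
print(orders[-16:]) # last 15 steps forced to order 1
```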
It eliminates the noise for various values of strength when num_inference_steps=50:
For fun, here's another approach where we always start with order=1 for the first performed step (even if the first step index > 0) and also drop off to order=1 for the final timesteps: https://github.com/playgroundai/diffusers/commit/6542f63b713db83a84a7ebe954267c52aecac3d6
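The difference in this second approach can be sketched the same way: the warm-up is counted from the first performed step rather than from step index 0, which matters for img2img where sampling starts partway through the schedule (again a hypothetical reconstruction; names are made up):

```python
# Second approach, sketched: count history from the first *performed* step
# (steps_done), so an img2img run that starts at step_index > 0 still
# begins at order 1, and decay to order 1 again for the final steps.
def lms_order_v2(steps_done, steps_remaining, default_order=4, tail=15):
    warmup = steps_done + 1  # only as much history as we actually have
    cooldown = 1 if steps_remaining <= tail else default_order
    return min(warmup, cooldown)

# img2img with num_inference_steps=50, strength=0.1: 5 steps performed,
# all within the 15-step tail, so every step runs at order 1.
orders = [lms_order_v2(d, 5 - d) for d in range(5)]
print(orders)  # [1, 1, 1, 1, 1]
```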
Results alongside the previous approach (I think this approach yields slightly more detail):
Oh yikes, I just saw that in the screenshots above, the strength values are wrong. The values in the screenshots are 1 - strength. So e.g. the images labeled 0.9 were actually generated with strength=0.1.
cc @yiyixuxu
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@yiyixuxu is this solved here?
Is this related to https://github.com/huggingface/diffusers/pull/6187 ?
Describe the bug

When using the LMS scheduler with the SDXL Img2Img pipeline, there is a lot of noise left over in the image, especially when strength is closer to 0. In other words, when the total number of performed steps is "low" (e.g. num_inference_steps=50 and strength=0.1), the result images are unusably noisy.

Reproduction

Here's some code that first does a prompt-to-image generation, and then an image-to-image from that result with strength=0.1. The image-to-image result looks like an intermediate latent. Note that the prompt-to-image result looks completely fine. This is reproducible with any input image - I just used a p2i gen because it felt easier to share here.

Image-to-Image Result:
Logs

No response

System Info

diffusers version: 0.21.4

Who can help?

@yiyixuxu @patrickvonplaten