pkuliyi2015 / multidiffusion-upscaler-for-automatic1111

Tiled Diffusion and VAE optimization, licensed under CC BY-NC-SA 4.0

Noise Inversion generates blurry images with SD2.1 models #159


SLAPaper commented 1 year ago

Original image (768×768): 00000-1735483949 (custom)

2x scale up with original SD 1.5 & Noise Inversion: 00007-1

wizard, close up shot, old man, white hair, magic, cinematic, painting
Negative prompt: low quality, messy, distortion, mutated, extra fingers, extra limbs
Steps: 25, Sampler: Euler, CFG scale: 7.5, Seed: 1, Size: 1536x1536, Model hash: 6e8859ee58, Model: original_v1-5-pruned-emaonly-4026-0000-0000, Denoising strength: 0.4, ENSD: 31337, Tiled Diffusion upscaler: 4x-UltraMix_Balanced, Tiled Diffusion scale factor: 2, Tiled Diffusion: "{'Method': 'Mixture of Diffusers', 'Latent tile width': 96, 'Latent tile height': 96, 'Overlap': 48, 'Tile batch size': 4, 'Upscaler': '4x-UltraMix_Balanced', 'Scale factor': 2, 'Keep input size': True, 'Noise inverse': True, 'Steps': 10, 'Retouch': 1, 'Renoise strength': 1, 'Kernel size': 64}"

2x scale up with original SD 2.1-768 & Noise Inversion: 00008-1

wizard, close up shot, old man, white hair, magic, cinematic, painting
Negative prompt: low quality, messy, distortion, mutated, extra fingers, extra limbs
Steps: 25, Sampler: Euler, CFG scale: 7.5, Seed: 1, Size: 1536x1536, Model hash: 9d6f154629, Model: original_v2-1_768-ema-pruned-5095-0869-1141, Denoising strength: 0.4, ENSD: 31337, Tiled Diffusion upscaler: 4x-UltraMix_Balanced, Tiled Diffusion scale factor: 2, Tiled Diffusion: "{'Method': 'Mixture of Diffusers', 'Latent tile width': 96, 'Latent tile height': 96, 'Overlap': 48, 'Tile batch size': 4, 'Upscaler': '4x-UltraMix_Balanced', 'Scale factor': 2, 'Keep input size': True, 'Noise inverse': True, 'Steps': 10, 'Retouch': 1, 'Renoise strength': 1, 'Kernel size': 64}"

2x scale up with original SD 2.1-768 & NO Noise Inversion: 00009-1

wizard, close up shot, old man, white hair, magic, cinematic, painting
Negative prompt: low quality, messy, distortion, mutated, extra fingers, extra limbs
Steps: 25, Sampler: DPM++ 2M alt Karras, CFG scale: 7.5, Seed: 1, Size: 1536x1536, Model hash: 9d6f154629, Model: original_v2-1_768-ema-pruned-5095-0869-1141, Denoising strength: 0.4, ENSD: 31337, Tiled Diffusion upscaler: 4x-UltraMix_Balanced, Tiled Diffusion scale factor: 2, Tiled Diffusion: "{'Method': 'Mixture of Diffusers', 'Latent tile width': 96, 'Latent tile height': 96, 'Overlap': 48, 'Tile batch size': 4, 'Upscaler': '4x-UltraMix_Balanced', 'Scale factor': 2, 'Keep input size': True}"
pkuliyi2015 commented 1 year ago

To avoid blurry images, please increase the noise inversion steps.
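
For reference, the setting in question is the `'Steps'` entry inside the noise inversion part of the Tiled Diffusion parameters (the runs above use 10). A hypothetical example with a larger value, just to show which knob to turn:

```python
# Hypothetical Tiled Diffusion noise inversion settings (illustration only;
# 'Steps' here is the noise inversion step count, not the main sampler steps):
noise_inverse_settings = {
    'Noise inverse': True,
    'Steps': 50,            # raised from the 10 used in the runs above
    'Retouch': 1,
    'Renoise strength': 1,
    'Kernel size': 64,
}
```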

SLAPaper commented 1 year ago

Here is the result using 100 noise inversion steps, which doesn't change much:

00013-1

wizard, close up shot, old man, white hair, magic, cinematic, painting
Negative prompt: low quality, messy, distortion, mutated, extra fingers, extra limbs
Steps: 25, Sampler: Euler, CFG scale: 7.5, Seed: 1, Size: 1536x1536, Model hash: 9d6f154629, Model: original_v2-1_768-ema-pruned-5095-0869-1141, Denoising strength: 0.4, ENSD: 31337, Tiled Diffusion upscaler: 4x-UltraMix_Balanced, Tiled Diffusion scale factor: 2, Tiled Diffusion: "{'Method': 'MultiDiffusion', 'Latent tile width': 96, 'Latent tile height': 96, 'Overlap': 48, 'Tile batch size': 4, 'Upscaler': '4x-UltraMix_Balanced', 'Scale factor': 2, 'Keep input size': True, 'Noise inverse': True, 'Steps': 100, 'Retouch': 1, 'Renoise strength': 1, 'Kernel size': 64}"
prometixX commented 1 year ago

I also got blurry images, and I also used SD 2.1; see #127

Kahsolt commented 1 year ago

I can't tell whether this feature actually fails to work with SD 2.1 due to some compatibility problem. Our code is not tested on that pipeline, and we probably don't plan to extend support further :(

dill-shower commented 1 year ago

Same problem with the wd1.5 model, which is based on SD 2.1-768v. If it helps, I can provide images and parameters. I tried changing the number of noise inversion steps (up to 400), the denoising strength, and everything else that can be changed. Noise inversion makes the image smoother and destroys details.

pkuliyi2015 commented 1 year ago

Yes, noise inversion tends to destroy details for realistic or semi-realistic images. It's a known issue.

But you can combine it with the latest ControlNet v1.1 Tile model, which tends to produce an excess of detail. This combination can cancel out the drawbacks of both.

For SD 2.1 models, I don't know what the problem is, but I think it comes from your checkpoint. If the checkpoint was trained on something very different from your images, that can lead to severe blur. This is because noise inversion finds an approximate X_T under your checkpoint; if no proper X_T exists, it produces unfamiliar noise that can't be properly denoised.
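
Roughly speaking, the idea looks like the DDIM-style inversion sketched below (an illustration only, not the exact code in this extension; `unet`, `alphas_cumprod`, `timesteps`, and `cond` are placeholders):

```python
import torch

@torch.no_grad()
def approximate_x_T(unet, x_0, alphas_cumprod, timesteps, cond):
    """Illustrative DDIM-style inversion: walk forward t=0 -> T, reusing the
    model's own noise prediction at each step to estimate the initial latent
    x_T that would denoise back to x_0 under this checkpoint.

    unet:           callable(x, t, cond) -> predicted noise
    alphas_cumprod: 1-D tensor of cumulative alphas, indexed by timestep
    timesteps:      increasing list of timesteps from 0 to T
    """
    x = x_0
    for t_prev, t in zip(timesteps[:-1], timesteps[1:]):
        a_prev, a_t = alphas_cumprod[t_prev], alphas_cumprod[t]
        eps_pred = unet(x, t_prev, cond)                      # model's noise estimate
        x0_pred = (x - (1 - a_prev).sqrt() * eps_pred) / a_prev.sqrt()
        x = a_t.sqrt() * x0_pred + (1 - a_t).sqrt() * eps_pred
    return x  # approximate x_T
```

If `eps_pred` is a poor fit for the image (a checkpoint trained on very different content or style), the error accumulates into x_T, and the result is noise that cannot be cleanly denoised again, which shows up as blur.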

Kahsolt commented 1 year ago

These reports all seem to come from SD 2.1; I can't tell from a theoretical standpoint whether noise inversion is incompatible with it. People who encounter this issue are encouraged to try reproducing it with SD v1.x-compatible models, using the same settings and pipeline (the main developers don't use SD 2.1, so we can't test it :(

And as @pkuliyi2015 rightly says, noise inversion tries to find the proper initial noise for the given model weights. If the art style of your reference image is far from what your model weights were trained on, noise inversion will give a bad approximation.