[bug]: Tiled decoding ruins the image

seinan9 commented 3 months ago

Is there an existing issue for this problem?

[X] I have searched the existing issues

Operating system

Windows

GPU vendor

Nvidia (CUDA)

GPU model

RTX 4060 TI

GPU VRAM

16GB

Version number

4.0.2

Browser

Google Chrome 123.0.6312.105

Python dependencies

{ "accelerate": "0.28.0", "compel": "2.0.2", "cuda": "12.1", "diffusers": "0.27.2", "numpy": "1.26.4", "opencv": "4.9.0.80", "onnx": "1.15.0", "pillow": "10.3.0", "python": "3.10.6", "torch": "2.2.1+cu121", "torchvision": "0.17.1+cu121", "transformers": "4.39.1", "xformers": "0.0.25" }

What happened

The image quality is significantly worse when force_tiled_decode is set to true. This is slightly noticeable during the initial generation, and much more so when upscaling. Certain parts look oversaturated. There is a clear difference between the images below (the first with tiled decode off, the second with tiled decode on). Also, in the second image, the upper part (probably the first tile) is less affected than the bottom part (probably the second tile).

[Images: tiled_decode_off, tiled_decode_on]

What you expected to happen

I expected the images to be decoded normally without oversaturation on certain parts.

How to reproduce the problem

Set the force_tiled_decode option in invokeai.yaml to true, start the application, and generate an image (the effect is easier to spot with realistic images and when upscaling).

Additional context

The examples were generated with epicrealism (SD 1.5), but this is reproducible with other models as well; it is easier to spot with realistic models. This does not happen in InvokeAI version 3.7.

The initial image was generated with the following parameters:

Generation Mode: txt2img
Positive Prompt: photo of a viking, short hair, oversized sweater, close up, fierce, male
Negative Prompt: (low quality)1.4
Model: epicrealism (SD-1)
Width: 512
Height: 768
Seed: 2926731161
Steps: 25
Scheduler: dpmpp_2m_k
CFG scale: 8
CFG Rescale Multiplier: 0

Afterwards it was upscaled to 640x960 via img2img with a denoising strength of 0.55. All other parameters stayed the same.

Discord username

seinan9

psychedelicious commented 3 months ago

The handling of tiled decode hasn't changed in some time - several months. This functionality is handled wholly by diffusers, and it appears their implementation also hasn't changed in months.

It's possible there was some change in another area of diffusers or Invoke that indirectly affects how tiled decoding is handled.

However, slight changes like this are known effects of tiled decoding. Because the model doesn't have the full context of the image, it's expected that tiled decoding produces measurable and sometimes visible differences. There's some discussion here, though the example images appear to be missing now.
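A toy illustration of why decoding tile by tile changes the output: the Stable Diffusion VAE decoder contains GroupNorm layers (and a self-attention block), whose outputs depend on statistics of everything in their input. The sketch below uses a plain zero-mean/unit-variance normalization as a stand-in for GroupNorm and a synthetic gradient instead of a real latent. It is not Invoke's or diffusers' code, and the helper names are hypothetical.

```python
import numpy as np

def normalize(x):
    """Stand-in for a normalization layer: zero mean, unit variance over the whole input it sees."""
    return (x - x.mean()) / (x.std() + 1e-6)

def tile_and_apply(x, fn, tile):
    """Apply `fn` to non-overlapping square tiles and stitch the results back together."""
    out = np.empty(x.shape, dtype=float)
    for i in range(0, x.shape[0], tile):
        for j in range(0, x.shape[1], tile):
            out[i:i + tile, j:j + tile] = fn(x[i:i + tile, j:j + tile])
    return out

# Synthetic 64x64 "image": brightness increases smoothly from top to bottom.
img = np.linspace(0.0, 1.0, 64)[:, None] * np.ones((1, 64))

whole = normalize(img)                          # one pass, full context
per_tile = tile_and_apply(img, normalize, 32)   # four 32x32 tiles, local context only
```

Normalized as a whole, the gradient stays smooth. Normalized per tile, each 32-row tile is stretched to its own range, so at the boundary between the top and bottom tiles the values jump from the brightest to the darkest level. diffusers blends overlapping tiles to soften such seams, but per-tile statistics can still shift colors locally, which is consistent with the oversaturation described above.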

A more convincing comparison would be between a v3.7.0 image with tiled decode vs v4 image with tiled decode (no upscaling please, that adds another variable to the equation). Ideally a few comparisons.

seinan9 commented 3 months ago

It is less visible during the first pass, since there is typically only a single tile (two at most if the resolution is set a bit higher). Still, here are two more examples without upscaling (512x768). Images 1 and 3 were generated with InvokeAI 3.7, while images 2 and 4 were generated with InvokeAI 4.0.2. The same parameters were used for all images.

[Images: invoke37_tiled_decode_on_0, invoke402_tiled_decode_on_0, invoke37_tiled_decode_on_1, invoke402_tiled_decode_on_1]

psychedelicious commented 3 months ago

Thanks for those examples. It's still very noticeable. I think we need to test this with diffusers (i.e. via separate script, not within invoke) to confirm where the problem is.
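One way to make such comparisons less subjective, whether the images come from Invoke or from a standalone diffusers script (for example, decoding the same latents once after `vae.enable_tiling()` and once after `vae.disable_tiling()`), is to measure the pixel difference between the two decodes separately for the top and bottom of the image. This would also test the observation that the upper tile is less affected. A minimal sketch, assuming both decodes are already loaded as NumPy arrays of identical shape; `band_differences` is a hypothetical helper.

```python
import numpy as np

def band_differences(reference: np.ndarray, candidate: np.ndarray, bands: int = 2) -> list[float]:
    """Mean absolute pixel difference between two images, reported per
    horizontal band from top to bottom."""
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    # Cast to float so uint8 subtraction cannot wrap around.
    diff = np.abs(reference.astype(np.float64) - candidate.astype(np.float64))
    # array_split tolerates heights that are not divisible by `bands`.
    return [float(band.mean()) for band in np.array_split(diff, bands, axis=0)]
```

With images loaded via `np.asarray(PIL.Image.open(path))`, a clearly larger value for the bottom band than for the top band would support the per-tile explanation.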

seinan9 commented 3 months ago

You're welcome. And thank you for looking into it!

RyanJDick commented 1 month ago

I tried to reproduce this today. It turns out that there was no regression in VAE tiling behavior. There was a period, during the switch from tiled_decode to force_tiled_decode, in which we weren't applying the force_tiled_decode config.

For example, look at the v3.6.2 tag:

force_tiled_decode was present in the config and tiled_decode was deprecated: https://github.com/invoke-ai/InvokeAI/blob/v3.6.2/invokeai/app/services/config/config_default.py#L272
But we were still using tiled_decode in the codebase: https://github.com/invoke-ai/InvokeAI/blob/v3.6.2/invokeai/app/invocations/latent.py#L857

This was eventually fixed in https://github.com/invoke-ai/InvokeAI/commit/897fe497dc70012cdd2680ca9a297f35545f7817.
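The pattern behind this is easy to reproduce in miniature. If an option is renamed but the code path still reads the old name, setting the new option silently does nothing, so images produced in that window with force_tiled_decode set to true were most likely not tiled at all, which would explain why v3.7 looked better. The sketch below is hypothetical and heavily simplified: the `Config` class and both functions are illustrations, not Invoke's actual config service or the change made in the linked commit.

```python
from dataclasses import dataclass

@dataclass
class Config:
    """Hypothetical, simplified config holding the new option and its deprecated predecessor."""
    force_tiled_decode: bool = False
    tiled_decode: bool = False  # deprecated; kept only so old config files still load

def use_tiling_buggy(cfg: Config) -> bool:
    # Reads only the deprecated field, so setting force_tiled_decode has no effect.
    return cfg.tiled_decode

def use_tiling_fixed(cfg: Config) -> bool:
    # Honors the new option while still respecting the deprecated one.
    return cfg.force_tiled_decode or cfg.tiled_decode
```

With `Config(force_tiled_decode=True)`, the buggy check returns `False` and the fixed one returns `True`.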

I tested VAE tiling in older versions of Invoke via workflows and saw the same bad VAE tiling artifacts as in the latest version of Invoke. Unfortunately, these tiling artifacts are expected in the current diffusers implementation of VAE tiling, as discussed on the original PR: https://github.com/huggingface/diffusers/pull/1441

I'm going to do a little experimentation to see if I can improve things by modifying the tile dimensions/overlaps. But a proper fix would be a bigger project.
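For experiments with tile dimensions and overlaps, it helps to see what the overlap actually does. diffusers' `AutoencoderKL.tiled_decode` decodes overlapping latent tiles and linearly cross-fades the overlapping rows and columns of neighboring tiles (its `blend_v` / `blend_h` helpers), with the overlap controlled by `tile_overlap_factor` (0.25 by default). Below is a simplified NumPy re-creation of that overlap-and-blend scheme, assuming a single-channel image and a per-tile function `fn` that preserves shape; the real decoder also upsamples each latent tile 8x, which is omitted here, and `tiled_apply` is a hypothetical name.

```python
import numpy as np

def blend_v(a, b, extent):
    """Cross-fade the last `extent` rows of tile `a` into the first rows of tile `b` (in place)."""
    extent = min(a.shape[0], b.shape[0], extent)
    for y in range(extent):
        w = y / extent  # 0 keeps `a`; values approaching 1 keep `b`
        b[y, :] = a[-extent + y, :] * (1 - w) + b[y, :] * w
    return b

def blend_h(a, b, extent):
    """Same cross-fade, from the right columns of `a` into the left columns of `b`."""
    extent = min(a.shape[1], b.shape[1], extent)
    for x in range(extent):
        w = x / extent
        b[:, x] = a[:, -extent + x] * (1 - w) + b[:, x] * w
    return b

def tiled_apply(img, fn, tile=64, overlap=0.25):
    """Apply `fn` to overlapping tiles of a 2-D image and blend the seams."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    extent = int(tile * overlap)  # rows/columns shared by neighboring tiles
    stride = tile - extent        # distance between tile origins
    height, width = img.shape
    # Process every tile independently, as a memory-saving decoder would.
    rows = [[fn(img[i:i + tile, j:j + tile].astype(float)) for j in range(0, width, stride)]
            for i in range(0, height, stride)]
    out_rows = []
    for i, row in enumerate(rows):
        out_row = []
        for j, t in enumerate(row):
            if i > 0:
                t = blend_v(rows[i - 1][j], t, extent)
            if j > 0:
                t = blend_h(row[j - 1], t, extent)
            # Keep only the part of the tile not covered by the next tile's overlap.
            out_row.append(t[:stride, :stride])
        out_rows.append(np.concatenate(out_row, axis=1))
    return np.concatenate(out_rows, axis=0)
```

With an identity `fn`, the result reproduces the input up to floating-point rounding, so the blending itself is seamless when tiles agree. With a statistics-dependent `fn` (for example a per-tile normalization), neighboring tiles disagree and the cross-fade can only soften the transition, not undo the shift. In this sketch, a larger `overlap` widens the cross-fade at the cost of more tiles, while only a larger `tile` gives each tile more context.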

seinan9 commented 1 month ago

It's unfortunate that it is a problem within the diffusers implementation. For me it is not an urgent problem, but I am still grateful that you are looking into it. Thanks!

ufuksarp commented 1 month ago

I had the same issue with EasyDiffusion last year. They added a switch to disable VAE tiling. It seems that's the only fix right now. https://github.com/easydiffusion/easydiffusion/issues/1442
