huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

FluxInpaintPipeline overrides pixels outside the mask #9867

Open Clement-Lelievre opened 2 weeks ago

Clement-Lelievre commented 2 weeks ago

Describe the bug

When inpainting with FluxInpaintPipeline (diffusers==0.31.0, torch==2.4.1), I get some pixels outside the mask (and pretty far from the mask border) that are overwritten even though the mask is black at those indices. This happens with both Flux schnell and dev.

Is this expected?

I could paste the output image onto the input image using the mask to get rid of this (sketched below), but I'd like to check first whether this is actually a bug.
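A minimal sketch of that paste-back with PIL (the filenames are hypothetical stand-ins for the attachments below):

from PIL import Image

out = Image.open("flux_inpainting.png")                    # pipeline output
src = Image.open("vampire_input_rgb.png").convert("RGB")   # original input
msk = Image.open("vampire_mask.png").convert("L")          # inpainting mask
# composite keeps `out` where the mask is white and `src` where it is black
fixed = Image.composite(out, src, msk)
fixed.save("flux_inpainting_fixed.png")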

Here is my input image and mask:

[image: vampire_input_rgb] [image: vampire_mask]

Here is the kind of diff I get:

[image: diff]

Reproduction

I am using a small, square (224×224) image (see above) and a non-rectangular mask whose values are only in {0, 255}.

import torch
from PIL import Image
from diffusers import FluxInpaintPipeline

# from_pretrained has no `device` kwarg; move the pipeline with .to()
pipe = FluxInpaintPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16).to("cuda")
prompt = "skull"
# filenames are placeholders for the attachments above
source = Image.open("vampire_input_rgb.png").convert("RGB")
mask = Image.open("vampire_mask.png").convert("RGB")
image = pipe(prompt=prompt, image=source, mask_image=mask, height=224, width=224, num_inference_steps=2, guidance_scale=3.5, strength=0.76).images[0]  # the latter params are probably not necessary to reproduce the issue
image.save("flux_inpainting.png")

# now visualize the pixel diff between the output you get and the input image;
# you should see non-zero pixels outside the mask, particularly on the right part
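A sketch of that diff check with numpy (assuming the same placeholder filenames):

import numpy as np
from PIL import Image

inp = np.asarray(Image.open("vampire_input_rgb.png").convert("RGB"), dtype=np.int16)
out = np.asarray(Image.open("flux_inpainting.png").convert("RGB"), dtype=np.int16)
msk = np.asarray(Image.open("vampire_mask.png").convert("L"))

diff = np.abs(out - inp).sum(axis=-1)   # per-pixel absolute difference
outside = diff[msk == 0]                # pixels the mask should have protected
print((outside > 0).sum(), "changed pixels outside the mask, max diff:", outside.max())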

Logs

No response

System Info

Who can help?

@asomoza @sayakpaul

sayakpaul commented 2 weeks ago

PyTorch 3.2.0? That sounds invalid.

Clement-Lelievre commented 2 weeks ago

PyTorch 3.2.0? That sounds invalid.

torch==2.4.1

yiyixuxu commented 2 weeks ago

cc @asomoza

christopher5106 commented 2 weeks ago

I believe it's due to VAE encode/decode. It's not a bijective method. If confirmed, a blend with the decoded image at the end will solve the difference.
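A quick sanity check of that hypothesis (a sketch, not pipeline code; the Flux VAE checkpoint and the filename are assumptions):

import torch
from PIL import Image
from diffusers import AutoencoderKL
from diffusers.image_processor import VaeImageProcessor

vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.float32).to("cuda")
processor = VaeImageProcessor(vae_scale_factor=8)

pixels = processor.preprocess(Image.open("vampire_input_rgb.png").convert("RGB")).to("cuda")  # [-1, 1] tensor

with torch.no_grad():
    latents = vae.encode(pixels).latent_dist.sample()     # encode to latent space
    decoded = vae.decode(latents, return_dict=False)[0]   # decode straight back

# the reconstruction error is non-zero everywhere, not just inside any mask
print((decoded - pixels).abs().max().item())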

SahilCarterr commented 2 weeks ago

I believe it's due to VAE encode/decode. It's not a bijective method. If confirmed, a blend with the decoded image at the end will solve the difference.

That can be done, but it would require changing many pipelines, including all the ControlNet and inpainting pipelines, because they all show the same difference.

crapthings commented 2 weeks ago

I believe it's due to VAE encode/decode. It's not a bijective method. If confirmed, a blend with the decoded image at the end will solve the difference.

@christopher5106 can this be applied to ControlNet inpainting? Is this method the same as apply_overlay?

What is the difference?

https://huggingface.co/docs/diffusers/api/image_processor#diffusers.image_processor.VaeImageProcessor.apply_overlay

https://huggingface.co/docs/diffusers/api/image_processor#diffusers.image_processor.VaeImageProcessor.postprocess

christopher5106 commented 2 weeks ago

Right, you can use apply_overlay on the final PIL image in both the inpaint and ControlNet-inpaint pipelines to correct the small VAE differences. But I would apply the mask directly to the image tensor right after decode, before postprocess, rather than to the PIL object.
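For the PIL route, a usage sketch reusing the source, mask, and image PIL objects from the reproduction snippet above:

from diffusers.image_processor import VaeImageProcessor

# apply_overlay pastes the inpainted `image` onto `source` wherever the mask
# is white and keeps the original `source` pixels everywhere else
processor = VaeImageProcessor()
final = processor.apply_overlay(mask, source, image)
final.save("flux_inpainting_overlaid.png")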

christopher5106 commented 2 weeks ago
            ...
            image = self.vae.decode(latents, return_dict=False)[0]
            # blend in pixel space: keep the original pixels where the mask is 0
            image = (1 - init_mask) * init_image + init_mask * image
            image = self.image_processor.postprocess(image, output_type=output_type)
            ...
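For this blend to behave, init_mask and init_image presumably need to be the pixel-resolution tensors produced by the pipeline's mask/image preprocessing (same shape and [-1, 1] range as the decoded image), not the latent-resolution versions used inside the denoising loop.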
crapthings commented 2 weeks ago
            ...
            image = self.vae.decode(latents, return_dict=False)[0]
            image = (1 - init_mask) * init_image + init_mask * image
            image = self.image_processor.postprocess(image, output_type=output_type)
            ...

What are the advantages of doing it this way compared to directly applying the overlay? Also, can this be applied to the ControlNet pipelines?

I'm trying to apply this to the PAG ControlNet pipeline by copying only this line:

image = self.vae.decode(latents, return_dict=False)[0]

https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pag/pipeline_pag_controlnet_sd.py#L1317

but I got a grayish result:

[image: grayish result]
putdanil commented 2 weeks ago

I have issues too: parts of the original image (the black parts) show up in the inpainted image.

Removing this line "solves" it, but it also disables the step that re-injects the noised original latents outside the mask at each denoising step, which seems to damage the diffusion process:

latents = (1 - init_mask) * init_latents_proper + init_mask * latents

[image: fluxtest011124 (19)]