huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

[0.28.0.dev0] In ControlNetModel forward function, added_cond_kwargs arg's default value should be {} rather than None #8380

Closed CatLoves closed 1 month ago

CatLoves commented 1 month ago

Describe the bug

Version: 0.28.0.dev0

Bug description: In the `ControlNetModel` forward function, the `added_cond_kwargs` argument's default value should be `{}` rather than `None`.

Reason: in the source code (`controlnet.py`):

```python
elif self.config.addition_embed_type == "text_time":
    if "text_embeds" not in added_cond_kwargs:
        raise ValueError(
            f"{self.__class__} has the config param `addition_embed_type` set to 'text_time' which requires the keyword argument `text_embeds` to be passed in `added_cond_kwargs`"
        )
```

If `added_cond_kwargs` keeps its default value of `None`, this check fails before it can raise the intended, descriptive ValueError.

Pull request: I intend to submit a pull request for this issue.
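For illustration, here is a minimal sketch of the guard such a fix could add. This is a hypothetical standalone helper, not the actual diffusers source: it just normalizes `None` to an empty dict before the membership check.

```python
# Hypothetical helper sketching the proposed fix; not the actual diffusers source.
def check_added_cond_kwargs(addition_embed_type, added_cond_kwargs=None):
    # Normalize None to {} so the membership check below can run and raise
    # the intended, descriptive ValueError instead of failing earlier.
    added_cond_kwargs = added_cond_kwargs if added_cond_kwargs is not None else {}
    if addition_embed_type == "text_time" and "text_embeds" not in added_cond_kwargs:
        raise ValueError(
            "addition_embed_type is 'text_time', which requires the keyword "
            "argument `text_embeds` to be passed in `added_cond_kwargs`"
        )

try:
    check_added_cond_kwargs("text_time")  # no kwargs passed at all
except ValueError as e:
    print(f"ValueError: {e}")
```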

Reproduction

```python
from diffusers import AutoPipelineForInpainting, StableDiffusionControlNetInpaintPipeline, ControlNetModel
from diffusers.utils import load_image
import torch

controlnet_canny = ControlNetModel.from_pretrained(
    "xinsir/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
print(f"=> controlnet_canny is ready")
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    controlnet=controlnet_canny,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

image = load_image(img_url).resize((1024, 1024))
mask_image = load_image(mask_url).resize((1024, 1024))
canny_image = get_canny_edge(image)  # user-defined helper, not shown in the issue

prompt = "a tiger sitting on a park bench"
generator = torch.Generator(device="cuda").manual_seed(0)

generated_image = pipe(
    prompt=prompt,
    image=image,
    mask_image=mask_image,
    control_image=canny_image,
    guidance_scale=8.0,
    num_inference_steps=20,  # steps between 15 and 30 work well for us
    strength=0.99,  # make sure to use strength below 1.0
    generator=generator,
).images[0]
```
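The snippet above references a `get_canny_edge` helper that is not defined in the issue. A plausible sketch of such a helper, assuming `opencv-python` is installed and following the usual canny-ControlNet preprocessing, might look like this:

```python
# Hypothetical implementation of the `get_canny_edge` helper used above;
# the original issue does not include it. Assumes opencv-python is installed.
import numpy as np
import cv2
from PIL import Image

def get_canny_edge(image: Image.Image, low: int = 100, high: int = 200) -> Image.Image:
    # Run Canny edge detection, then replicate the single channel to RGB,
    # which is the format canny ControlNet checkpoints expect.
    edges = cv2.Canny(np.array(image), low, high)
    edges = np.stack([edges] * 3, axis=-1)
    return Image.fromarray(edges)
```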

Logs

No response

System Info

```
$ diffusers-cli env
2024-06-03 03:28:35.669077: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-03 03:28:35.758278: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-03 03:28:35.758367: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-03 03:28:35.765096: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-03 03:28:35.790633: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-03 03:28:37.281435: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
  File "/training/home/project/stable-diffusion-webui/venv/bin/diffusers-cli", line 8, in <module>
    sys.exit(main())
  File "/training/home/project/stable-diffusion-webui/venv/lib/python3.9/site-packages/diffusers/commands/diffusers_cli.py", line 39, in main
    service.run()
  File "/training/home/project/stable-diffusion-webui/venv/lib/python3.9/site-packages/diffusers/commands/env.py", line 101, in run
    bitsandbytes_version = bitsandbytes.__version__
AttributeError: module 'bitsandbytes' has no attribute '__version__'
```
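Incidentally, the traceback at the end comes from the env command reading `bitsandbytes.__version__` unguarded. A hedged sketch of a defensive version lookup (a hypothetical pattern, not the actual diffusers `env.py` code) would be:

```python
# Hypothetical defensive version lookup; the actual diffusers env.py may differ.
import importlib.util

bitsandbytes_version = "not installed"
if importlib.util.find_spec("bitsandbytes") is not None:
    import bitsandbytes

    # getattr avoids the AttributeError seen above when the installed
    # package does not expose __version__.
    bitsandbytes_version = getattr(bitsandbytes, "__version__", "unknown")
print(bitsandbytes_version)
```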

Who can help?

@sayakpaul @yiyixuxu

CatLoves commented 1 month ago

Update diffusers-cli env info:

sayakpaul commented 1 month ago

I don't think we need to default to {} here because it becomes effective only when we're using SDXL models. WDYT @yiyixuxu?
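For context, the SDXL pipelines populate this argument themselves before invoking the ControlNet, roughly as in the fragment below. This is a simplified sketch, not the exact pipeline source; the variable names are approximate and would come from the pipeline's prompt-encoding and latent-preparation steps.

```python
# Simplified sketch of how SDXL pipelines pass the extra conditioning to the
# ControlNet inside the denoising loop; variable names are approximate.
added_cond_kwargs = {"text_embeds": add_text_embeds, "time_ids": add_time_ids}
down_block_res_samples, mid_block_res_sample = controlnet(
    latent_model_input,
    t,
    encoder_hidden_states=prompt_embeds,
    controlnet_cond=control_image,
    conditioning_scale=controlnet_conditioning_scale,
    added_cond_kwargs=added_cond_kwargs,
    return_dict=False,
)
```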

CatLoves commented 1 month ago

> I don't think we need to default to {} here because it becomes effective only when we're using SDXL models. WDYT @yiyixuxu?

Yes, you are right. Thank you for the quick response! But this naturally leads to a question: when the user uses an SDXL model, why does the code raise the ValueError? The minimal reproduction snippet is the following:

```python
from diffusers import AutoPipelineForInpainting, StableDiffusionControlNetInpaintPipeline, ControlNetModel
from diffusers.utils import load_image
import torch

""" build diffusion model pipeline """
controlnet_canny = ControlNetModel.from_pretrained(
    "xinsir/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
print(f"=> controlnet_canny is ready")
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    controlnet=controlnet_canny,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

image = load_image(img_url).resize((1024, 1024))
mask_image = load_image(mask_url).resize((1024, 1024))
canny_image = get_canny_edge(image)  # user-defined helper, not shown in the issue

prompt = "a tiger sitting on a park bench"
generator = torch.Generator(device="cuda").manual_seed(0)

generated_image = pipe(
    prompt=prompt,
    image=image,
    mask_image=mask_image,
    control_image=canny_image,
    guidance_scale=8.0,
    num_inference_steps=20,  # steps between 15 and 30 work well for us
    strength=0.99,  # make sure to use strength below 1.0
    generator=generator,
).images[0]
```

The above code hits the error because `added_cond_kwargs` is `None` by default. So does diffusers currently just not support the `diffusers/stable-diffusion-xl-1.0-inpainting-0.1` + `xinsir/controlnet-canny-sdxl-1.0` pipeline, or is this a bug in diffusers? By the way, the same code works fine if I instead use the `lllyasviel/control_v11p_sd15_inpaint` + `lllyasviel/control_v11p_sd15_canny` pipeline.

sayakpaul commented 1 month ago

You are not using the right class. You need to use this: https://github.com/huggingface/diffusers/blob/413604405fddb4692a8e9a9a9fb6c353d22881ea/src/diffusers/pipelines/controlnet/pipeline_controlnet_inpaint_sd_xl.py#L154

CatLoves commented 1 month ago

> You are not using the right class. You need to use this: https://github.com/huggingface/diffusers/blob/413604405fddb4692a8e9a9a9fb6c353d22881ea/src/diffusers/pipelines/controlnet/pipeline_controlnet_inpaint_sd_xl.py#L154

Yes, you are right: after switching to `StableDiffusionXLControlNetInpaintPipeline`, the code runs fine. Thanks again for your kind and quick response! I will close this issue now.
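For anyone landing here later, a minimal sketch of the working setup, using the same checkpoints as above with only the pipeline class changed:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetInpaintPipeline

controlnet_canny = ControlNetModel.from_pretrained(
    "xinsir/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
# The SDXL-specific pipeline builds `added_cond_kwargs` internally before
# calling the ControlNet, so the error above no longer occurs.
pipe = StableDiffusionXLControlNetInpaintPipeline.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    controlnet=controlnet_canny,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
```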