huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

StableDiffusionXL RuntimeError with ControlNet and Refiner #7732

Closed Taone974 closed 6 months ago

Taone974 commented 6 months ago

Hi, I'm trying to use a ControlNet in combination with the SDXL refiner, but I can't get it working. I get this error from pipeline_stable_diffusion_xl_img2img.py, line 496: RuntimeError: Number of dimensions of repeat dims can not be smaller than number of dimensions of tensor.
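For readers unfamiliar with the message: it is raised by PyTorch's Tensor.repeat when the call receives fewer repeat counts than the tensor has dimensions. The sketch below mirrors that rank check in plain Python and, optionally, reproduces the error with PyTorch. The helper names and the tensor shapes are illustrative assumptions, not values taken from the pipeline.

```python
def repeat_accepts(tensor_shape, repeat_dims):
    """Mirror the rank check Tensor.repeat applies: at least one repeat count per dimension."""
    return len(repeat_dims) >= len(tensor_shape)


def reproduce_error():
    """Trigger the same RuntimeError with PyTorch (imported lazily, so it is optional)."""
    import torch

    two_d = torch.zeros(1, 1280)       # illustrative 2-D tensor
    three_d = torch.zeros(1, 77, 768)  # illustrative 3-D tensor

    two_d.repeat(1, 2)                 # accepted: two counts for two dimensions
    try:
        three_d.repeat(1, 2)           # rejected: two counts for three dimensions
    except RuntimeError as err:
        return str(err)
    return None
```

Calling reproduce_error() on a machine with PyTorch installed returns the text of the RuntimeError, which suggests that some tensor in the refiner call has one more dimension than the code at that line expects.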

The image_canny I'm using is a PNG that was already generated by 3D software. The pipeline works without the refiner.

Here is the code:

import torch
from PIL import Image  # needed for Image.open below
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, AutoencoderKL, DiffusionPipeline

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True
)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16,
    use_safetensors=True
)
base = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    controlnet=controlnet,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
base.to("cuda")

refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder=base.text_encoder,
    text_encoder_2=base.text_encoder_2,
    vae=base.vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
refiner.to("cuda")

prompt = "A robot"
image_canny = Image.open("path_to_canny_image.png").convert("RGB")

image = base(
    prompt=prompt,
    controlnet_conditioning_scale=0.5,
    image=image_canny,
    num_inference_steps=20,
    denoising_end=0.8,
    output_type="latent",
).images
image = refiner(
    prompt=prompt,
    image=image,
    num_inference_steps=20,
    denoising_start=0.8,
).images[0]

I'm pretty new to this kind of stuff, so I probably did something stupid. Any help would be appreciated.

tolgacangoz commented 6 months ago

Could you just remove text_encoder=base.text_encoder, as done in the documentation, and try again?
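A minimal sketch of the refiner setup with that one argument removed. Everything else mirrors the code in the question. The helper names refiner_components and load_refiner are hypothetical, added only to keep the shared components in one place. The refiner checkpoint is generally described as using only the second text encoder, which would explain why handing it the base pipeline's first text encoder breaks prompt encoding, though this thread does not confirm the cause.

```python
def refiner_components(base):
    """Components of the base pipeline that the refiner reuses.

    text_encoder is deliberately omitted, following the suggestion above.
    """
    return {
        "text_encoder_2": base.text_encoder_2,
        "vae": base.vae,
    }


def load_refiner(base):
    """Load the SDXL refiner, sharing text_encoder_2 and the VAE with `base`."""
    # Imported lazily so the helper above can be inspected without a GPU stack.
    import torch
    from diffusers import DiffusionPipeline

    refiner = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
        **refiner_components(base),
    )
    return refiner.to("cuda")
```

With this in place, refiner = load_refiner(base) replaces the DiffusionPipeline.from_pretrained call in the question, and the two-stage base(...) and refiner(...) calls stay unchanged.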

Taone974 commented 6 months ago

Thank you so much!