question about sdxl+img2img+controlnet pipeline

syyxsxx commented 1 year ago

It seems the StableDiffusionXLControlNetImg2ImgPipeline does not work very well with some prompt styles, for example an anime-style prompt.

Prompt: listen, anime style artwork, cartoon, makoto shinkai and lois van baarle, unreal engine, loish, rhads, beeple, ilya kuvshinov, rossdraws, tom bagshaw, alphonse mucha, global illumination, soft colors, muted colors, detailed and intricate environment

Input: (image: 00000 (1))

SDXL: (image: sdxl_canny_0.8_0.5_anime_style (2))

SD 1.5: (image: sd_canny_0.8_0.5_anime_style (1))

Many other styles have this issue as well.

patrickvonplaten commented 1 year ago

Hey @syyxsxx,

Can you please attach a reproducible code snippet?

syyxsxx commented 1 year ago

from diffusers import ControlNetModel, StableDiffusionXLControlNetImg2ImgPipeline, AutoencoderKL
import numpy as np
import torch
import cv2

from PIL import Image

prompt = 'listen, anime style artwork, cartoon, makoto shinkai and lois van baarle, unreal engine, loish, rhads, beeple, ilya kuvshinov, rossdraws, tom bagshaw, alphonse mucha, global illumination, soft colors, muted colors, detailed and intricate environment'

image_path = './000.png'
image = Image.open(image_path)
image = image.convert("RGB")  # convert() returns a new image, so keep the result

np_image = np.array(image)
np_image = cv2.Canny(np_image, 100, 200)
np_image = np_image[:, :, None]
np_image = np.concatenate([np_image, np_image, np_image], axis=2)
canny_image = Image.fromarray(np_image)

controlnet = ControlNetModel.from_pretrained(
     "diffusers/controlnet-canny-sdxl-1.0",
     variant="fp16",
     use_safetensors=True,
     torch_dtype=torch.float16,
).to("cuda")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")
pipe = StableDiffusionXLControlNetImg2ImgPipeline.from_pretrained(
     "stabilityai/stable-diffusion-xl-base-1.0",
     controlnet=controlnet,
     variant="fp16",
     use_safetensors=True,
     vae=vae,
     torch_dtype=torch.float16,
)
# enable_model_cpu_offload() handles device placement, so the pipeline is not moved to "cuda" manually
pipe.enable_model_cpu_offload()
controlnet_conditioning_scale = 0.5
images = pipe(
     prompt,
     image=image,
     control_image=canny_image,
     strength=0.8,
     num_inference_steps=50,
     controlnet_conditioning_scale=controlnet_conditioning_scale,
).images
images[0].save("sdxl_canny_0.8_0.5_anime_style.png")

@patrickvonplaten

thanks in advance

syyxsxx commented 1 year ago

Also, I find that if I use the stabilityai/stable-diffusion-xl-refiner-1.0 model, I get this error:

  File "/workspace/i2v/test_xl.py", line 38, in <module>
    images = pipe(
  File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/diffusers/pipelines/controlnet/pipeline_controlnet_sd_xl_img2img.py", line 1223, in __call__
    add_time_ids, add_neg_time_ids = self._get_add_time_ids(
  File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/diffusers/pipelines/controlnet/pipeline_controlnet_sd_xl_img2img.py", line 815, in _get_add_time_ids
    add_time_ids = list(original_size + crops_coords_top_left + (aesthetic_score,))
TypeError: torch.Size() takes an iterable of 'int' (item 4 is 'float')

If I change aesthetic_score to an int, I get another error:

  File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/diffusers/pipelines/controlnet/pipeline_controlnet_sd_xl_img2img.py", line 1279, in __call__
    down_block_res_samples, mid_block_res_sample = self.controlnet(
  File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/diffusers/models/controlnet.py", line 763, in forward
    aug_emb = self.add_embedding(add_embeds)
  File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 192, in forward
    sample = self.linear_1(sample)
  File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x2560 and 2816x1280)

patrickvonplaten commented 1 year ago

The aesthetic_score should only be used with https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0, not with https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
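To see where the two widths in the RuntimeError above could come from, here is a rough arithmetic sketch. It assumes the published SDXL configs, where each added time id is embedded into 256 dimensions and the pooled text embedding has 1280 dimensions. The helper name `add_embedding_input_dim` is made up for illustration and is not part of diffusers.

```python
# Assumed values from the published SDXL configs (not read from the models here).
ADDITION_TIME_EMBED_DIM = 256   # embedding width per added time id
POOLED_TEXT_EMBED_DIM = 1280    # pooled output of the second text encoder


def add_embedding_input_dim(num_time_ids: int) -> int:
    """Hypothetical helper: width of the vector fed to `add_embedding`."""
    return POOLED_TEXT_EMBED_DIM + num_time_ids * ADDITION_TIME_EMBED_DIM


# Base-style conditioning: original_size (2) + crops_coords_top_left (2) + target_size (2)
base_dim = add_embedding_input_dim(6)
# Refiner-style conditioning: original_size (2) + crops_coords_top_left (2) + aesthetic_score (1)
refiner_dim = add_embedding_input_dim(5)

print(base_dim, refiner_dim)  # 2816 2560
```

Under these assumptions, a ControlNet trained alongside the base model expects 2816 inputs, while refiner-style conditioning with aesthetic_score supplies only 2560, which matches the shapes reported in the error.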

patrickvonplaten commented 1 year ago

Hey @syyxsxx,

Regarding SDXL being less able to produce anime images: I'm not super surprised, as SDXL requires a different way of prompting compared to SD1 or SD2. I would also suggest playing around with checkpoints that have been fine-tuned on anime:

syyxsxx commented 1 year ago

@patrickvonplaten Thank you for your reply. It's not just the anime style; almost every non-LoRA style has this problem. Do you have any suggestions for SDXL prompt engineering? Besides, I tried SDXL + img2img + canny (T2I-Adapter), and it seems to work well.

syyxsxx commented 1 year ago

It seems to work well if I set guidance_scale to a value between 15 and 20.
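For context on why a larger guidance_scale might help here, this is a minimal sketch of classifier-free guidance, the general technique behind that parameter. It uses toy numbers in plain Python rather than real model outputs, and the function name `apply_guidance` is hypothetical, not a diffusers API.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: move the prediction away from the
    unconditional one, along the direction given by the prompt."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


# Toy stand-ins for the unconditional and prompt-conditioned noise predictions.
noise_uncond = [0.0, 1.0]
noise_text = [1.0, 1.0]

for scale in (5.0, 15.0):
    print(scale, apply_guidance(noise_uncond, noise_text, scale))
```

The larger the scale, the more the prompt-conditioned direction dominates the result, which is one plausible reason the style keywords come through more strongly at 15 to 20 than at a moderate value.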

huggingface / diffusers

question about sdxl+img2img+controlnet pipeline #5357