Hey @syyxsxx,
Can you please attach a reproducible code snippet?
```python
from diffusers import ControlNetModel, StableDiffusionXLControlNetImg2ImgPipeline, AutoencoderKL
import numpy as np
import torch
import cv2
from PIL import Image

prompt = 'listen, anime style artwork, cartoon, makoto shinkai and lois van baarle, unreal engine, loish, rhads, beeple, ilya kuvshinov, rossdraws, tom bagshaw, alphonse mucha, global illumination, soft colors, muted colors, detailed and intricate environment'

# Load the input image and build a Canny edge map for ControlNet conditioning.
image_path = './000.png'
image = Image.open(image_path)
image = image.convert("RGB")  # convert() returns a new image, so the result must be assigned
np_image = np.array(image)
np_image = cv2.Canny(np_image, 100, 200)
np_image = np_image[:, :, None]
np_image = np.concatenate([np_image, np_image, np_image], axis=2)
canny_image = Image.fromarray(np_image)

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    variant="fp16",
    use_safetensors=True,
    torch_dtype=torch.float16,
).to("cuda")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")
pipe = StableDiffusionXLControlNetImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    variant="fp16",
    use_safetensors=True,
    vae=vae,
    torch_dtype=torch.float16,
)
# Note: enable_model_cpu_offload() manages device placement itself, so the
# pipeline should not also be moved to CUDA with .to("cuda") beforehand.
pipe.enable_model_cpu_offload()

controlnet_conditioning_scale = 0.5
images = pipe(
    prompt,
    image=image,
    control_image=canny_image,
    strength=0.8,
    num_inference_steps=50,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
).images
images[0].save("sdxl_canny_0.8_0.5_anime_style.png")
```
@patrickvonplaten Thanks in advance.

Also, I find that if I use the stabilityai/stable-diffusion-xl-refiner-1.0 model, I get this error:
File "/workspace/i2v/test_xl.py", line 38, in <module>
images = pipe(
File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/diffusers/pipelines/controlnet/pipeline_controlnet_sd_xl_img2img.py", line 1223, in __call__
add_time_ids, add_neg_time_ids = self._get_add_time_ids(
File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/diffusers/pipelines/controlnet/pipeline_controlnet_sd_xl_img2img.py", line 815, in _get_add_time_ids
add_time_ids = list(original_size + crops_coords_top_left + (aesthetic_score,))
TypeError: torch.Size() takes an iterable of 'int' (item 4 is 'float')
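For what it's worth, the TypeError looks reproducible in isolation: original_size arrives here as a torch.Size rather than a plain tuple, and adding a tuple to a torch.Size builds another torch.Size, whose constructor accepts only ints. A minimal sketch (the values below are illustrative, not taken from the pipeline):

```python
import torch

original_size = torch.Size([1024, 1024])  # a torch.Size, not a plain tuple
crops_coords_top_left = (0, 0)
aesthetic_score = 6.0  # the pipeline passes this as a float

# torch.Size + tuple builds a new torch.Size, which accepts only ints, so the
# line below raises: TypeError: torch.Size() takes an iterable of 'int' (item 4 is 'float')
# add_time_ids = list(original_size + crops_coords_top_left + (aesthetic_score,))

# Casting to a plain tuple first sidesteps the torch.Size constructor:
add_time_ids = list(tuple(original_size) + crops_coords_top_left + (aesthetic_score,))
print(add_time_ids)  # [1024, 1024, 0, 0, 6.0]
```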
If I change aesthetic_score to an int, I get another error:
File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/diffusers/pipelines/controlnet/pipeline_controlnet_sd_xl_img2img.py", line 1279, in __call__
down_block_res_samples, mid_block_res_sample = self.controlnet(
File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/diffusers/models/controlnet.py", line 763, in forward
aug_emb = self.add_embedding(add_embeds)
File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 192, in forward
sample = self.linear_1(sample)
File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/anaconda3/envs/t2v/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x2560 and 2816x1280)
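As a sanity check on those numbers: both SDXL UNets embed each extra "time id" to 256 dimensions and concatenate the 1280-dim pooled text embedding, but the base model conditions on six ids (original size, crop coordinates, target size) while the refiner conditions on five (original size, crop coordinates, aesthetic_score). The arithmetic matches the traceback:

```python
# Micro-conditioning arithmetic (addition_time_embed_dim = 256 and a 1280-dim
# pooled text embedding in the released SDXL configs):
base_in    = 6 * 256 + 1280   # 6 time ids -> 2816, what the canny ControlNet expects
refiner_in = 5 * 256 + 1280   # 5 time ids -> 2560, what the refiner supplies
assert (base_in, refiner_in) == (2816, 2560)
# Hence "mat1 and mat2 shapes cannot be multiplied (2x2560 and 2816x1280)":
# the ControlNet's add_embedding was trained against the base model's layout.
```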
The aesthetic_score should only be used with https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0, not with https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
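For reference, a minimal sketch of where aesthetic_score is meant to go: the refiner's plain img2img pipeline, without ControlNet. Here init_image is assumed to be a PIL image, and 6.0 / 2.5 are the pipeline's documented defaults:

```python
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    variant="fp16",
    use_safetensors=True,
    torch_dtype=torch.float16,
).to("cuda")

# aesthetic_score / negative_aesthetic_score are refiner-only conditioning;
# the base model conditions on target_size instead.
refined = refiner(
    prompt="anime style artwork, soft colors, detailed environment",
    image=init_image,  # assumed: a PIL.Image to refine
    aesthetic_score=6.0,
    negative_aesthetic_score=2.5,
).images[0]
```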
Hey @syyxsxx,
Regarding SDXL being less able to produce anime images, I'm not super surprised, as SDXL requires a different way of prompting compared to SD1 or SD2. I would also suggest playing around with checkpoints that have been fine-tuned on anime.
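For example, a fine-tuned checkpoint drops straight into the same pipeline; the model id below is a hypothetical placeholder, not a specific recommendation:

```python
# "some-org/sdxl-anime-finetune" is a hypothetical placeholder id; substitute
# any SDXL checkpoint that has been fine-tuned on anime.
pipe = StableDiffusionXLControlNetImg2ImgPipeline.from_pretrained(
    "some-org/sdxl-anime-finetune",
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
)
```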
@patrickvonplaten Thank you for your reply. It's not just anime style; almost every non-LoRA style has this problem. Do you have any suggestions for SDXL prompt engineering? Besides, I tried SDXL + img2img + canny (T2I-Adapter), and it seems to work well.
It seems to work well if I set guidance_scale to 15~20.
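That is, the same call as in the snippet above, just passing guidance_scale explicitly (18.0 is an arbitrary pick from that range):

```python
images = pipe(
    prompt,
    image=image,
    control_image=canny_image,
    strength=0.8,
    guidance_scale=18.0,  # reported to help in the 15~20 range
    num_inference_steps=50,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
).images
```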
It seems the SDXL img2img ControlNet pipeline does not work very well with some prompt styles, for example this anime-style prompt: listen, anime style artwork, cartoon, makoto shinkai and lois van baarle, unreal engine, loish, rhads, beeple, ilya kuvshinov, rossdraws, tom bagshaw, alphonse mucha, global illumination, soft colors, muted colors, detailed and intricate environment

input / sdxl / sd1.5: (comparison images were attached)

There are also many other styles with this issue.