huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

TypeError: For multiple controlnets: `image` must be type `list` #4110

Closed: elcolie closed this issue 1 year ago

elcolie commented 1 year ago

Describe the bug

StableDiffusionControlNetInpaintPipeline
raises TypeError: For multiple controlnets: `image` must be type `list`

Reproduction

import itertools
import random

import cv2
# !pip install transformers accelerate
from diffusers import StableDiffusionControlNetInpaintPipeline, ControlNetModel, DDIMScheduler
from diffusers.utils import load_image
import numpy as np
import torch
from tqdm import tqdm

from set_seed import seed_everything
from PIL import Image

out_dir: str = "sasithorn_bikini_beach"
device: str = "mps" if torch.backends.mps.is_available() else "cpu"
print(device)

init_image = load_image("sources/sasithorn.jpeg")
# init_image = init_image.resize((512, 512))

seed: int = 8811
seed_everything(seed)

generator = torch.Generator(device=device).manual_seed(seed)

mask_cloth_image = load_image(
    "sources/masked_cloth_sasithorn.png"
)
mask_background_image = load_image(
    "sources/masked_background_sasithorn.png"
)

def return_array_image(image):
    canny_image = np.array(image)

    low_threshold = 100
    high_threshold = 200

    canny_image = cv2.Canny(canny_image, low_threshold, high_threshold)

    # zero out the middle columns of the image where the pose will be overlaid
    zero_start = canny_image.shape[1] // 4
    zero_end = zero_start + canny_image.shape[1] // 2
    canny_image[:, zero_start:zero_end] = 0

    canny_image = canny_image[:, :, None]
    canny_image = np.concatenate([canny_image, canny_image, canny_image], axis=2)
    canny_image = Image.fromarray(canny_image)
    return canny_image

# mask_image = mask_image.resize((512, 512))

def make_inpaint_condition(image, image_mask):
    image = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    image_mask = np.array(image_mask.convert("L")).astype(np.float32) / 255.0

    assert image.shape[0:2] == image_mask.shape[0:2], "image and image_mask must have the same image size"
    image[image_mask > 0.5] = -1.0  # set as masked pixel
    image = np.expand_dims(image, 0).transpose(0, 3, 1, 2)
    image = torch.from_numpy(image)
    return image

base_prompt = "4k, ultra resolution, sexy, white skin, straight face, sit cross legged, blue sky"
additional_prompts = ["swimsuit", "bikini"]
negative_prompt: str = "low resolution, blur, bad quality, distortion, bad shape, skinny, turn back, bad face, distorted face, ugly face, people, limbs"
strengths = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
guidance_scales = [0, 0.2, 0.4, 0.6, 0.8, 1.0, 2, 4, 6, 8, 10]
eta_list = [0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.5, 2.0, 4, 6, 8, 10]
combined_list = list(itertools.product(
    strengths, guidance_scales, eta_list, additional_prompts)
)

# Shuffle the combined list
random.shuffle(combined_list)

controlnets = [
    ControlNetModel.from_pretrained(
        "lllyasviel/control_v11p_sd15_inpaint",
    ).to(device),
    ControlNetModel.from_pretrained(
        "lllyasviel/control_v11p_sd15_inpaint",
    ).to(device),
]
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnets,
    requires_safety_checker=False,
    safety_checker=None
).to(device)
pipe.requires_safety_checker = False  # note: redundant with the from_pretrained kwargs above
pipe.safety_checker = None
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

my_images = [
    make_inpaint_condition(init_image, mask_cloth_image),
    make_inpaint_condition(init_image, mask_background_image)
]

for item in tqdm(combined_list, total=len(combined_list)):
    strength, guidance_scale, eta, add_prompt = item
    # generate image
    image = pipe(
        f"{base_prompt}, {add_prompt}",
        image=my_images,
        negative_prompt=negative_prompt,
        num_inference_steps=50,
        generator=generator,
        eta=eta,
        strength=strength,
        guidance_scale=guidance_scale,
        controlnet_conditioning_scale=[1.0, 0.8]
    ).images[0].save(f"{out_dir}/{strength}_{guidance_scale}_{eta}_{add_prompt}.png")

Logs

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│                                                                                                  │
│ /Users/sarit/study/try_openai/try_civitai/sasithorn_controlnet.py:124 in <module>                │
│                                                                                                  │
│   121 for item in tqdm(combined_list, total=len(combined_list)):                                 │
│   122 │   strength, guidance_scale, eta, add_prompt = item                                       │
│   123 │   # generate image                                                                       │
│ ❱ 124 │   image = pipe(                                                                          │
│   125 │   │   f"{base_prompt}, {add_prompt}",                                                    │
│   126 │   │   image=my_images,                                                                   │
│   127 │   │   negative_prompt=negative_prompt,                                                   │
│ /Users/sarit/anaconda3/envs/try_openai/lib/python3.11/site-packages/torch/utils/_contextlib.py:1 │
│ 15 in decorate_context                                                                           │
│                                                                                                  │
│   112 │   @functools.wraps(func)                                                                 │
│   113 │   def decorate_context(*args, **kwargs):                                                 │
│   114 │   │   with ctx_factory():                                                                │
│ ❱ 115 │   │   │   return func(*args, **kwargs)                                                   │
│   116 │                                                                                          │
│   117 │   return decorate_context                                                                │
│   118                                                                                            │
│                                                                                                  │
│ /Users/sarit/anaconda3/envs/try_openai/lib/python3.11/site-packages/diffusers/pipelines/controln │
│ et/pipeline_controlnet_inpaint.py:1132 in __call__                                               │
│                                                                                                  │
│   1129 │   │   │   ]                                                                             │
│   1130 │   │                                                                                     │
│   1131 │   │   # 1. Check inputs. Raise error if not correct                                     │
│ ❱ 1132 │   │   self.check_inputs(                                                                │
│   1133 │   │   │   prompt,                                                                       │
│   1134 │   │   │   control_image,                                                                │
│   1135 │   │   │   height,                                                                       │
│                                                                                                  │
│ /Users/sarit/anaconda3/envs/try_openai/lib/python3.11/site-packages/diffusers/pipelines/controln │
│ et/pipeline_controlnet_inpaint.py:714 in check_inputs                                            │
│                                                                                                  │
│    711 │   │   │   and isinstance(self.controlnet._orig_mod, MultiControlNetModel)               │
│    712 │   │   ):                                                                                │
│    713 │   │   │   if not isinstance(image, list):                                               │
│ ❱  714 │   │   │   │   raise TypeError("For multiple controlnets: `image` must be type `list`")  │
│    715 │   │   │                                                                                 │
│    716 │   │   │   # When `image` is a nested list:                                              │
│    717 │   │   │   # (e.g. [[canny_image_1, pose_image_1], [canny_image_2, pose_image_2]])       │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: For multiple controlnets: `image` must be type `list`

System Info

OSX: 13.4.1 (c) · CPU: M2 · RAM: 96 GB · Python: 3.11.3

pip packages: absl-py==1.4.0 accelerate==0.19.0 aiofiles==23.1.0 aiohttp==3.8.4 aiosignal==1.3.1 altair==5.0.0 ansiwrap==0.8.4 antlr4-python3-runtime==4.9.3 anyio==3.6.2 appnope==0.1.3 asttokens==2.2.1 async-generator==1.10 async-timeout==4.0.2 attrs==23.1.0 autopep8==2.0.2 backcall==0.2.0 backoff==2.2.1 beautifulsoup4==4.12.2 bleach==6.0.0 bson==0.5.10 build==0.10.0 certifi==2023.5.7 cffi==1.15.1 chardet==3.0.4 charset-normalizer==3.1.0 click==8.1.3 cmake==3.26.4 colorama==0.4.6 comm==0.1.3 commonmark==0.9.1 contourpy==1.0.7 controlnet-aux==0.0.5 cycler==0.11.0 datasets==2.12.0 debuglater==1.4.4 debugpy==1.6.7 decorator==5.1.1 defusedxml==0.7.1 diffusers==0.18.1 dill==0.3.6 einops==0.6.1 entrypoints==0.4 evaluate==0.4.0 exceptiongroup==1.1.1 executing==1.2.0 fastapi==0.95.2 fastjsonschema==2.17.1 ffmpy==0.3.0 filelock==3.12.0 flatbuffers==23.5.26 fonttools==4.39.4 frozenlist==1.3.3 fsspec==2023.5.0 gradio==3.32.0 gradio_client==0.2.5 h11==0.14.0 h5py==3.9.0 httpcore==0.17.2 httpx==0.24.1 huggingface-hub==0.14.1 humanize==4.6.0 idna==3.4 imageio==2.31.0 importlib-metadata==6.6.0 invisible-watermark==0.2.0 ipdb==0.13.13 ipykernel==6.23.1 ipython==8.13.2 jedi==0.18.2 Jinja2==3.1.2 jprq==2.1.0 jsonschema==4.17.3 jupyter_client==8.2.0 jupyter_core==5.3.0 jupyterlab-pygments==0.2.2 jupytext==1.14.5 kiwisolver==1.4.4 lazy_loader==0.2 linkify-it-py==2.0.2 markdown-it-py==2.2.0 MarkupSafe==2.1.2 matplotlib==3.7.1 matplotlib-inline==0.1.6 mdit-py-plugins==0.3.3 mdurl==0.1.2 mediapipe==0.10.1 mistune==2.0.5 monotonic==1.6 mpmath==1.3.0 multidict==6.0.4 multiprocess==0.70.14 mypy-extensions==1.0.0 nbclient==0.7.4 nbconvert==7.4.0 nbformat==5.8.0 nest-asyncio==1.5.6 networkx==3.1 numpy==1.24.3 omegaconf==2.3.0 openai==0.27.7 opencv-contrib-python==4.7.0.72 opencv-python==4.7.0.72 orjson==3.8.13 outcome==1.2.0 packaging==23.1 pandas==2.0.1 pandocfilters==1.5.0 papermill==2.4.0 parso==0.8.3 pexpect==4.8.0 pickleshare==0.7.5 Pillow==9.5.0 pip-tools==6.13.0 platformdirs==3.5.1 ploomber==0.22.3 ploomber-core==0.2.10 ploomber-engine==0.0.28 ploomber-scaffold==0.3.1 posthog==3.0.1 prompt-toolkit==3.0.38 protobuf==3.20.3 psutil==5.9.5 ptyprocess==0.7.0 pure-eval==0.2.2 pyarrow==12.0.0 pycodestyle==2.10.0 pycparser==2.21 pydantic==1.10.7 pydub==0.25.1 pyflakes==3.0.1 Pygments==2.15.1 pyparsing==3.0.9 pypdf==3.9.0 pyproject_hooks==1.0.0 pyre-extensions==0.0.29 pyrsistent==0.19.3 PySocks==1.7.1 python-dateutil==2.8.2 python-dotenv==1.0.0 python-multipart==0.0.6 pytz==2023.3 PyWavelets==1.4.1 PyYAML==6.0 pyzmq==25.0.2 regex==2023.5.5 requests==2.30.0 responses==0.18.0 rich==10.14.0 safetensors==0.3.1 scikit-image==0.21.0 scipy==1.10.1 selenium==4.10.0 semantic-version==2.10.0 simple-photo-gallery @ file:///Users/sarit/study/simple-photo-gallery six==1.16.0 sniffio==1.3.0 sortedcontainers==2.4.0 sounddevice==0.4.6 soupsieve==2.4.1 SQLAlchemy==2.0.15 sqlparse==0.4.4 stack-data==0.6.2 starlette==0.27.0 super-image==0.1.7 sympy==1.12 tabulate==0.9.0 tenacity==8.2.2 textwrap3==0.9.2 tifffile==2023.4.12 tiktoken==0.4.0 timm==0.9.2 tinycss2==1.2.1 tokenizers==0.13.3 toml==0.10.2 tomli==2.0.1 toolz==0.12.0 torch==2.0.1 torchvision==0.15.2 tornado==6.3.2 tqdm==4.65.0 traitlets==5.9.0 transformers==4.29.2 trio==0.22.0 trio-websocket==0.10.3 triton-pre-mlir @ git+https://github.com/vchiley/triton.git@2dd3b957698a39bbca615c02a447a98482c144a3#subdirectory=python typing-inspect==0.9.0 typing_extensions==4.5.0 tzdata==2023.3 uc-micro-py==1.0.2 urllib3==2.0.2 uvicorn==0.22.0 wcwidth==0.2.6 webencodings==0.5.1 websockets==11.0.3 wsproto==1.2.0 xformers==0.0.20 xxhash==3.2.0 yarl==1.9.2 zipp==3.15.0

Who can help?

@patrickvonplaten @stevhliu @pcuenca

patrickvonplaten commented 1 year ago

@yiyixuxu could you have a look here maybe? :-)

yiyixuxu commented 1 year ago

Hi @elcolie, thanks for reporting this issue!

StableDiffusionControlNetInpaintPipeline requires three image inputs: `image`, `mask_image`, and `control_image`. I think what happened here is that you passed your control images as `image`, while `control_image` was left at its default of `None`, hence the type error.

You will also need to pass your masks:

pipe(
    f"{base_prompt}, {add_prompt}",
    image=[init_image] * 2,
    mask_image=[mask_cloth_image, mask_background_image],
    control_image=my_images,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    generator=generator,
    eta=eta,
    strength=strength,
    guidance_scale=guidance_scale,
    controlnet_conditioning_scale=[1.0, 0.8],
).images[0].save(f"{out_dir}/{strength}_{guidance_scale}_{eta}_{add_prompt}.png")
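
Note that since both ControlNets condition on the same source photo, `init_image` is simply duplicated, while the two masks (`mask_cloth_image` and `mask_background_image`) select the two different regions to repaint, matching the two inpaint-conditioning tensors already in `my_images`.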

Let me know if this makes sense :)

YiYi

elcolie commented 1 year ago

Thank you @yiyixuxu, it works. You should update your article accordingly.

qnlbnsl commented 1 year ago

@yiyixuxu I think this might be an open issue for the img2img pipeline as well. When I try passing in the image, it gives me that error. Is there some sort of control image required for StableDiffusionControlNetImg2ImgPipeline? I do not want to open another issue if this is a user error... Here is the debugger log:

Exception has occurred: TypeError
image must be passed and be one of PIL image, numpy array, torch tensor, list of PIL images, list of numpy arrays or list of torch tensors, but is <class 'NoneType'>
yiyixuxu commented 1 year ago

@qnlbnsl it seems like your `image` argument is empty.
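
For reference, StableDiffusionControlNetImg2ImgPipeline likewise takes two separate image arguments: `image` (the img2img source) and `control_image` (the ControlNet conditioning), and neither may be left as `None`. A minimal sketch, assuming the diffusers 0.18-era API; the checkpoint IDs, file paths, and prompt below are placeholders:

from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

# Placeholder checkpoint IDs and file paths; substitute your own.
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_canny")
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet
)

source_image = load_image("source.png")  # the img2img input; must not be None
canny_image = load_image("canny.png")    # the ControlNet conditioning image

result = pipe(
    "a prompt",
    image=source_image,           # image to transform
    control_image=canny_image,    # ControlNet condition
    num_inference_steps=30,
).images[0]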