huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

IP adapter output on some resolutions suffers in quality? #9136

Open darshats opened 3 months ago

darshats commented 3 months ago

Describe the bug

I am running IP-Adapter at 1344x768 (width x height), which is one of the listed SDXL resolutions. The output quality is much lower than for regular 768x768 generations. I've attached sample images and code below. In this experiment 1080x768 gave the best output, but it's not one of the supported resolutions @asomo

Attached images: fridge_fg, fridge_bg, fridge_canny, fridge_mask, fridge_inv_mask
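To make the "listed resolution" question concrete, here is a minimal sketch that checks whether a size is in a bucket list and finds the listed bucket with the closest aspect ratio. The `SDXL_BUCKETS` list holds commonly cited SDXL sizes and is an assumption for illustration, not something read from diffusers; `is_listed` and `nearest_bucket` are hypothetical helper names.

```python
# Commonly cited SDXL resolutions as (width, height).
# Assumption for illustration only: this list is not taken from diffusers.
SDXL_BUCKETS = [
    (1024, 1024),
    (1152, 896), (896, 1152),
    (1216, 832), (832, 1216),
    (1344, 768), (768, 1344),
    (1536, 640), (640, 1536),
]


def is_listed(width, height):
    """Return True if (width, height) is one of the assumed buckets."""
    return (width, height) in SDXL_BUCKETS


def nearest_bucket(width, height):
    """Return the assumed bucket whose aspect ratio is closest to width/height."""
    target = width / height
    return min(SDXL_BUCKETS, key=lambda wh: abs(wh[0] / wh[1] - target))


if __name__ == "__main__":
    for size in [(1344, 768), (1280, 768), (1080, 768)]:
        print(size, "listed:", is_listed(*size), "nearest:", nearest_bucket(*size))
```

Under this assumed list, 1080x768 is not a listed size, so comparing it against its nearest listed bucket may help separate a resolution effect from an IP-Adapter effect.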

Reproduction

import torch
from diffusers import (
    StableDiffusionXLPipeline,
    StableDiffusionXLImg2ImgPipeline,
    ControlNetModel,
    StableDiffusionXLControlNetPipeline,
    AutoencoderKL,
    UniPCMultistepScheduler,
)
from diffusers.image_processor import IPAdapterMaskProcessor
from transformers import CLIPVisionModelWithProjection
from controlnet_aux import AnylineDetector
import cv2
import numpy as np
from PIL import Image, ImageOps
from huggingface_hub import hf_hub_download

def create_controlnet_pipes(image_encoder=None) -> StableDiffusionXLControlNetPipeline:
    # get controlnet
    controlnet = ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0",
        torch_dtype=torch.float16,
        use_safetensors=True,
    )
    # load the base SDXL checkpoint, then reuse its components in a ControlNet pipeline
    pipe = StableDiffusionXLPipeline.from_single_file(
        "sdxl model path",
        add_watermarker=False,
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
        image_encoder=image_encoder,
    )
    pipe = StableDiffusionXLControlNetPipeline(
        controlnet=controlnet,
        **pipe.components,
        add_watermarker=False,
    )
    pipe = pipe.to("cuda")
    return pipe

def canny(image):
    image = np.array(image)
    low_threshold = 100
    high_threshold = 200
    image = cv2.Canny(image, low_threshold, high_threshold)
    # replicate the single edge channel into a 3-channel image
    image = image[:, :, None]
    image = np.concatenate([image, image, image], axis=2)
    return Image.fromarray(image)

if __name__ == '__main__':
    # crop different values like 0,0,1080,768 or 0,0,1280,768
    ref_image = Image.open('images/fridge_fg.png').crop((0,0,1344,768))
    bg_ref_image = Image.open('images/fridge_bg.png').crop((0,0,1344,768))

    mask_new = Image.open('images/fridge_mask.png').convert('L').crop((0,0,1344,768))
    inv_mask = Image.open('images/fridge_inv_mask.png').convert('L').crop((0,0,1344,768))
    processor = IPAdapterMaskProcessor()
    mask_fg = processor.preprocess([mask_new])
    mask_fg = mask_fg.reshape(1, mask_fg.shape[0], mask_fg.shape[2], mask_fg.shape[3])

    mask_bg = processor.preprocess([inv_mask])
    mask_bg = mask_bg.reshape(1, mask_bg.shape[0], mask_bg.shape[2], mask_bg.shape[3])

    canny_pil = Image.open('images/fridge_canny.png').crop((0,0,1344,768))

    image_encoder = CLIPVisionModelWithProjection.from_pretrained(
        "h94/IP-Adapter",
        subfolder="models/image_encoder",
        torch_dtype=torch.float16
    )
    pipe = create_controlnet_pipes(image_encoder=image_encoder)
    # the same IP-Adapter weights are loaded twice: one for the foreground, one for the background
    pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name=["ip-adapter-plus_sdxl_vit-h.safetensors", "ip-adapter-plus_sdxl_vit-h.safetensors"], use_safetensors=True)
    scale_config_fg = {'down':1, 'mid':1, 'up':1}
    scale_config_bg = {"down":0.7, 'mid':0.7, 'up':0.7}
    pipe.set_ip_adapter_scale([scale_config_fg, scale_config_bg])

    for idx in range(5):
        outputs = pipe(
            prompt='kitchen scene',
            image=canny_pil,
            ip_adapter_image=[ref_image, bg_ref_image],
            negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality, fuzzy, blurry",
            guidance_scale=5,
            num_inference_steps=30,
            controlnet_conditioning_scale=0.53,
            cross_attention_kwargs={"ip_adapter_masks": [mask_fg, mask_bg]},
            num_images_per_prompt=1
            # generator=generator,
        ).images
        for image in outputs:
            image.save(<path>)
            # image.save(f'output_plus/fridge_ar_ctrlnet_1280_plus_{idx}.png')
        print('done')
    pipe.unload_ip_adapter()

Logs

No response

System Info

v0.28.2 diffusers

Who can help?

No response

dibbla commented 3 months ago

From what I've observed, SDXL can work with many different resolutions, and its output quality with IP-Adapter does vary from one resolution to another. You don't have to stick to the listed (supported) resolutions.
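One way to act on this is to sweep a handful of sizes and compare the outputs side by side. The sketch below only builds the list of sizes to try. It assumes both sides should be divisible by 8, which is what the diffusers SDXL pipelines generally expect; `snap` and `sweep_widths` are hypothetical helper names, not part of any library.

```python
def snap(value, multiple=8):
    """Round value to the nearest multiple, never going below one multiple."""
    return max(multiple, int(round(value / multiple)) * multiple)


def sweep_widths(height, min_width, max_width, step=64, multiple=8):
    """Return (width, height) pairs from min_width to max_width in `step`
    increments, with both sides snapped to `multiple` and duplicates dropped."""
    sizes = []
    width = min_width
    while width <= max_width:
        size = (snap(width, multiple), snap(height, multiple))
        if size not in sizes:
            sizes.append(size)
        width += step
    return sizes


if __name__ == "__main__":
    # Fixed height of 768, widths between the 768 and 1344 cases from the report.
    for width, height in sweep_widths(768, 768, 1344, step=192):
        print(f"crop box: (0, 0, {width}, {height})")
```

Each resulting pair could be used as the crop box for the reference images, masks, and canny image, ideally with a fixed seed so that resolution is the only variable that changes between runs.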

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.