nbardy commented 1 year ago

Describe the bug

StableUnCLIPPipeline doesn't work with default values. It's missing the prior models. I'm trying to update them here and even converted a checkpoint model, but can't seem to get it working yet.

Reproduction

!pip install git+https://github.com/huggingface/diffusers@main transformers accelerate scipy safetensors xformers

import requests
import torch
from PIL import Image
from io import BytesIO
from diffusers import UnCLIPScheduler, DDPMScheduler
from diffusers.models import PriorTransformer
from transformers import CLIPTokenizer, CLIPTextModelWithProjection
from diffusers import StableUnCLIPPipeline, UNet2DConditionModel

karlo_model = "kakaobrain/karlo-v1-alpha"
prior = PriorTransformer.from_pretrained(karlo_model, subfolder="prior")

clip_name = "openai/clip-vit-large-patch14"

# clip_name = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K

prior_tokenizer = CLIPTokenizer.from_pretrained(clip_name)
prior_text_model = CLIPTextModelWithProjection.from_pretrained(clip_name)

prior_scheduler = UnCLIPScheduler.from_pretrained(karlo_model, subfolder="prior_scheduler")
prior_scheduler = DDPMScheduler.from_config(prior_scheduler.config)

unet = UNet2DConditionModel.from_pretrained("Nbardy/stable-diffusion-unclip-diffusers", subfolder="unet")

# Start the StableUnCLIP Image variations pipeline

pipe = StableUnCLIPPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip",
    # revision="sd21-unclip-l.ckpt",
    torch_dtype=torch.float16,
    variation="fp16",
    unet=unet,
    prior_tokenizer=prior_tokenizer,
    prior_text_encoder=prior_text_model,
    prior=prior,
    prior_scheduler=prior_scheduler,
)

pipe = pipe.to('cuda')
wave_prompt = "dramatic wave, the Oceans roar, Strong wave spiral across the oceans as the waves unfurl into roaring crests; perfect wave form; perfect wave shape; dramatic wave shape; wave shape unbelievable; wave; wave shape spectacular"
negative_prompt = "((disfigured)), ((bad art)), ((deformed)),((extra limbs)),((close up)),((b&w)), wierd colors, blurry, (((duplicate))), ((morbid)), ((mutilated)), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))), Photoshop, video game, ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, mutation, mutated, extra limbs, extra legs, extra arms, disfigured, deformed, cross-eye, body out of frame, blurry, bad art, bad anatomy, 3d render, bad-artist bad_prompt_version2"

# Pipe to make the variation

images = pipe(prompt=wave_prompt).images
images[0].save("tarsila_variation.png")
display(images[0])

Logs

No response

System Info

!pip install git+https://github.com/huggingface/diffusers@main transformers accelerate scipy safetensors xformers

This, run in Colab, is my setup:

https://colab.research.google.com/drive/1y7som7KnaTOWuXAWYDIkCTVSm2otX9_R?usp=sharing

patrickvonplaten commented 1 year ago

Thanks for the issue - we should indeed provide better docs here. cc @sayakpaul, do we have updated unCLIP docs already?

sayakpaul commented 1 year ago

Hi,

I looked into this a bit. https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip shouldn't actually be used with StableUnCLIPPipeline, because its details are not yet clear to us.

https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip-small, on the other hand, is fine, and we know that the Karlo model was used as one of the priors.

The following code works:

import torch
from diffusers import UnCLIPScheduler, DDPMScheduler, StableUnCLIPPipeline
from diffusers.models import PriorTransformer
from transformers import CLIPTokenizer, CLIPTextModelWithProjection, CLIPTextModel

prior_model_id = "kakaobrain/karlo-v1-alpha"
prior = PriorTransformer.from_pretrained(prior_model_id, subfolder="prior")

prior_text_model_id = "openai/clip-vit-large-patch14"
prior_tokenizer = CLIPTokenizer.from_pretrained(prior_text_model_id)
prior_text_model = CLIPTextModelWithProjection.from_pretrained(prior_text_model_id)
prior_scheduler = UnCLIPScheduler.from_pretrained(prior_model_id, subfolder="prior_scheduler")
prior_scheduler = DDPMScheduler.from_config(prior_scheduler.config)

stable_unclip_model_id = "stabilityai/stable-diffusion-2-1-unclip-small"
pipe = StableUnCLIPPipeline.from_pretrained(
    stable_unclip_model_id,
    prior_tokenizer=prior_tokenizer,
    prior_text_encoder=prior_text_model,
    prior=prior,
    prior_scheduler=prior_scheduler,
)

pipe = pipe.to("cuda")
wave_prompt = "dramatic wave, the Oceans roar, Strong wave spiral across the oceans as the waves unfurl into roaring crests; perfect wave form; perfect wave shape; dramatic wave shape; wave shape unbelievable; wave; wave shape spectacular"

images = pipe(prompt=wave_prompt).images
images[0].save("tarsila_variation.png")

However, its fp16 variant is not working at the moment. @williamberman could you look into this?

If you specify torch_dtype=torch.float16 when initializing the StableUnCLIPPipeline, calling pipe(prompt=wave_prompt) leads to:

RuntimeError: mat1 and mat2 must have the same type

Here's a Colab Notebook that fully reproduces this error.

Cc: @patrickvonplaten. I would like the issue above to get resolved first, and then I will open a PR to update the docs for StableUnCLIPPipeline.

Also cc: @apolinario to note the isolation of components when initializing the pipeline.

williamberman commented 1 year ago

@sayakpaul the components loaded separately from the pipeline need to be loaded in fp16 if the pipeline is loaded in fp16.

I think this is ok and is the expected API. We could use a heuristic: check a parameter of each loaded pipeline and model component to see whether they all have the same dtype, and log a warning if they don't. However, I don't think that's super high priority. If you have time to add it, feel free.

import torch
from diffusers import UnCLIPScheduler, DDPMScheduler, StableUnCLIPPipeline
from diffusers.models import PriorTransformer
from transformers import CLIPTokenizer, CLIPTextModelWithProjection, CLIPTextModel

prior_model_id = "kakaobrain/karlo-v1-alpha"
prior = PriorTransformer.from_pretrained(prior_model_id, subfolder="prior", torch_dtype=torch.float16)

prior_text_model_id = "openai/clip-vit-large-patch14"
prior_tokenizer = CLIPTokenizer.from_pretrained(prior_text_model_id)
prior_text_model = CLIPTextModelWithProjection.from_pretrained(prior_text_model_id, torch_dtype=torch.float16)
prior_scheduler = UnCLIPScheduler.from_pretrained(prior_model_id, subfolder="prior_scheduler")
prior_scheduler = DDPMScheduler.from_config(prior_scheduler.config)

stable_unclip_model_id = "stabilityai/stable-diffusion-2-1-unclip-small"

pipe = StableUnCLIPPipeline.from_pretrained(
    stable_unclip_model_id,
    torch_dtype=torch.float16,
    variant="fp16",
    prior_tokenizer=prior_tokenizer,
    prior_text_encoder=prior_text_model,
    prior=prior,
    prior_scheduler=prior_scheduler,
)

pipe = pipe.to("cuda")
wave_prompt = "dramatic wave, the Oceans roar, Strong wave spiral across the oceans as the waves unfurl into roaring crests; perfect wave form; perfect wave shape; dramatic wave shape; wave shape unbelievable; wave; wave shape spectacular"

images = pipe(prompt=wave_prompt).images
images[0].save("tarsila_variation.png")
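
The dtype heuristic mentioned above could look roughly like the sketch below. It assumes each model component exposes a parameters() method the way torch.nn.Module does, and that components without parameters (tokenizers, schedulers) can simply be skipped. The helper names first_param_dtype, find_dtype_mismatches, and warn_on_mixed_dtypes are hypothetical and not part of diffusers.

```python
import warnings
from collections import defaultdict


def first_param_dtype(component):
    """Return the dtype of the component's first parameter, or None.

    Hypothetical helper: components without a callable `parameters`
    attribute (e.g. tokenizers, schedulers) are treated as dtype-less.
    """
    parameters = getattr(component, "parameters", None)
    if not callable(parameters):
        return None
    first = next(iter(parameters()), None)
    return getattr(first, "dtype", None)


def find_dtype_mismatches(components):
    """Group component names by dtype; return {} if at most one dtype is in use."""
    names_by_dtype = defaultdict(list)
    for name, component in components.items():
        dtype = first_param_dtype(component)
        if dtype is not None:
            names_by_dtype[dtype].append(name)
    return dict(names_by_dtype) if len(names_by_dtype) > 1 else {}


def warn_on_mixed_dtypes(components):
    """Emit a warning when the components do not all share one dtype."""
    mismatches = find_dtype_mismatches(components)
    if mismatches:
        summary = "; ".join(
            f"{dtype}: {', '.join(names)}" for dtype, names in mismatches.items()
        )
        warnings.warn(f"Pipeline components use different dtypes ({summary}).")
    return mismatches
```

With a loaded pipeline this could be called as warn_on_mixed_dtypes(pipe.components), assuming pipe.components returns a mapping from component names to the loaded objects. Only the first parameter of each component is inspected, so this is a cheap check rather than a guarantee.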