huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Incorrect shape when using StableDiffusionXLPipeline.from_single_file #6183

Closed: Dalanke closed this issue 10 months ago

Dalanke commented 11 months ago

Describe the bug

I was trying to load the single safetensors file from playground-v2 by calling StableDiffusionXLPipeline.from_single_file. The call failed with this error:

Traceback (most recent call last):
  File "workspace/sdxl/playground.py", line 9, in <module>
    pipeline = StableDiffusionXLPipeline.from_single_file(
  File "miniconda3/envs/diffusers/lib/python3.10/site-packages/diffusers/loaders/single_file.py", line 261, in from_single_file
    pipe = download_from_original_stable_diffusion_ckpt(
  File "miniconda3/envs/diffusers/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py", line 1713, in download_from_original_stable_diffusion_ckpt
    text_encoder_2 = convert_open_clip_checkpoint(
  File "miniconda3/envs/diffusers/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py", line 997, in convert_open_clip_checkpoint
    set_module_tensor_to_device(text_model, param_name, "cpu", value=param)
  File "miniconda3/envs/diffusers/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 281, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([1024]) in "bias" (which has shape torch.Size([1280])), this look incorrect.

I suspect something is wrong with the checkpoint, since the same code works with SDXL-base, but I see the same issue reported in #5219.

Reproduction

Using the checkpoint file playground-v2.safetensors

from diffusers import StableDiffusionXLPipeline

pipeline = StableDiffusionXLPipeline.from_single_file(
    "./playground/playgroundv2.safetensors", local_files_only=True, use_safetensors=True)
print('pipeline loaded')

pipeline.to('cuda')

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
images = pipeline(prompt=prompt, guidance_scale=3.0).images
for i, img in enumerate(images):
    with open(f'./pg_{i}.jpg', 'wb') as f:  # binary mode; PIL infers JPEG from the file name
        img.save(f)

Logs

No response

System Info

Who can help?

No response

hi-sushanta commented 11 months ago

Since the safetensors file is quite large, I opted not to download it. However, with a slight modification to your code, I was able to achieve the desired outcome.

from diffusers import StableDiffusionXLPipeline

pipeline = StableDiffusionXLPipeline.from_pretrained("playgroundai/playground-v2-1024px-aesthetic")
print('pipeline loaded')

pipeline.to('cpu')

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
images = pipeline(prompt=prompt, guidance_scale=3.0).images
for i, img in enumerate(images):
    with open(f'./pg_{i}.jpg', 'wb') as f:  # binary mode; PIL infers JPEG from the file name
        img.save(f)

Dalanke commented 10 months ago

> Since the safetensors file is quite large, I opted not to download it. However, with a slight modification to your code, I was able to achieve the desired outcome.

Thanks for the reply! This solution works well, as it actually downloads the other components' folders following the pipeline's underlying folder structure described in the docs; a quick sketch of that follows. I know I can download and load it locally if needed. I just wanted to check whether StableDiffusionXLPipeline.from_single_file can work, since this single safetensors file works well with AUTOMATIC1111's webui.
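
For reference, a minimal sketch of materializing that folder structure locally for offline use (the save path is a placeholder; save_pretrained and from_pretrained are standard diffusers APIs):

from diffusers import StableDiffusionXLPipeline

# Download once and save the multi-folder pipeline layout to disk.
pipeline = StableDiffusionXLPipeline.from_pretrained("playgroundai/playground-v2-1024px-aesthetic")
pipeline.save_pretrained("./playground-v2-local")

# Later, load entirely from the local folders.
pipeline = StableDiffusionXLPipeline.from_pretrained("./playground-v2-local", local_files_only=True)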

It seems like #5219 hit the same issue when loading a single file, and I wonder if it is a potential bug. I have made a workaround in convert_open_clip_checkpoint in diffusers/pipelines/stable_diffusion/convert_from_ckpt.py, but I am not sure whether this is just a special case.

patrickvonplaten commented 10 months ago

cc @DN6 could you take a look here?

DN6 commented 10 months ago

Hi @Dalanke I'm unable to reproduce this. Based on the error, it looks like it is related to the CLIP model. Do you happen to have a locally saved CLIP model with a projection dim of 1280?
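
As a quick check, the projection dim can be read straight from a locally saved config.json (the local path below is a placeholder):

import json

# Read the projection_dim out of a locally saved CLIP config.
with open("./CLIP-ViT-bigG-14-laion2B-39B-b160k/config.json") as f:
    config = json.load(f)
print(config.get("projection_dim"))  # SDXL's text_encoder_2 expects 1280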

Dalanke commented 10 months ago

> Do you happen to have a locally saved CLIP model with a projection dim of 1280?

I am using a local clone of laion/CLIP-ViT-bigG-14-laion2B-39B-b160k. I kept the default config, which sets "projection_dim": 1280.

Actually, I dug in a little. The from_single_file function calls download_from_original_stable_diffusion_ckpt in diffusers/pipelines/stable_diffusion/convert_from_ckpt.py. If you are loading an SDXL pipeline, these are the lines that handle the tokenizers and text encoders:

if model_type == "SDXL":
      try:
          tokenizer = CLIPTokenizer.from_pretrained(
              "openai/clip-vit-large-patch14", local_files_only=local_files_only
          )
      except Exception:
          raise ValueError(
              f"With local_files_only set to {local_files_only}, you must first locally save the tokenizer in the following path: 'openai/clip-vit-large-patch14'."
          )
      text_encoder = convert_ldm_clip_checkpoint(checkpoint, local_files_only=local_files_only)
      try:
          tokenizer_2 = CLIPTokenizer.from_pretrained(
              "laion/CLIP-ViT-bigG-14-laion2B-39B-b160k", pad_token="!", local_files_only=local_files_only
          )
      except Exception:
          raise ValueError(
              f"With local_files_only set to {local_files_only}, you must first locally save the tokenizer in the following path: 'laion/CLIP-ViT-bigG-14-laion2B-39B-b160k' with `pad_token` set to '!'."
          )

      config_name = "laion/CLIP-ViT-bigG-14-laion2B-39B-b160k"
      config_kwargs = {"projection_dim": 1280}
      text_encoder_2 = convert_open_clip_checkpoint(
          checkpoint,
          config_name,
          prefix="conditioner.embedders.1.model.",
          has_projection=True,
          local_files_only=local_files_only,
          **config_kwargs,
      )

You can see that laion/CLIP-ViT-bigG-14-laion2B-39B-b160k is used as tokenizer_2 and as the config for text_encoder_2. Stepping into convert_open_clip_checkpoint, we can see how the dimension is calculated:

if prefix + "text_projection" in checkpoint:
    d_model = int(checkpoint[prefix + "text_projection"].shape[0])
else:
    d_model = 1024

This is where the 1024 comes from. It turns out that playground-v2.safetensors only has the key text_projection.weight, so the model can be loaded once the code is modified to:

if prefix + "text_projection" in checkpoint:
    d_model = int(checkpoint[prefix + "text_projection"].shape[0])
elif prefix + "text_projection.weight" in checkpoint:
    d_model = int(checkpoint[prefix + "text_projection.weight"].shape[0])
else:
    d_model = 1024
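
To double-check which key a given checkpoint actually carries, here is a minimal inspection sketch (using the local file path from my reproduction above):

from safetensors.torch import load_file

# List every text_projection-related key under the SDXL text_encoder_2 prefix.
checkpoint = load_file("./playground/playground-v2.safetensors")
prefix = "conditioner.embedders.1.model."
for key in checkpoint:
    if key.startswith(prefix + "text_projection"):
        print(key, tuple(checkpoint[key].shape))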

Other changes are also required to map text_projection.weight to text_projection; see the sketch below. I am just curious whether this is a common issue. It seems that converting from the diffusers format to the A1111 (single-file) format with a different version of the conversion script causes it. Let me know if this is worth a PR to improve compatibility.
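
For illustration, the extra mapping could look something like this (continuing from the inspection sketch above; whether the transpose is needed depends on how the exporter stored the projection, so treat this as a hypothetical sketch rather than the exact patch I made):

# Expose "text_projection.weight" under the key name the converter expects.
# open_clip stores the projection as a plain parameter applied as x @ W, while
# a ".weight" entry is an nn.Linear weight (the transpose), hence the .T.
weight_key = prefix + "text_projection.weight"
if weight_key in checkpoint and prefix + "text_projection" not in checkpoint:
    checkpoint[prefix + "text_projection"] = checkpoint.pop(weight_key).T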

DN6 commented 10 months ago

Hi @Dalanke. When I try to load the single file checkpoint from the playground model repo I don't see any issues and the projection dimension is set correctly.
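
For reference, a minimal sketch of loading the single-file checkpoint straight from the Hub (the file URL assumes the playground-v2.safetensors name mentioned above; from_single_file accepts Hub file URLs):

from diffusers import StableDiffusionXLPipeline

# Load the single-file checkpoint directly from the model repo.
pipeline = StableDiffusionXLPipeline.from_single_file(
    "https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic/blob/main/playground-v2.safetensors"
)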

Is the checkpoint you're using from the conversion script?

Dalanke commented 10 months ago

Hi @DN6. Sorry for providing an incorrect link; I just noticed there was an update to the repo. I was using the checkpoint at commit 0c295353e4624ad342488444b84bd59251383fd0 (SHA256 a8f32f89aaa2f1194e87fb0320997a765062426ebfccfb3d83d796fbd1a066ff).

DN6 commented 10 months ago

Hi @Dalanke, any particular reason to use that checkpoint revision? The latest revision seems to work fine with from_single_file.

Dalanke commented 10 months ago

Hi @DN6. I downloaded it just before the checkpoint was updated, and the latest revision is a perfect solution for me now. I was only wondering whether something had gone wrong, since #5219 complains about the same thing. Maybe some breaking changes in the conversion scripts led to this compatibility issue.

DN6 commented 10 months ago

Will take a look at the conversion script. But just checking: can we close this issue?

Dalanke commented 10 months ago

Let's close this issue for now and reopen it if necessary.