huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

`from_ckpt` does not work for Stable Diffusion 2.x weights #3661

Closed ctrysbita closed 1 year ago

ctrysbita commented 1 year ago

Describe the bug

`from_ckpt` fails for Stable Diffusion 2.x weights: loading the text encoder raises a `RuntimeError` about missing `CLIPTextModel` keys.

Reproduction

from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_ckpt(
    'https://huggingface.co/waifu-diffusion/wd-1-5-beta3/blob/main/wd-illusion-fp16.safetensors'
)

Loading a Stable Diffusion 2.x checkpoint this way produces:

RuntimeError                              Traceback (most recent call last)
Cell In[2], line 2
      1 # pipe = StableDiffusionPipeline.from_pretrained(model)
----> 2 pipe = StableDiffusionPipeline.from_ckpt(
      3     'https://huggingface.co/waifu-diffusion/wd-1-5-beta3/blob/main/wd-illusion-fp16.safetensors'
      4 )
      5 # pipe = DiffusionPipeline.from_pretrained(
      6 #     model,
      7 #     vae=AutoencoderKL.from_pretrained("weights/diffusers/nai/vae"),
      8 #     custom_pipeline="lpw_stable_diffusion",
      9 # )
     10 pipe = pipe.to("cuda")

File /*/lib/python3.10/site-packages/diffusers/loaders.py:1471, in FromCkptMixin.from_ckpt(cls, pretrained_model_link_or_path, **kwargs)
   1457         file_path = file_path[len("main/") :]
   1459     pretrained_model_link_or_path = hf_hub_download(
   1460         repo_id,
   1461         filename=file_path,
   (...)
   1468         force_download=force_download,
   1469     )
-> 1471 pipe = download_from_original_stable_diffusion_ckpt(
   1472     pretrained_model_link_or_path,
   1473     pipeline_class=cls,
   1474     model_type=model_type,
   1475     stable_unclip=stable_unclip,
   1476     controlnet=controlnet,
   1477     from_safetensors=from_safetensors,
   1478     extract_ema=extract_ema,
   1479     image_size=image_size,
   1480     scheduler_type=scheduler_type,
   1481     num_in_channels=num_in_channels,
   1482     upcast_attention=upcast_attention,
   1483     load_safety_checker=load_safety_checker,
   1484     prediction_type=prediction_type,
   1485 )
   1487 if torch_dtype is not None:
   1488     pipe.to(torch_dtype=torch_dtype)

File /*/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py:1298, in download_from_original_stable_diffusion_ckpt(checkpoint_path, original_config_file, image_size, prediction_type, model_type, extract_ema, scheduler_type, num_in_channels, upcast_attention, device, from_safetensors, stable_unclip, stable_unclip_prior, clip_stats_path, controlnet, load_safety_checker, pipeline_class, local_files_only)
   1289     pipe = PaintByExamplePipeline(
   1290         vae=vae,
   1291         image_encoder=vision_model,
   (...)
   1295         feature_extractor=feature_extractor,
   1296     )
   1297 elif model_type == "FrozenCLIPEmbedder":
-> 1298     text_model = convert_ldm_clip_checkpoint(checkpoint, local_files_only=local_files_only)
   1299     tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
   1301     if load_safety_checker:

File /*/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py:741, in convert_ldm_clip_checkpoint(checkpoint, local_files_only)
    738     if key.startswith("cond_stage_model.transformer"):
    739         text_model_dict[key[len("cond_stage_model.transformer.") :]] = checkpoint[key]
--> 741 text_model.load_state_dict(text_model_dict)
    743 return text_model

File /*/lib/python3.10/site-packages/torch/nn/modules/module.py:2041, in Module.load_state_dict(self, state_dict, strict)
   2036         error_msgs.insert(
   2037             0, 'Missing key(s) in state_dict: {}. '.format(
   2038                 ', '.join('"{}"'.format(k) for k in missing_keys)))
   2040 if len(error_msgs) > 0:
-> 2041     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
   2042                        self.__class__.__name__, "\n\t".join(error_msgs)))
   2043 return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for CLIPTextModel:
    Missing key(s) in state_dict: "text_model.embeddings.position_ids", "text_model.embeddings.token_embedding.weight", "text_model.embeddings.position_embedding.weight", "text_model.encoder.layers.0.self_attn.k_proj.weight", "text_model.encoder.layers.0.self_attn.k_proj.bias", "text_model.encoder.layers.0.self_attn.v_proj.weight", "text_model.encoder.layers.0.self_attn.v_proj.bias", "text_model.encoder.layers.0.self_attn.q_proj.weight", "text_model.encoder.layers.0.self_attn.q_proj.bias", 
...
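
One possible reading of the traceback, as a minimal sketch rather than a confirmed diagnosis: `convert_ldm_clip_checkpoint` keeps only keys that start with `cond_stage_model.transformer`, so a checkpoint whose text encoder is stored under a different prefix would produce an empty state dict, and `load_state_dict` would then report every `CLIPTextModel` key as missing. The `cond_stage_model.model.` prefix, the helper name `extract_text_encoder_keys`, and the toy key names below are assumptions for illustration and were not read from the checkpoint in this report.

```python
# Prefix taken from the traceback above.
SD1_PREFIX = "cond_stage_model.transformer."
# ASSUMPTION: an OpenCLIP-style prefix that a 2.x checkpoint might use.
SD2_PREFIX = "cond_stage_model.model."


def extract_text_encoder_keys(checkpoint, prefix=SD1_PREFIX):
    """Hypothetical helper that imitates the prefix filter shown in the traceback."""
    return {
        key[len(prefix):]: value
        for key, value in checkpoint.items()
        if key.startswith(prefix)
    }


# Toy "checkpoints": plain dicts with made-up keys and integer placeholders.
sd1_like = {
    SD1_PREFIX + "text_model.embeddings.position_ids": 0,
    "model.diffusion_model.example": 1,
}
sd2_like = {
    SD2_PREFIX + "positional_embedding": 0,
    "model.diffusion_model.example": 1,
}

sd1_keys = extract_text_encoder_keys(sd1_like)
sd2_keys = extract_text_encoder_keys(sd2_like)

print(sd1_keys)  # one text-encoder entry survives the filter
print(sd2_keys)  # nothing matches the 1.x prefix, so the dict is empty
```

If the real checkpoint behaves like `sd2_like`, the text encoder would receive an empty dict, which would match the long "Missing key(s)" list above.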

Logs

No response

System Info

patrickvonplaten commented 1 year ago

@williamberman did you by any chance fix this one already with https://github.com/huggingface/diffusers/pull/3657?

williamberman commented 1 year ago

Unrelated! That PR was ControlNet-specific.