huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

TypeError in VersatileDiffusionPipeline: get_down_block() got an unexpected keyword argument 'transformer_layers_per_block' #5998

Closed fotinidelig closed 12 months ago

fotinidelig commented 12 months ago

Describe the bug

I want to use the VersatileDiffusionPipeline, but I get an error when loading the model with from_pretrained().

TypeError                                 Traceback (most recent call last)

[<ipython-input-7-2673c81923ce>](https://localhost:8080/#) in <cell line: 1>()
----> 1 vdm = diffusers.VersatileDiffusionPipeline.from_pretrained(
      2     "shi-labs/versatile-diffusion",)
      3 
      4 vdm.remove_unused_weights()
      5 vdm = vdm.to("cuda")

5 frames

[/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/pipeline_utils.py](https://localhost:8080/#) in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
   1263             else:
   1264                 # load sub model
-> 1265                 loaded_sub_model = load_sub_model(
   1266                     library_name=library_name,
   1267                     class_name=class_name,

[/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/pipeline_utils.py](https://localhost:8080/#) in load_sub_model(library_name, class_name, importable_classes, pipelines, is_pipeline_module, pipeline_class, torch_dtype, provider, sess_options, device_map, max_memory, offload_folder, offload_state_dict, model_variants, name, from_flax, variant, low_cpu_mem_usage, cached_folder, revision)
    518     # check if the module is in a subdirectory
    519     if os.path.isdir(os.path.join(cached_folder, name)):
--> 520         loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
    521     else:
    522         # else load from the root directory

[/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py](https://localhost:8080/#) in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    879                 }
    880             else:
--> 881                 model = cls.from_config(config, **unused_kwargs)
    882 
    883                 state_dict = load_state_dict(model_file, variant=variant)

[/usr/local/lib/python3.10/dist-packages/diffusers/configuration_utils.py](https://localhost:8080/#) in from_config(cls, config, return_unused_kwargs, **kwargs)
    253 
    254         # Return model and optionally state and/or unused_kwargs
--> 255         model = cls(**init_dict)
    256 
    257         # make sure to also save config parameters that might be used for compatible classes

[/usr/local/lib/python3.10/dist-packages/diffusers/configuration_utils.py](https://localhost:8080/#) in inner_init(self, *args, **kwargs)
    643         new_kwargs = {**config_init_kwargs, **new_kwargs}
    644         getattr(self, "register_to_config")(**new_kwargs)
--> 645         init(self, *args, **init_kwargs)
    646 
    647     return inner_init

[/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/versatile_diffusion/modeling_text_unet.py](https://localhost:8080/#) in __init__(self, sample_size, in_channels, out_channels, center_input_sample, flip_sin_to_cos, freq_shift, down_block_types, mid_block_type, up_block_types, only_cross_attention, block_out_channels, layers_per_block, downsample_padding, mid_block_scale_factor, dropout, act_fn, norm_num_groups, norm_eps, cross_attention_dim, transformer_layers_per_block, reverse_transformer_layers_per_block, encoder_hid_dim, encoder_hid_dim_type, attention_head_dim, num_attention_heads, dual_cross_attention, use_linear_projection, class_embed_type, addition_embed_type, addition_time_embed_dim, num_class_embeds, upcast_attention, resnet_time_scale_shift, resnet_skip_time_act, resnet_out_scale_factor, time_embedding_type, time_embedding_dim, time_embedding_act_fn, timestep_post_act, time_cond_proj_dim, conv_in_kernel, conv_out_kernel, projection_class_embeddings_input_dim, attention_type, class_embeddings_concat, mid_block_only_cross_attention, cross_attention_norm, addition_embed_type_num_heads)
    648             is_final_block = i == len(block_out_channels) - 1
    649 
--> 650             down_block = get_down_block(
    651                 down_block_type,
    652                 num_layers=layers_per_block[i],

TypeError: get_down_block() got an unexpected keyword argument 'transformer_layers_per_block'

Reproduction

import diffusers
import torch
vdm = diffusers.VersatileDiffusionPipeline.from_pretrained(
    "shi-labs/versatile-diffusion", torch_dtype=torch.float16)

Logs

No response

System Info

diffusers=0.24.0 torch=2.1.0+cu118 torchvision=0.16.0+cu118 Python 3.10.12

Who can help?

@yiyixuxu @DN6

yiyixuxu commented 12 months ago

Hey, thanks for the issue! Are you interested in opening a PR to fix it? We just need to update the signatures of get_down_block() and get_up_block(): https://github.com/huggingface/diffusers/blob/f72b28c75b2b4b720a5d8de78556694cf4b893fd/src/diffusers/pipelines/versatile_diffusion/modeling_text_unet.py#L43
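For illustration, the failure mode can be reproduced without diffusers at all. The sketch below uses hypothetical stand-ins (`get_down_block_old`, `get_down_block_fixed`, `build_block`), not the library's real code, and the default value of 1 is an assumption. A caller forwards `transformer_layers_per_block` to a helper whose signature does not declare it, and adding the parameter to the signature resolves the error.

```python
def get_down_block_old(down_block_type, num_layers):
    """Hypothetical stand-in for a helper with an outdated signature."""
    return {"type": down_block_type, "num_layers": num_layers}


def get_down_block_fixed(down_block_type, num_layers, transformer_layers_per_block=1):
    """Same helper with the missing keyword parameter added (default of 1 is assumed)."""
    return {
        "type": down_block_type,
        "num_layers": num_layers,
        "transformer_layers_per_block": transformer_layers_per_block,
    }


def build_block(get_down_block):
    """Mimics a caller that always forwards the newer keyword argument."""
    return get_down_block(
        "CrossAttnDownBlock2D",
        num_layers=2,
        transformer_layers_per_block=1,
    )


# The outdated helper rejects the keyword it does not declare.
try:
    build_block(get_down_block_old)
except TypeError as err:
    error_message = str(err)
    print(error_message)

# The updated helper accepts the same call unchanged.
block = build_block(get_down_block_fixed)
print(block)
```

The caller is left untouched in both cases; only the helper's signature changes, which is the shape of the fix suggested above.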

charchit7 commented 12 months ago

Hey @fotinidelig I am raising the PR for the fix. I hope it's okay. Thanks for the bug. @yiyixuxu thanks for the suggestion. Sending PR 🚀

charchit7 commented 12 months ago

@fotinidelig there were three more parameters missing from the signature, namely:

attention_head_dim
attention_type
resolution_idx
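
One way to spot this kind of mismatch before it raises is to compare the keywords a caller passes against the helper's declared parameters. This is only a rough sketch: `missing_parameters` and `get_up_block_old` are hypothetical names made up for illustration, and the example values are arbitrary. It relies on the standard-library `inspect.signature`.

```python
import inspect


def missing_parameters(func, passed_kwargs):
    """Return the keyword names that `func` would reject, in sorted order."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing can be missing.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(name for name in passed_kwargs if name not in params)


def get_up_block_old(up_block_type, num_layers, transformer_layers_per_block=1):
    """Hypothetical helper whose signature lags behind its callers."""
    return up_block_type


# Keywords a newer caller might pass (values are arbitrary examples).
call_kwargs = {
    "num_layers": 2,
    "transformer_layers_per_block": 1,
    "attention_head_dim": 8,
    "attention_type": "default",
    "resolution_idx": 0,
}

missing = missing_parameters(get_up_block_old, call_kwargs)
print(missing)
```

The check ignores positional-only parameters, which is enough for helpers that are called with keywords as in the traceback above.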

fotinidelig commented 12 months ago

> Hey @fotinidelig I am raising the PR for the fix. I hope it's okay. Thanks for the bug. @yiyixuxu thanks for the suggestion. Sending PR 🚀

Thanks a lot for taking over, appreciate it!