[Closed] bohong13 closed this issue 1 year ago
I have the same problem. The `KeyError: 'down_blocks.0.attentions.0.transformer_blocks.0.attn1.processor'`
is caused by the fact that the state dict generated by the latest training code adds a prefix to all the layers, so `down_blocks.0.attentions.0.transformer_blocks.0.attn1.processor`
is actually called `unet.down_blocks.0.attentions.0.transformer_blocks.0.attn1.processor`
in the state dict. I think this is because the new training code supports text encoder tuning, so the prefix is added to differentiate...
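If you just need to load such a checkpoint, one workaround is to strip the prefix from the keys before handing the state dict to the loader. A minimal sketch (the helper name and the sample dict are hypothetical; only the key format comes from the error above):

```python
def strip_unet_prefix(state_dict, prefix="unet."):
    """Return a copy of state_dict with `prefix` removed from matching keys.

    Keys without the prefix (e.g. text encoder entries) are left untouched.
    """
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }

# Hypothetical sample: keys as saved by the newer training script
sd = {
    "unet.down_blocks.0.attentions.0.transformer_blocks.0.attn1.processor": 1,
    "text_encoder.some.layer": 2,
}
fixed = strip_unet_prefix(sd)
print("down_blocks.0.attentions.0.transformer_blocks.0.attn1.processor" in fixed)  # True
```

Passing the normalized dict to `pipeline.unet.load_attn_procs` should then resolve the `KeyError`, assuming no other key mismatches.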
@haowang1013 But when I train LoRA using only one machine, everything is fine.
Yeah, I never had any problem with training, probably because I was only using one machine. That key error happens when I try to load the LoRA state dict using `pipeline.unet.load_attn_procs`.
This fixed the loading error for me, in `unet_2d_condition.py`:

```python
def fn_recursive_attn_processor(name: str, module: torch.nn.Module, processor):
    if hasattr(module, "set_processor"):
        if not isinstance(processor, dict):
            module.set_processor(processor)
        else:
            # was: module.set_processor(processor.pop(f"{name}.processor"))
            processor_name = f"{name}.processor"
            if processor_name in processor:
                module.set_processor(processor.pop(processor_name))
            else:
                processor_name = f"unet.{processor_name}"
                module.set_processor(processor.pop(processor_name))
```
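The fallback in that patch can be exercised in isolation. In this minimal sketch (a plain dict stands in for the real processor mapping, and `pop_processor` is a hypothetical helper), the prefixed keys now resolve:

```python
def pop_processor(processors: dict, name: str):
    """Pop the processor for `name`, falling back to the "unet."-prefixed
    key that the newer training scripts emit."""
    key = f"{name}.processor"
    if key in processors:
        return processors.pop(key)
    return processors.pop(f"unet.{key}")

# Key as saved by the new training code (with the "unet." prefix):
state = {"unet.down_blocks.0.attentions.0.transformer_blocks.0.attn1.processor": "lora"}
print(pop_processor(state, "down_blocks.0.attentions.0.transformer_blocks.0.attn1"))  # prints "lora"
```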
Thank you! But I still get an error when I use two machines.
```
Steps: 100%|█████████████████████████████████| 500/500 [06:49<00:00, 1.75it/s, loss=0.0406, lr=0.0001]
Model weights saved in /home/momistest/db/diffusers/examples/dreambooth/lora_output/checkpoint-500/pytorch_lora_weights.bin
05/08/2023 16:23:43 - INFO - __main__ - Saved state to /home/momistest/db/diffusers/examples/dreambooth/lora_output/checkpoint-500
Steps: 100%|██████████████████████████████████| 500/500 [06:49<00:00, 1.75it/s, loss=0.208, lr=0.0001]
Model weights saved in /home/momistest/db/diffusers/examples/dreambooth/lora_output/pytorch_lora_weights.bin
{'requires_safety_checker'} was not found in config. Values will be initialized to default values.
{'prediction_type'} was not found in config. Values will be initialized to default values.
{'mid_block_only_cross_attention', 'encoder_hid_dim', 'resnet_skip_time_act', 'time_embedding_act_fn', 'time_cond_proj_dim', 'resnet_out_scale_factor', 'resnet_time_scale_shift', 'only_cross_attention', 'class_embed_type', 'projection_class_embeddings_input_dim', 'time_embedding_dim', 'addition_embed_type_num_heads', 'upcast_attention', 'conv_out_kernel', 'cross_attention_norm', 'dual_cross_attention', 'class_embeddings_concat', 'mid_block_type', 'num_class_embeds', 'timestep_post_act', 'time_embedding_type', 'conv_in_kernel', 'use_linear_projection', 'addition_embed_type'} was not found in config. Values will be initialized to default values.
{'scaling_factor'} was not found in config. Values will be initialized to default values.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
{'dynamic_thresholding_ratio', 'lower_order_final', 'thresholding', 'solver_type', 'algorithm_type', 'solver_order', 'use_karras_sigmas', 'sample_max_value'} was not found in config. Values will be initialized to default values.
Loading unet.
```
```
Traceback (most recent call last):
  File "/home/momistest/db/diffusers/examples/dreambooth/train_dreambooth_lora.py", line 1112, in <module>
    main(args)
  File "/home/momistest/db/diffusers/examples/dreambooth/train_dreambooth_lora.py", line 1067, in main
    pipeline.load_lora_weights(args.output_dir)
  File "/home/momistest/db/diffusers/src/diffusers/loaders.py", line 847, in load_lora_weights
    self.unet.load_attn_procs(unet_lora_state_dict)
  File "/home/momistest/db/diffusers/src/diffusers/loaders.py", line 305, in load_attn_procs
    self.set_attn_processor(attn_processors)
  File "/home/momistest/db/diffusers/src/diffusers/models/unet_2d_condition.py", line 539, in set_attn_processor
    fn_recursive_attn_processor(name, module, processor)
  File "/home/momistest/db/diffusers/src/diffusers/models/unet_2d_condition.py", line 536, in fn_recursive_attn_processor
    fn_recursive_attn_processor(f"{name}.{sub_name}", child, processor)
  File "/home/momistest/db/diffusers/src/diffusers/models/unet_2d_condition.py", line 536, in fn_recursive_attn_processor
    fn_recursive_attn_processor(f"{name}.{sub_name}", child, processor)
  File "/home/momistest/db/diffusers/src/diffusers/models/unet_2d_condition.py", line 536, in fn_recursive_attn_processor
    fn_recursive_attn_processor(f"{name}.{sub_name}", child, processor)
  [Previous line repeated 3 more times]
  File "/home/momistest/db/diffusers/src/diffusers/models/unet_2d_condition.py", line 533, in fn_recursive_attn_processor
    module.set_processor(processor.pop(processor_name))
KeyError: 'unet.down_blocks.0.attentions.0.transformer_blocks.0.attn1.processor'
```
Which version are you on? There's a commit with a bunch of LoRA-related fixes that isn't included in 0.16.1.
You may have to wait for the next release or install the latest version from GitHub.
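For example, installing from the main branch (a sketch; assumes pip and git are available, and that the fix has landed on main):

```shell
# Install diffusers from the current main branch instead of the 0.16.1 release
pip install --upgrade git+https://github.com/huggingface/diffusers.git
```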
This seems to be related to https://github.com/huggingface/diffusers/pull/3353 - I'm trying to fix it ASAP.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Describe the bug
I have found two errors.
Then I tried to solve this error using the method in issue #3284,
but I got this error:
Reproduction
I followed this dog example to run the program on two machines.
I have two laptops with NVIDIA RTX 3080 GPUs. Machine 1's IP is 192.168.1.123 and machine 2's IP is 192.168.1.183.
The environment and package versions on the two machines are exactly the same,
and I ran this script on both machines.
Logs
System Info