luosiallen / latent-consistency-model

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
MIT License
4.38k stars 228 forks

dtype mismatch of query when training LCM LoRA #69

Open solitaryTian opened 11 months ago

solitaryTian commented 11 months ago

```
Traceback (most recent call last):
  File "/dfs/comicai/songtao.tian/latent-consistency-model-main/LCM_Training_Script/consistency_distillation/./train_lcm_distill_lora_sd_wds.py", line 1378, in <module>
    main(args)
  File "/dfs/comicai/songtao.tian/latent-consistency-model-main/LCM_Training_Script/consistency_distillation/./train_lcm_distill_lora_sd_wds.py", line 1356, in main
    log_validation(vae, unet, args, accelerator, weight_dtype, global_step)
  File "/dfs/comicai/songtao.tian/latent-consistency-model-main/LCM_Training_Script/consistency_distillation/./train_lcm_distill_lora_sd_wds.py", line 331, in log_validation
    images = pipeline(
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 918, in __call__
    noise_pred = self.unet(
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/diffusers/models/unet_2d_condition.py", line 1075, in forward
    sample, res_samples = downsample_block(
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/diffusers/models/unet_2d_blocks.py", line 1160, in forward
    hidden_states = attn(
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/diffusers/models/transformer_2d.py", line 392, in forward
    hidden_states = block(
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/diffusers/models/attention.py", line 323, in forward
    attn_output = self.attn2(
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/diffusers/models/attention_processor.py", line 522, in forward
    return self.processor(
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/diffusers/models/attention_processor.py", line 1142, in __call__
    hidden_states = xformers.ops.memory_efficient_attention(
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/xformers/ops/fmha/__init__.py", line 223, in memory_efficient_attention
    return _memory_efficient_attention(
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/xformers/ops/fmha/__init__.py", line 321, in _memory_efficient_attention
    return _memory_efficient_attention_forward(
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/xformers/ops/fmha/__init__.py", line 334, in _memory_efficient_attention_forward
    inp.validate_inputs()
  File "/root/miniconda3/envs/LCM/lib/python3.9/site-packages/xformers/ops/fmha/common.py", line 121, in validate_inputs
    raise ValueError(
ValueError: Query/Key/Value should either all have the same dtype, or (in the quantized case) Key/Value should have dtype torch.int32
 query.dtype: torch.float32
 key.dtype  : torch.float16
 value.dtype: torch.float16
```
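Editor's note: the xformers check that raises this error can be reproduced in isolation. This is only an illustration; the tensor shapes below are arbitrary placeholders, and the only thing that matters is that the query dtype differs from the key/value dtype, exactly as in the traceback above.

```python
import torch
import xformers.ops

# Arbitrary (batch, seq_len, heads, head_dim) tensors; only the dtypes matter here.
q = torch.randn(1, 77, 8, 40, device="cuda", dtype=torch.float32)  # query left in fp32
k = torch.randn(1, 77, 8, 40, device="cuda", dtype=torch.float16)  # key in fp16
v = torch.randn(1, 77, 8, 40, device="cuda", dtype=torch.float16)  # value in fp16

# Raises the same ValueError: Query/Key/Value should either all have the same dtype ...
xformers.ops.memory_efficient_attention(q, k, v)
```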

Powerofev1l commented 11 months ago

We met the same problem, bro.

solitaryTian commented 11 months ago

> We met the same problem, bro.

Have you solved it?

Powerofev1l commented 11 months ago

> Have you solved it?

No, but I continue to train it, as the ckpt has been saved.

solitaryTian commented 11 months ago

> No, but I continue to train it, as the ckpt has been saved.

Did your code stop abruptly without reporting an error?

Powerofev1l commented 11 months ago

> Did your code stop abruptly without reporting an error?

It stopped with the same error, but I checked my output folder and found the ckpt had been saved, so I resume from the latest checkpoint manually.

solitaryTian commented 11 months ago

> It stopped with the same error, but I checked my output folder and found the ckpt had been saved, so I resume from the latest checkpoint manually.

Since an error is reported and training is forced to stop every time the checkpoint is saved, do you have to perform that operation every time it errors out and stops? Will this affect the accuracy of the final trained model?

Powerofev1l commented 11 months ago

> Since an error is reported and training is forced to stop every time the checkpoint is saved, do you have to perform that operation every time it errors out and stops? Will this affect the accuracy of the final trained model?

Yeah, I did perform that operation every 200 steps, but I can't answer your latter question because I have problems using the unet_lora I finally got. When I tried the pipeline given on the LCM Hugging Face page, I hit a new error; maybe you will see it later, bro:

```
The config attributes {'skip_prk_steps': True} were passed to LCMScheduler, but are not expected and will be ignored. Please verify your scheduler_config.json configuration file.
You have saved the LoRA weights using the old format. To convert the old LoRA weights to the new format, you can first load them in a dictionary and then create a new dictionary like the following: new_state_dict = {f'unet.{module_name}': params for module_name, params in old_state_dict.items()}.
Traceback (most recent call last):
  File "/workspace/LCM/test_run_sdxl_lora.py", line 12, in <module>
    pipe.load_lora_weights(adapter_id)
  File "/workspace/diffusers/src/diffusers/loaders/lora.py", line 1579, in load_lora_weights
    self.load_lora_into_unet(
  File "/workspace/diffusers/src/diffusers/loaders/lora.py", line 546, in load_lora_into_unet
    inject_adapter_in_model(lora_config, unet, adapter_name=adapter_name)
  File "/opt/conda/envs/LCM/lib/python3.10/site-packages/peft/mapping.py", line 146, in inject_adapter_in_model
    peft_model = tuner_cls(model, peft_config, adapter_name=adapter_name)
  File "/opt/conda/envs/LCM/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 111, in __init__
    super().__init__(model, config, adapter_name)
  File "/opt/conda/envs/LCM/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 94, in __init__
    self.inject_adapter(self.model, adapter_name)
  File "/opt/conda/envs/LCM/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 254, in inject_adapter
    raise ValueError(
ValueError: Target modules {'base_model.model.up_blocks.2.resnets.2.conv1', 'base_model.model.down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_out.0', 'base_model.model.down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_out.0'} not found in the base model. Please check the target modules and try again.
```
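Editor's note: the "old format" warning in that log spells out the key remapping itself. Below is a minimal sketch of performing it with safetensors; the file paths are placeholders for wherever the training run saved its LoRA weights, and this only addresses the warning, not necessarily the "Target modules ... not found" error that follows.

```python
from safetensors.torch import load_file, save_file

# Placeholder paths; point these at the LoRA file produced by the training script.
old_state_dict = load_file("output/pytorch_lora_weights.safetensors")

# The remapping suggested by the diffusers warning: prefix every key with "unet.".
new_state_dict = {f"unet.{module_name}": params for module_name, params in old_state_dict.items()}

save_file(new_state_dict, "output/pytorch_lora_weights_converted.safetensors")
```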

dcfucheng commented 11 months ago

I find that commenting out 'pipeline.enable_xformers_memory_efficient_attention()' in the log_validation method works for LoRA training. However, the generated images are not as good as those from training with train_lcm_distill_sd_wds.py. Does your LoRA training work?
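Editor's note: as a sketch (not the exact upstream code), the workaround described above amounts to the following inside log_validation:

```python
# Sketch of the workaround inside log_validation in train_lcm_distill_lora_sd_wds.py.
pipeline = pipeline.to(accelerator.device)

# Commented out to avoid the Query/Key/Value dtype mismatch during validation;
# PyTorch's default attention path is used instead, at the cost of more memory.
# pipeline.enable_xformers_memory_efficient_attention()
```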

solitaryTian commented 11 months ago

> I find that commenting out 'pipeline.enable_xformers_memory_efficient_attention()' in the log_validation method works for LoRA training. However, the generated images are not as good as those from training with train_lcm_distill_sd_wds.py. Does your LoRA training work?

You are right. I can fix this error by disabling 'enable_xformers_memory_efficient_attention()'. However, when I train an LCM SDXL LoRA, disabling it leaves insufficient memory to run the log_validation function that tests the image quality of my trained LoRA on an A100. Is there any other solution that does not require disabling 'enable_xformers_memory_efficient_attention()', or another way to save memory?

My LoRA training works. [image]
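Editor's note: regarding the memory question, diffusers exposes a few switches that can be tried when xformers is disabled. This is only a sketch; the pipeline class, model ID, and whether these free enough memory for SDXL validation in this particular script are assumptions, not something verified in the thread.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Standalone illustration; inside log_validation the pipeline already exists,
# so only the enable_* calls below would be relevant there.
pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)

pipeline.enable_attention_slicing()   # compute attention in slices to cut peak memory
pipeline.enable_vae_slicing()         # decode latents through the VAE in slices
pipeline.enable_model_cpu_offload()   # keep idle sub-models on CPU (requires accelerate)
```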

Powerofev1l commented 11 months ago

> My LoRA training works. [image]

Is this photo from the SDXL LoRA? What inference code do you use?

solitaryTian commented 11 months ago

> Is this photo from the SDXL LoRA? What inference code do you use?

No, it is from the LoRA of a fine-tuned version of SD1.5 (a private model of my company). It was just generated by the 'log_validation' function of the official code. Have you met the insufficient-memory error when training the LCM-LoRA SDXL?

Powerofev1l commented 11 months ago

> No, it is from the LoRA of a fine-tuned version of SD1.5 (a private model of my company). It was just generated by the 'log_validation' function of the official code. Have you met the insufficient-memory error when training the LCM-LoRA SDXL?

Nope, I didn't meet insufficient memory; I trained it with a single A800. I manually resume training the SDXL LoRA every 200 steps, but the ckpt I got doesn't seem to match diffusers' standard, so I got many bugs.

dcfucheng commented 11 months ago

> No, it is from the LoRA of a fine-tuned version of SD1.5 (a private model of my company). It was just generated by the 'log_validation' function of the official code. Have you met the insufficient-memory error when training the LCM-LoRA SDXL?

I have no idea about the memory issue. I use the standard train_lcm_distill_sd_lora_wds.py for SD1.5 (runwayml/stable-diffusion-v1-5) LoRA training with our data. But I find the generated images are blurred with guidance_scale=7, and it does not work with guidance_scale=1.0. What is your training config?

Powerofev1l commented 11 months ago

> I have no idea about the memory issue. I use the standard train_lcm_distill_sd_lora_wds.py for SD1.5 (runwayml/stable-diffusion-v1-5) LoRA training with our data. But I find the generated images are blurred with guidance_scale=7, and it does not work with guidance_scale=1.0. What is your training config?

Maybe your training data has some problem. I tried laion-art and it was blurred too; I changed the dataset and it works.

dcfucheng commented 11 months ago

> Maybe your training data has some problem. I tried laion-art and it was blurred too; I changed the dataset and it works.

But with the same data on SD1.5, LCM works while LoRA-LCM does not...

dcfucheng commented 11 months ago

> Nope, I didn't meet insufficient memory; I trained it with a single A800. I manually resume training the SDXL LoRA every 200 steps, but the ckpt I got doesn't seem to match diffusers' standard, so I got many bugs.

You can try it like this: https://github.com/luosiallen/latent-consistency-model/issues/57

Powerofev1l commented 11 months ago

> You can try it like this: #57

Thanks, I will check it.

Powerofev1l commented 11 months ago

> I have no idea about the memory issue. I use the standard train_lcm_distill_sd_lora_wds.py for SD1.5 (runwayml/stable-diffusion-v1-5) LoRA training with our data. But I find the generated images are blurred with guidance_scale=7, and it does not work with guidance_scale=1.0. What is your training config?

I trained an SDXL LoRA, but the result shows that I have done something wrong. What does your blurred output look like?

[image: EC1818E3-0DE2-4a02-90DD-EDB88F20C85F]

dcfucheng commented 11 months ago

> I trained an SDXL LoRA, but the result shows that I have done something wrong. What does your blurred output look like? [image: EC1818E3-0DE2-4a02-90DD-EDB88F20C85F]

It looks similar to what I trained with the SD2.0-base LoRA. You can change the guidance_scale; I find that guidance_scale=7 is better but still blurry. (PS: my training step count is very small, so I am not sure about longer runs.) [image: example_SD20_lora]

Powerofev1l commented 11 months ago

> It looks similar to what I trained with the SD2.0-base LoRA. You can change the guidance_scale; I find that guidance_scale=7 is better but still blurry. (PS: my training step count is very small, so I am not sure about longer runs.) [image: example_SD20_lora]

Yeah, my model is SDXL base 1.0 and the other settings are exactly the same as in the README; it's sad that the results are so blurred.

Powerofev1l commented 11 months ago

> But with the same data on SD1.5, LCM works while LoRA-LCM does not...

You are right... I tried SD distillation and it works well, but when I tried to use LoRA, no matter SD1.5 or SDXL, it doesn't work...

dcfucheng commented 11 months ago

> I tried SD distillation and it works well, but when I tried to use LoRA, no matter SD1.5 or SDXL, it doesn't work...

SD1.5 LoRA works well~ I find that once the LoRA training reaches 5000 steps, it can generate pictures well. You can try longer training. This is different from SD distillation.

Powerofev1l commented 11 months ago

> SD1.5 LoRA works well~ I find that once the LoRA training reaches 5000 steps, it can generate pictures well. You can try longer training. This is different from SD distillation.

Could you please tell me what inference steps and guidance_scale you use?

dcfucheng commented 11 months ago

> Could you please tell me what inference steps and guidance_scale you use?

For LoRA SD1.5 testing, we use guidance_scale=1, and 1, 4, or 8 inference steps all work. For training, we tested different guidance_scale and learning-rate values; it works well with longer training on our own dataset. (There are still some bad cases, but it generates pictures.)
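Editor's note: for reference, a minimal inference sketch matching the settings mentioned above (SD1.5 + LCM-LoRA, guidance_scale=1, a few steps). The LoRA path and prompt are placeholders for whatever the training run produced.

```python
import torch
from diffusers import StableDiffusionPipeline, LCMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swap in the LCM scheduler and load the distilled LoRA (placeholder path).
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("path/to/lcm-lora-checkpoint")

image = pipe(
    "a photo of a cat",      # placeholder prompt
    num_inference_steps=4,   # 1, 4, or 8 per the comment above
    guidance_scale=1.0,
).images[0]
image.save("lcm_lora_sample.png")
```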

Powerofev1l commented 11 months ago

> For LoRA SD1.5 testing, we use guidance_scale=1, and 1, 4, or 8 inference steps all work. For training, we tested different guidance_scale and learning-rate values; it works well with longer training on our own dataset. (There are still some bad cases, but it generates pictures.)

Yes, I tried guidance_scale values like 2, 3, and 4 and they work well, but when I tried something like 8 it is very bad; the output photo contains many weird lines.

Powerofev1l commented 11 months ago

By the way, here I used the SDXL 1.0 base LoRA.

dcfucheng commented 11 months ago

> Yes, I tried guidance_scale values like 2, 3, and 4 and they work well, but when I tried something like 8 it is very bad; the output photo contains many weird lines.
> By the way, here I used the SDXL 1.0 base LoRA.

The XL LoRA on Hugging Face should work well with 8 steps. You mean the model you trained? I think the bad outputs may be caused by the dataset, too little training, or w_max=15 (try a smaller value). You can email dcfucheng@hotmail.com for discussion.

Powerofev1l commented 11 months ago

> The XL LoRA on Hugging Face should work well with 8 steps. You mean the model you trained? I think the bad outputs may be caused by the dataset, too little training, or w_max=15 (try a smaller value). You can email dcfucheng@hotmail.com for discussion.

Thanks, I have gotten good results with the SDXL base LoRA at steps=8 and guidance_scale=2.0. I just wonder why the output of the LCM LoRA is so bad at guidance_scale=8, worse than other models~~ Anyway, thanks for your help.