modelscope / DiffSynth-Studio

kolors lora train error #121

Closed lonngxiang closed 1 month ago

lonngxiang commented 1 month ago

CUDA_VISIBLE_DEVICES="0" python examples/train/kolors/train_kolors_lora.py \
  --pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \
  --pretrained_text_encoder_path models/kolors/Kolors/text_encoder \
  --pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \
  --dataset_path data/dog \
  --output_path ./models \
  --max_epochs 10 \
  --center_crop \
  --use_gradient_checkpointing \
  --precision "16-mixed"

Loading checkpoint shards: 100%|████████████████████████████████████| 7/7 [00:00<00:00, 7.61it/s]
Traceback (most recent call last):
  File "/ai/DiffSynth-Studio/examples/train/kolors/train_kolors_lora.py", line 300, in <module>
    model = LightningModel(
  File "/ai/DiffSynth-Studio/examples/train/kolors/train_kolors_lora.py", line 69, in __init__
    self.pipe.vae_encoder = load_model_from_diffsynth(SDXLVAEEncoder, {}, pretrained_fp16_vae_path, torch_dtype, self.device)
  File "/ai/DiffSynth-Studio/examples/train/kolors/train_kolors_lora.py", line 45, in load_model_from_diffsynth
    state_dict = load_state_dict(state_dict_path, torch_dtype=torch_dtype)
  File "/ai/DiffSynth-Studio/diffsynth/models/__init__.py", line 759, in load_state_dict
    return load_state_dict_from_safetensors(file_path, torch_dtype=torch_dtype)
  File "/ai/DiffSynth-Studio/diffsynth/models/__init__.py", line 766, in load_state_dict_from_safetensors
    with safe_open(file_path, framework="pt", device="cpu") as f:
OSError: No such device (os error 19)

lonngxiang commented 1 month ago

Changing the device to cuda still doesn't work:

with safe_open(file_path, framework="pt", device="cuda") as f:
OSError: No such device (os error 19)

lonngxiang commented 1 month ago

safetensors version 0.4.3

lonngxiang commented 1 month ago

transformers 4.43.1

lonngxiang commented 1 month ago

torch 2.3.1

lonngxiang commented 1 month ago

Found the problem: it was caused by a mismatch in the model directory structure.
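
For anyone hitting the same error, a quick pre-flight check on the paths passed to the training script may help narrow it down. The sketch below uses only the standard library. The helper name `check_model_paths` is hypothetical and is not part of DiffSynth-Studio. The assumption behind it is that the error appears when a weights path does not resolve to a regular file: on Linux, errno 19 is ENODEV, which memory-mapping can return for a directory or for a filesystem that does not support mmap. The thread does not confirm the exact mechanism, so treat this as one plausible diagnostic rather than the definitive cause.

```python
from pathlib import Path


def check_model_paths(file_paths, dir_paths):
    """Return a list of problems found; an empty list means every path looks usable.

    Hypothetical helper for illustration only (not a DiffSynth-Studio API).
    file_paths: paths expected to be regular files (e.g. .safetensors weights).
    dir_paths:  paths expected to be directories (e.g. a text encoder folder).
    """
    problems = []
    for p in map(Path, file_paths):
        if not p.exists():
            problems.append(f"missing file: {p}")
        elif not p.is_file():
            # A directory (or other non-regular file) where weights are expected.
            problems.append(f"not a regular file: {p}")
    for p in map(Path, dir_paths):
        if not p.is_dir():
            problems.append(f"missing directory: {p}")
    return problems


if __name__ == "__main__":
    # Paths copied from the command at the top of this issue.
    problems = check_model_paths(
        file_paths=[
            "models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors",
            "models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors",
        ],
        dir_paths=["models/kolors/Kolors/text_encoder"],
    )
    for problem in problems:
        print(problem)
    if not problems:
        print("all model paths look fine")
```

Run it from the repository root, the same working directory used for the training command, so the relative paths resolve the same way.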