SylwiaNowakowska closed this issue 8 months ago
I cannot reproduce the issue on my end. I would recommend upgrading your PyTorch version as well as other libraries such as transformers and accelerate.
Also, the error logs aren't descriptive. We don't have any way to confirm which part of the code causes the issue.
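One way to make the logs more informative: SIGKILL cannot be caught by the process, so it never leaves a Python traceback. On Linux a common source of it is the kernel's out-of-memory killer, though nothing in this thread establishes that as the cause here. Under that assumption, a minimal sketch for logging peak host memory from inside a training script could look like this. `peak_rss_mib` is a hypothetical helper, not part of diffusers or accelerate.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


# Assumption: calling this every few steps shows whether host RAM keeps growing.
print(f"peak host memory so far: {peak_rss_mib():.1f} MiB")
```

If the last value printed before the kill is close to the machine's total RAM, that would point toward host memory rather than GPU memory.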
Thank you so much for the quick answer. I have upgraded transformers to 4.38.2 and accelerate to 0.28.0. I did not upgrade torch because of CUDA compatibility issues. I still get the same error. If you have any further suggestions, I would be happy to test them.
Unfortunate situation. I am unable to reproduce the error on my end :/
anyway thx for your support!
Describe the bug
The latest train_dreambooth_lora_sdxl.py script with diffusers 0.27.0dev fails with the error Signals.SIGKILL: 9. The train_dreambooth_lora_sdxl.py script (version from 28.02: 7db935a) works with diffusers 0.26.3, but in that case resuming from a checkpoint does not work. I have seen that this was fixed later in commit 5f150c4, where the script requires 0.27.0dev. I have tested that version as well, and it results in the same error: Signals.SIGKILL: 9.
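For context on the message itself: `Signals.SIGKILL: 9` is how Python renders a member of its `signal.Signals` enum, so the training process was terminated by signal 9 rather than by a Python exception. The sketch below shows how such an exit status can be decoded. `describe_exit` is a hypothetical helper, not part of accelerate, and it assumes the `subprocess` convention that a negative return code means the child was killed by that signal.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Turn a subprocess-style return code into a readable description."""
    if returncode < 0:
        # subprocess reports "killed by signal N" as the negative value -N.
        sig = signal.Signals(-returncode)
        return f"killed by {sig.name} ({sig.value})"
    return f"exited with status {returncode}"


print(describe_exit(-9))
```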
Reproduction
!accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path='stabilityai/stable-diffusion-xl-base-1.0' \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --cache_dir='.../Project/cache_dir' \
  --dataset_name='.../Project/DATASET' \
  --image_column="image" \
  --caption_column="text" \
  --repeats=1 \
  --instance_prompt="In the style of MaGHY" \
  --validation_prompt="In the style of MaGHY, a MLO mammogram." \
  --num_validation_images=4 \
  --validation_epochs=1 \
  --output_dir='.../Project/OUTPUT/03_RUN' \
  --seed=42 \
  --resolution=1024 \
  --train_text_encoder \
  --train_batch_size=1 \
  --sample_batch_size=1 \
  --max_train_steps=200 \
  --checkpointing_steps=10 \
  --checkpoints_total_limit=100 \
  --gradient_accumulation_steps=5 \
  --gradient_checkpointing \
  --learning_rate=2e-04 \
  --text_encoder_lr=5e-6 \
  --lr_scheduler="constant" \
  --snr_gamma=5.0 \
  --lr_warmup_steps=500 \
  --lr_num_cycles=1 \
  --lr_power=1.0 \
  --dataloader_num_workers=0 \
  --optimizer="AdamW" \
  --adam_beta1=0.9 \
  --adam_beta2=0.999 \
  --adam_weight_decay=1e-04 \
  --adam_weight_decay_text_encoder=1e-03 \
  --adam_epsilon=1e-08 \
  --max_grad_norm=1.0 \
  --report_to=wandb \
  --mixed_precision="fp16" \
  --prior_generation_precision="fp16" \
  --local_rank=-1 \
  --use_8bit_adam \
  --rank=4
Logs
System Info
diffusers version: 0.27.0.dev0
GPU: NVIDIA GeForce RTX 3090 (24 GB)
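Since the discussion turns on which library versions are installed, the following sketch is one way to collect them in a single place for a report like this. It uses only the standard library. `installed_versions` is a hypothetical helper, and the package list is simply the set of libraries named in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


for name, ver in installed_versions(
    ["diffusers", "transformers", "accelerate", "torch"]
).items():
    print(f"{name}: {ver or 'not installed'}")
```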
Who can help?
@yiyixuxu @sayakpaul @DN6 I would appreciate your help