kohya-ss / sd-scripts

Apache License 2.0

torch 2.3.1+cuda12.1 with FLUX LoRA training: works with two GPUs but errors with a single GPU, please help #1554

Open · lilyzlt opened this issue 2 months ago

lilyzlt commented 2 months ago

Error

When I use 2 GPUs to train a FLUX LoRA, everything is fine and training succeeds. But when I use one GPU (or launch with a 2-GPU setup and only use one), I get the error below. I tried:

```bash
export NCCL_DEBUG=INFO
export CUDA_DEVICE_ORDER="PCI_BUS_ID"
export NCCL_IB_DISABLE=1
```

(error screenshots attached)
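For reference, a quick sanity check of what PyTorch actually sees on this node (assuming the same environment as the training run) could look like the following; it only prints versions and device counts, nothing here is specific to sd-scripts:

```bash
# Print the torch version, the number of visible CUDA devices,
# and whether the NCCL backend is available in this build.
python -c "import torch, torch.distributed as dist; \
print('torch:', torch.__version__); \
print('cuda devices:', torch.cuda.device_count()); \
print('nccl available:', dist.is_nccl_available())"
```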

Environment:

```
lion-pytorch        0.1.2
open-clip-torch     2.20.0
pytorch-lightning   1.9.0
torch               2.3.1+cu121
torchaudio          2.3.1+cu121
torchmetrics        1.4.1
torchvision         0.18.1+cu121
nvidia-nccl-cu12    2.20.5
```

OS: CentOS 8 (screenshots attached). Please help!

lilyzlt commented 2 months ago

single GPU error:

Even if I train an SD LoRA or SDXL LoRA with a single GPU, I get the same error. sd-scripts commit ID: f8f5b1695842cce15ba14e7edfacbeee41e71a75
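To reproduce at exactly this commit (assuming a fresh clone of the repository):

```bash
# Clone the repo and check out the commit referenced above.
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts
git checkout f8f5b1695842cce15ba14e7edfacbeee41e71a75
```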

Command:

```bash
python -m accelerate.commands.launch --num_cpu_threads_per_process=2 sd-scripts/train_network.py --config_file /data/lora-scripts/config/autosave/20240902-180623.toml
```
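If the cached accelerate configuration was created for two GPUs, forcing a single-process run on one device might look like the sketch below (the device index 0 and the explicit process/machine flags are assumptions, not part of the original command):

```bash
# Restrict visibility to one GPU and launch exactly one process.
CUDA_VISIBLE_DEVICES=0 python -m accelerate.commands.launch \
  --num_processes=1 --num_machines=1 --num_cpu_threads_per_process=2 \
  sd-scripts/train_network.py \
  --config_file /data/lora-scripts/config/autosave/20240902-180623.toml
```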

Output:

(error output screenshot attached)

Config: 20240902-180623.toml

```toml
pretrained_model_name_or_path = "/data/stable-diffusion-webui/models/Stable-diffusion/anything-v3.ckpt"
train_data_dir = "/data/Images/"
resolution = "512,512"
enable_bucket = true
min_bucket_reso = 256
max_bucket_reso = 1024
output_name = "test_model"
output_dir = "/data/Lora/"
save_model_as = "safetensors"
save_every_n_epochs = 2
max_train_epochs = 10
train_batch_size = 1
network_train_unet_only = false
network_train_text_encoder_only = false
learning_rate = 0.0001
unet_lr = 0.0001
text_encoder_lr = 1e-5
lr_scheduler = "cosine_with_restarts"
optimizer_type = "AdamW8bit"
lr_scheduler_num_cycles = 1
network_module = "networks.lora"
network_dim = 32
network_alpha = 32
logging_dir = "/data/logs"
caption_extension = ".txt"
shuffle_caption = true
keep_tokens = 0
max_token_length = 255
seed = 1337
prior_loss_weight = 1
clip_skip = 2
mixed_precision = "fp16"
save_precision = "fp16"
xformers = true
cache_latents = true
persistent_data_loader_workers = true
```