kohya-ss / sd-scripts

Apache License 2.0

LoKr Error when training Lora #1402

Open Verun11 opened 3 months ago

Verun11 commented 3 months ago

Getting errors when trying to train a LoRA with LoKr.

Error from the cmd window:

import network module: lycoris.kohya
2024-06-29 16:11:06|[LyCORIS]-WARNING: disable_conv_cp and use_cp are deprecated. Please use use_tucker instead.
2024-06-29 16:11:06|[LyCORIS]-INFO: Using rank adaptation algo: lokr
2024-06-29 16:11:06|[LyCORIS]-INFO: Use Dropout value: 0.0
2024-06-29 16:11:06|[LyCORIS]-INFO: Create LyCORIS Module
2024-06-29 16:11:06|[LyCORIS]-INFO: Create LyCORIS Module
2024-06-29 16:11:07|[LyCORIS]-INFO: create LyCORIS for Text Encoder: 264 modules.
2024-06-29 16:11:07|[LyCORIS]-INFO: Create LyCORIS Module
2024-06-29 16:11:07|[LyCORIS]-WARNING: lora_dim 100000 is too large for dim=320 and factor=6, using full matrix mode.
2024-06-29 16:11:07|[LyCORIS]-WARNING: lora_dim 100000 is too large for dim=640 and factor=6, using full matrix mode.
2024-06-29 16:11:07|[LyCORIS]-WARNING: lora_dim 100000 is too large for dim=1280 and factor=6, using full matrix mode.
2024-06-29 16:11:08|[LyCORIS]-WARNING: lora_dim 100000 is too large for dim=2560 and factor=6, using full matrix mode.
2024-06-29 16:11:09|[LyCORIS]-WARNING: lora_dim 100000 is too large for dim=1920 and factor=6, using full matrix mode.
2024-06-29 16:11:10|[LyCORIS]-WARNING: lora_dim 100000 is too large for dim=960 and factor=6, using full matrix mode.
2024-06-29 16:11:10|[LyCORIS]-INFO: create LyCORIS for U-Net: 788 modules.
2024-06-29 16:11:10|[LyCORIS]-INFO: module type table: {'LokrModule': 1052}
2024-06-29 16:11:10|[LyCORIS]-INFO: enable LyCORIS for text encoder
2024-06-29 16:11:10|[LyCORIS]-INFO: enable LyCORIS for U-Net
prepare optimizer, data loader etc.
2024-06-29 16:11:10 INFO     use Prodigy optimizer | {'decouple': True, 'weight_decay': 0.5,          train_util.py:4205
                             'd_coef': 0.2, 'use_bias_correction': True, 'betas': (0.9, 0.99),
                             'safeguard_warmup': False, 'd0': 1e-05}
Using decoupled weight decay
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 334
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 170
  num epochs / epoch数: 25
  batch size per device / バッチサイズ: 2
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 4175
steps:   0%|                                                                                  | 0/4175 [00:00<?, ?it/s]
epoch 1/25
2024-06-29 16:11:49 INFO     epoch is incremented. current_epoch: 0, epoch: 1                          train_util.py:661
2024-06-29 16:11:49 INFO     epoch is incremented. current_epoch: 0, epoch: 1                          train_util.py:661
2024-06-29 16:11:49 INFO     epoch is incremented. current_epoch: 0, epoch: 1                          train_util.py:661
2024-06-29 16:11:49 INFO     epoch is incremented. current_epoch: 0, epoch: 1                          train_util.py:661
2024-06-29 16:11:49 INFO     epoch is incremented. current_epoch: 0, epoch: 1                          train_util.py:661
2024-06-29 16:11:49 INFO     epoch is incremented. current_epoch: 0, epoch: 1                          train_util.py:661
2024-06-29 16:11:49 INFO     epoch is incremented. current_epoch: 0, epoch: 1                          train_util.py:661
2024-06-29 16:11:49 INFO     epoch is incremented. current_epoch: 0, epoch: 1                          train_util.py:661
Traceback (most recent call last):
  File "C:\SD\Kohya\kohya2\kohya_ss\sd-scripts\sdxl_train_network.py", line 185, in <module>
    trainer.train(args)
  File "C:\SD\Kohya\kohya2\kohya_ss\sd-scripts\train_network.py", line 967, in train
    noise_pred = self.call_unet(
  File "C:\SD\Kohya\kohya2\kohya_ss\sd-scripts\sdxl_train_network.py", line 164, in call_unet
    noise_pred = unet(noisy_latents, timesteps, text_embedding, vector_embedding)
  File "C:\SD\Kohya\kohya2\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\SD\Kohya\kohya2\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\SD\Kohya\kohya2\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 680, in forward
    return model_forward(*args, **kwargs)
  File "C:\SD\Kohya\kohya2\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 668, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "C:\SD\Kohya\kohya2\kohya_ss\venv\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "C:\SD\Kohya\kohya2\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 1104, in forward
    h = call_module(module, h, emb, context)
  File "C:\SD\Kohya\kohya2\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 1093, in call_module
    x = layer(x, emb)
  File "C:\SD\Kohya\kohya2\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\SD\Kohya\kohya2\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\SD\Kohya\kohya2\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 348, in forward
    x = torch.utils.checkpoint.checkpoint(create_custom_forward(self.forward_body), x, emb, use_reentrant=USE_REENTRANT)
  File "C:\SD\Kohya\kohya2\kohya_ss\venv\lib\site-packages\torch\_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "C:\SD\Kohya\kohya2\kohya_ss\venv\lib\site-packages\torch\_dynamo\eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "C:\SD\Kohya\kohya2\kohya_ss\venv\lib\site-packages\torch\_dynamo\external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "C:\SD\Kohya\kohya2\kohya_ss\venv\lib\site-packages\torch\utils\checkpoint.py", line 451, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "C:\SD\Kohya\kohya2\kohya_ss\venv\lib\site-packages\torch\autograd\function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "C:\SD\Kohya\kohya2\kohya_ss\venv\lib\site-packages\torch\utils\checkpoint.py", line 230, in forward
    outputs = run_function(*args)
  File "C:\SD\Kohya\kohya2\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 344, in custom_forward
    return func(*inputs)
  File "C:\SD\Kohya\kohya2\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 336, in forward_body
    return x + h
RuntimeError: The size of tensor a (38) must match the size of tensor b (36) at non-singleton dimension 3
steps:   0%|                                                                                  | 0/4175 [00:30<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\nietx\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\nietx\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\SD\Kohya\kohya2\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
  File "C:\SD\Kohya\kohya2\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\SD\Kohya\kohya2\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "C:\SD\Kohya\kohya2\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\SD\\Kohya\\kohya2\\kohya_ss\\venv\\Scripts\\python.exe', 'C:/SD/Kohya/kohya2/kohya_ss/sd-scripts/sdxl_train_network.py', '--config_file', 'C:\\SD\\loraS\\girl5\\model/config_lora-20240629-161049.toml']' returned non-zero exit status 1.
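The final RuntimeError comes from the residual add `x + h` in `forward_body`. In an NCHW tensor, dimension 3 is the width, and `h` is exactly 2 pixels narrower than `x` (36 vs 38). The sketch below only works through the standard PyTorch convolution output-size formula. It assumes the layer producing `h` is a 3x3 convolution, which the log does not confirm. Under that assumption, losing the usual padding of 1 turns a width of 38 into 36. This is one plausible reading of the numbers, not a confirmed diagnosis of the LoKr module, and `conv2d_out_size` is a hypothetical helper written for illustration.

```python
def conv2d_out_size(size: int, kernel: int, padding: int,
                    stride: int = 1, dilation: int = 1) -> int:
    """Output length of a 2D convolution along one spatial axis.

    Same formula as the torch.nn.Conv2d docs:
    floor((size + 2*padding - dilation*(kernel - 1) - 1) / stride + 1)
    """
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


width = 38  # width of tensor "a" (the skip connection x) in the error message

# Assumed 3x3 conv WITH padding=1: width is preserved, so x + h would work.
print(conv2d_out_size(width, kernel=3, padding=1))  # 38

# Same assumed 3x3 conv WITHOUT padding: one pixel is lost on each side,
# which matches the 38 vs 36 mismatch reported at dimension 3.
print(conv2d_out_size(width, kernel=3, padding=0))  # 36
```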

Here are my LoRA settings:

{
  "LoRA_type": "LyCORIS/LoKr",
  "LyCORIS_preset": "full",
  "adaptive_noise_scale": 0,
  "additional_parameters": "",
  "async_upload": false,
  "block_alphas": "",
  "block_dims": "",
  "block_lr_zero_threshold": "",
  "bucket_no_upscale": false,
  "bucket_reso_steps": 64,
  "bypass_mode": false,
  "cache_latents": false,
  "cache_latents_to_disk": false,
  "caption_dropout_every_n_epochs": 0,
  "caption_dropout_rate": 0,
  "caption_extension": ".txt",
  "clip_skip": 0,
  "color_aug": false,
  "constrain": 0,
  "conv_alpha": 1,
  "conv_block_alphas": "",
  "conv_block_dims": "",
  "conv_dim": 100000,
  "dataset_config": "",
  "debiased_estimation_loss": false,
  "decompose_both": false,
  "dim_from_weights": false,
  "dora_wd": false,
  "down_lr_weight": "",
  "dynamo_backend": "no",
  "dynamo_mode": "default",
  "dynamo_use_dynamic": false,
  "dynamo_use_fullgraph": false,
  "enable_bucket": true,
  "epoch": 25,
  "extra_accelerate_launch_args": "",
  "factor": 6,
  "flip_aug": false,
  "fp8_base": false,
  "full_bf16": false,
  "full_fp16": false,
  "gpu_ids": "",
  "gradient_accumulation_steps": 1,
  "gradient_checkpointing": true,
  "huber_c": 0.1,
  "huber_schedule": "snr",
  "huggingface_path_in_repo": "",
  "huggingface_repo_id": "",
  "huggingface_repo_type": "",
  "huggingface_repo_visibility": "",
  "huggingface_token": "",
  "ip_noise_gamma": 0,
  "ip_noise_gamma_random_strength": false,
  "keep_tokens": 4,
  "learning_rate": 1,
  "log_config": false,
  "log_tracker_config": "",
  "log_tracker_name": "",
  "log_with": "",
  "logging_dir": "",
  "loraplus_lr_ratio": 0,
  "loraplus_text_encoder_lr_ratio": 0,
  "loraplus_unet_lr_ratio": 0,
  "loss_type": "l2",
  "lr_scheduler": "cosine",
  "lr_scheduler_args": "",
  "lr_scheduler_num_cycles": 1,
  "lr_scheduler_power": 1,
  "lr_scheduler_type": "",
  "lr_warmup": 0,
  "main_process_port": 0,
  "masked_loss": false,
  "max_bucket_reso": 1536,
  "max_data_loader_n_workers": 8,
  "max_grad_norm": 1,
  "max_resolution": "1024,1024",
  "max_timestep": 1000,
  "max_token_length": 225,
  "max_train_epochs": 0,
  "max_train_steps": 0,
  "mem_eff_attn": false,
  "metadata_author": "",
  "metadata_description": "",
  "metadata_license": "",
  "metadata_tags": "",
  "metadata_title": "",
  "mid_lr_weight": "",
  "min_bucket_reso": 512,
  "min_snr_gamma": 5,
  "min_timestep": 0,
  "mixed_precision": "bf16",
  "model_list": "custom",
  "module_dropout": 0.1,
  "multi_gpu": false,
  "multires_noise_discount": 0,
  "multires_noise_iterations": 0,
  "network_alpha": 1,
  "network_dim": 100000,
  "network_dropout": 0,
  "network_weights": "",
  "noise_offset": 0,
  "noise_offset_random_strength": false,
  "noise_offset_type": "Multires",
  "num_cpu_threads_per_process": 2,
  "num_machines": 1,
  "num_processes": 1,
  "optimizer": "Prodigy",
  "optimizer_args": "decouple=True weight_decay=0.5 d_coef=0.2 use_bias_correction=True betas=(0.9,0.99) safeguard_warmup=False d0=1e-05",
  "output_dir": "C:\\SD\\loraS\\girl5\\model",
  "output_name": "girl5",
  "persistent_data_loader_workers": true,
  "pretrained_model_name_or_path": "C:/SD/webui/models/Stable-diffusion/pnyv6.safetensors",
  "prior_loss_weight": 1,
  "random_crop": false,
  "rank_dropout": 0.35,
  "rank_dropout_scale": false,
  "reg_data_dir": "",
  "rescaled": false,
  "resume": "",
  "resume_from_huggingface": "",
  "sample_every_n_epochs": 0,
  "sample_every_n_steps": 0,
  "sample_prompts": "",
  "sample_sampler": "euler_a",
  "save_every_n_epochs": 5,
  "save_every_n_steps": 0,
  "save_last_n_steps": 0,
  "save_last_n_steps_state": 0,
  "save_model_as": "safetensors",
  "save_precision": "bf16",
  "save_state": false,
  "save_state_on_train_end": false,
  "save_state_to_huggingface": false,
  "scale_v_pred_loss_like_noise_pred": false,
  "scale_weight_norms": 0,
  "sdxl": true,
  "sdxl_cache_text_encoder_outputs": false,
  "sdxl_no_half_vae": true,
  "seed": 0,
  "shuffle_caption": true,
  "stop_text_encoder_training_pct": 0,
  "text_encoder_lr": 1,
  "train_batch_size": 2,
  "train_data_dir": "C:\\SD\\loraS\\girl5\\img",
  "train_norm": false,
  "train_on_input": true,
  "training_comment": "",
  "unet_lr": 1,
  "unit": 1,
  "up_lr_weight": "",
  "use_cp": false,
  "use_scalar": false,
  "use_tucker": false,
  "v2": false,
  "v_parameterization": false,
  "v_pred_like_loss": 0,
  "vae": "",
  "vae_batch_size": 0,
  "wandb_api_key": "",
  "wandb_run_name": "",
  "weighted_captions": false,
  "xformers": "xformers"
}

What could be the problem?

BootsofLagrangian commented 3 months ago

I recommend setting 'factor' to a power of 2. A factor of 6 could cause problems because it is a multiple of 3.
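The arithmetic behind this suggestion can be checked against the layer widths that appear in the "lora_dim 100000 is too large for dim=... and factor=6" warnings in the log. The sketch below only tests whether each width divides evenly by a candidate factor. It does not reproduce how LyCORIS actually splits a dimension when the factor does not divide it, and it does not show that divisibility is the cause of the shape error. `non_divisible` is a hypothetical helper written for illustration.

```python
# Layer widths taken from the LyCORIS warnings in the log above.
layer_dims = [320, 640, 1280, 2560, 1920, 960]


def non_divisible(dims: list[int], factor: int) -> list[int]:
    """Return the widths that cannot be split evenly as factor x (dim // factor)."""
    return [d for d in dims if d % factor != 0]


for factor in (4, 6, 8):
    print(factor, non_divisible(layer_dims, factor))
# 4 []
# 6 [320, 640, 1280, 2560]
# 8 []
```

With factor=6, four of the six widths leave a remainder, while the powers of 2 tried here (4 and 8) divide every width in the log evenly.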

xandrmoro commented 1 week ago

@Verun11 have you had any luck fixing this? I've tried all kinds of settings, and it's still the same error.