kohya-ss / sd-scripts

Apache License 2.0
Enabling dim_from_weights or loraplus_unet_lr_ratio will cause the error: "train_blocks must be single for split mode" (content updated). #1720

Open avan06 opened 3 weeks ago

avan06 commented 3 weeks ago

Hi,

Today, while I was running LoRA training for the Flux.1 model (sd-scripts on the SD3 branch), the "train_blocks must be single for split mode" error suddenly occurred. It had not appeared before. After reviewing the parameter settings, I found the cause.

```
F:\kohya_ss\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:480: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
Traceback (most recent call last):
  File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 564, in <module>
    trainer.train(args)
  File "F:\kohya_ss\sd-scripts\train_network.py", line 1177, in train
    noise_pred, target, timesteps, huber_c, weighting = self.get_noise_pred_and_target(
  File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 427, in get_noise_pred_and_target
    model_pred = call_dit(
  File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 393, in call_dit
    assert network.train_blocks == "single", "train_blocks must be single for split mode"
AssertionError: train_blocks must be single for split mode
```

The cause was that I had specified both the "network_weights" and "dim_from_weights" parameters. Once I disabled "dim_from_weights", everything worked again.

I wonder if anyone else has encountered the same issue. Could it be that dim_from_weights retrieves double blocks, causing the split mode mechanism to malfunction?

avan06 commented 2 days ago

Today, I tested several parameter settings again and found that whenever "train_blocks": "single" is set, adding --network_args "loraplus_unet_lr_ratio=4" also triggers the error message: AssertionError: train_blocks must be single for split mode.

```
Traceback (most recent call last):
  File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 564, in <module>
    trainer.train(args)
  File "F:\kohya_ss\sd-scripts\train_network.py", line 1177, in train
    noise_pred, target, timesteps, huber_c, weighting = self.get_noise_pred_and_target(
  File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 427, in get_noise_pred_and_target
    model_pred = call_dit(
  File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 393, in call_dit
    assert network.train_blocks == "single", "train_blocks must be single for split mode"
AssertionError: train_blocks must be single for split mode
steps: 0%| | 0/10960 [01:14
    sys.exit(main())
  File "F:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "F:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "F:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\\kohya_ss\\venv\\Scripts\\python.exe', 'F:/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'F:/model/config_lora-20241113-000607.toml', '--network_args', 'loraplus_unet_lr_ratio=4']' returned non-zero exit status 1.
00:21:00-078513 INFO Training has ended.
```
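Until the root cause is known, a small pre-flight check could flag the two triggers reported in this thread before a run is launched. The following is only a sketch and not part of sd-scripts: the function name `find_reported_triggers` and the plain-dict config layout are hypothetical, and only the option names (`network_weights`, `dim_from_weights`, `loraplus_unet_lr_ratio`) are taken from the reports above.

```python
def find_reported_triggers(config, network_args):
    """Return the options from this thread that are present in the settings.

    config: dict of top-level training options (hypothetical layout).
    network_args: list of "key=value" strings, as given to --network_args.
    """
    # Turn ["key=value", ...] into {"key": "value", ...}.
    parsed = dict(arg.split("=", 1) for arg in network_args)

    triggers = []
    # First report: network_weights combined with dim_from_weights.
    if config.get("network_weights") and config.get("dim_from_weights"):
        triggers.append("dim_from_weights")
    # Second report: loraplus_unet_lr_ratio passed through network_args.
    if "loraplus_unet_lr_ratio" in parsed:
        triggers.append("loraplus_unet_lr_ratio")
    return triggers


if __name__ == "__main__":
    # Example values only; the weights file name is made up.
    cfg = {"network_weights": "my_lora.safetensors", "dim_from_weights": True}
    args = ["train_blocks=single", "loraplus_unet_lr_ratio=4"]
    for name in find_reported_triggers(cfg, args):
        print(f"warning: {name} was reported to trigger the split mode assertion")
```

The check only reflects what was observed here; it does not explain why `network.train_blocks` ends up different from "single" in those cases.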