kohya-ss / sd-scripts

error running sdxl_train_network.py in WSL #1069

Open antrobot1234 opened 9 months ago

antrobot1234 commented 9 months ago

I am trying to run sd-scripts in Windows Subsystem for Linux (WSL) because Windows doesn't support all of the optimization libraries, but sdxl_train_network.py does not work for me. Here is the error I get when I try to run it:

prepare tokenizers
update token length: 225
Using DreamBooth method.
Traceback (most recent call last):
  File "/home/antrobot/sd-scripts/./sdxl_train_network.py", line 184, in <module>
    trainer.train(args)
  File "/home/antrobot/sd-scripts/train_network.py", line 193, in train
    train_dataset_group = config_util.generate_dataset_group_by_blueprint(blueprint.dataset_group)
  File "/home/antrobot/sd-scripts/library/config_util.py", line 460, in generate_dataset_group_by_blueprint
    dataset = dataset_klass(subsets=subsets, **asdict(dataset_blueprint.params))
  File "/home/antrobot/sd-scripts/library/train_util.py", line 1771, in __init__
    self.dreambooth_dataset_delegate = DreamBoothDataset(
TypeError: DreamBoothDataset.__init__() missing 1 required positional argument: 'debug_dataset'
Traceback (most recent call last):
  File "/home/antrobot/sd-scripts/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/antrobot/sd-scripts/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/antrobot/sd-scripts/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "/home/antrobot/sd-scripts/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/antrobot/sd-scripts/venv/bin/python3', './sdxl_train_network.py', '--logging_dir=logs', '--log_prefix=Welwraith', '--network_module=networks.lora', '--max_data_loader_n_workers=1', '--persistent_data_loader_workers', '--caption_extension=.txt', '--shuffle_caption', '--keep_tokens=1', '--max_token_length=225', '--prior_loss_weight=1', '--mixed_precision=bf16', '--save_precision=bf16', '--xformers', '--cache_latents', '--cache_latents_to_disk', '--save_model_as=safetensors', '--train_data_dir=./.image-dir/', '--output_dir=./.output/Welwraith_1', '--reg_data_dir=./.reg-dir/', '--pretrained_model_name_or_path=/mnt/d/Stable Diffusion Projects/stable-diffusion-webui/models/Stable-diffusion/ponyDiffusionV6XL_v6StartWithThisOne.safetensors', '--output_name=Welwraith_1', '--learning_rate=5E-05', '--text_encoder_lr=5E-05', '--max_train_steps=2000', '--resolution=768', '--enable_bucket', '--min_bucket_reso=576', '--max_bucket_reso=960', '--train_batch_size=1', '--network_dim=64', '--network_alpha=64', '--optimizer_type=AdamW8Bit', '--lr_scheduler=cosine', '--noise_offset=0.01', '--seed=0', '--sample_sampler=k_euler_a', '--gradient_accumulation_steps=1', '--save_every_n_steps=400', '--sample_every_n_steps=200', '--sample_prompts=./.output/prompt.txt', '--gradient_checkpointing', '--flip_aug', '--network_dropout=0.3', '--scale_weight_norms=3', '--min_snr_gamma=5', '--fp8_base', '--optimizer_args', 'betas=0.9, 0.99', 'weight_decay=0']' returned non-zero exit status 1.

kohya-ss commented 9 months ago

Please update the repository by following the update procedure in README.md. I think pip install -e . may solve the issue.

antrobot1234 commented 9 months ago

@kohya-ss This is a fresh install. I tried running both the update procedure and the command you suggested, and neither changed anything.

(Edit: I have completely uninstalled the project and followed the installation steps to reinstall it. I am still getting the same issue.)

antrobot1234 commented 9 months ago

train_SDXL.toml.txt

train_SDXL.sh.txt

Here are my training config and the bash script I use to run it.

kohya-ss commented 8 months ago

It seems that you are training LoRA on a dataset for ControlNet. I found a bug in ControlNet-LLLite training and fixed it in the dev branch. Thank you for reporting this.

However, I do not know why your dataset is recognized as a ControlNet dataset. In the meantime, please try training again.
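
For readers hitting the same traceback: the TypeError is raised where a wrapper dataset builds an inner DreamBoothDataset and leaves out a required constructor argument. The snippet below is only a minimal sketch of that kind of bug, not the actual sd-scripts code. The class names DreamBoothDatasetSketch, BrokenWrapper, and FixedWrapper, their parameter lists, and the try_build helper are all hypothetical and chosen for illustration.

```python
# Hypothetical stand-ins for illustration only; these are NOT the real
# sd-scripts classes, and the parameter lists are heavily simplified.


class DreamBoothDatasetSketch:
    """Inner dataset whose constructor requires debug_dataset."""

    def __init__(self, subsets, batch_size, debug_dataset):
        self.subsets = subsets
        self.batch_size = batch_size
        self.debug_dataset = debug_dataset


class BrokenWrapper:
    """Wrapper that accepts debug_dataset but forgets to forward it."""

    def __init__(self, subsets, batch_size, debug_dataset):
        # Bug: the delegate is built without the required third argument.
        self.delegate = DreamBoothDatasetSketch(subsets, batch_size)


class FixedWrapper:
    """Wrapper that forwards every required argument to the delegate."""

    def __init__(self, subsets, batch_size, debug_dataset):
        self.delegate = DreamBoothDatasetSketch(subsets, batch_size, debug_dataset)


def try_build(wrapper_cls):
    """Return 'ok' if construction succeeds, else the TypeError message."""
    try:
        wrapper_cls(subsets=[], batch_size=1, debug_dataset=False)
        return "ok"
    except TypeError as exc:
        return str(exc)


broken_message = try_build(BrokenWrapper)
fixed_message = try_build(FixedWrapper)
print(broken_message)
print(fixed_message)
```

In this sketch the caller passes debug_dataset correctly, yet construction still fails, because the omission is inside the wrapper. That would be consistent with reinstalling or changing the command line not helping, and with the fix having to come from the repository itself.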