Starting trainer issue - Githubissues

stadiff0001 commented 1 month ago

Traceback (most recent call last):

/usr/local/bin/accelerate:8 in <module>

    5 from accelerate.commands.accelerate_cli import main
    6 if __name__ == '__main__':
    7     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
  ❱ 8     sys.exit(main())
    9

/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py:45 in main

    42         exit(1)
    43
    44     # Run
  ❱ 45     args.func(args)
    46
    47
    48 if __name__ == "__main__":

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:1104 in launch_command

    1101     elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA
    1102         sagemaker_launcher(defaults, args)
    1103     else:
  ❱ 1104         simple_launcher(args)
    1105
    1106
    1107 def main():

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:567 in simple_launcher

    564     process = subprocess.Popen(cmd, env=current_env)
    565     process.wait()
    566     if process.returncode != 0:
  ❱ 567         raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    568
    569
    570 def multi_gpu_launcher(args):

CalledProcessError: Command '['/usr/bin/python3', 'train_network_wrapper.py', '--dataset_config=/content/drive/MyDrive/Loras/Mech2/dataset_config.toml', '--config_file=/content/drive/MyDrive/Loras/Mech2/training_config.toml']' died with <Signals.SIGSEGV: 11>.
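
A note on the last line of the traceback: on POSIX, Python's subprocess module reports a child killed by a signal as a negative return code, and CalledProcessError renders that as "died with <Signals.SIGSEGV: 11>". So the child process (train_network_wrapper.py) crashed with a segmentation fault, and accelerate only reported it. A minimal Python sketch of that mapping follows; describe_returncode is a made-up helper for illustration, not part of accelerate or the notebook.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Describe a subprocess return code (hypothetical helper, for illustration)."""
    if returncode < 0:
        # On POSIX, Popen.returncode is -N when the child was killed by signal N.
        try:
            return f"died with signal {signal.Signals(-returncode).name}"
        except ValueError:
            # The number does not correspond to a signal known on this platform.
            return f"died with unknown signal {-returncode}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # -11 is what simple_launcher saw in process.returncode in the traceback above.
    print(describe_returncode(-11))
```

A segfault like this usually comes from native code (a compiled extension) rather than from the Python script itself, which is consistent with a dependency version problem, though the traceback alone does not prove it.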

abigfan1337 commented 1 month ago

same

stadiffs commented 1 month ago

same thing

githubnoot commented 1 month ago

Big ol' same here, too.

stadiffs commented 1 month ago

⭐ Starting trainer...

Traceback (most recent call last):

/usr/local/bin/accelerate:8 in <module>

    5 from accelerate.commands.accelerate_cli import main
    6 if __name__ == '__main__':
    7     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
  ❱ 8     sys.exit(main())
    9

/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py:45 in main

    42         exit(1)
    43
    44     # Run
  ❱ 45     args.func(args)
    46
    47
    48 if __name__ == "__main__":

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:1104 in launch_command

    1101     elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA
    1102         sagemaker_launcher(defaults, args)
    1103     else:
  ❱ 1104         simple_launcher(args)
    1105
    1106
    1107 def main():

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:567 in simple_launcher

    564     process = subprocess.Popen(cmd, env=current_env)
    565     process.wait()
    566     if process.returncode != 0:
  ❱ 567         raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    568
    569
    570 def multi_gpu_launcher(args):

CalledProcessError: Command '['/usr/bin/python3', 'train_network_wrapper.py', '--dataset_config=/content/drive/MyDrive/Loras/Lora1234567890/dataset_config.toml', '--config_file=/content/drive/MyDrive/Loras/Lora1234567890/training_config.toml']' died with <Signals.SIGSEGV: 11>.

takoyariika commented 1 month ago

same

ManuelMultiverse commented 1 month ago

same issue here

matcordero commented 1 month ago

I think it's a bug in the dependencies, since Colab updated them 3 days ago. Pinning the versions like this seems to work (I only tested it in the XL trainer).

!pip install torch==2.4.1+cu121 accelerate==0.32.1 transformers==4.42.4 diffusers==0.18.2 bitsandbytes==0.40.0.post4 opencv-python==4.7.0.68 jax==0.4.23 jaxlib==0.4.23
!pip install pytorch-lightning==1.9.0 voluptuous==0.13.1 toml==0.10.2 ftfy==6.1.1 einops==0.6.0 safetensors pygments
!pip install huggingface-hub "invisible-watermark>=2.0" open-clip-torch==2.20.0 dadaptation==3.1 prodigyopt==1.0 lion-pytorch==0.1.2 wandb
!pip install -e .
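
To confirm the pins actually took effect before starting the trainer, something like the following could be run in a Colab cell after the installs. This is only a sketch: find_mismatches and PINNED are names made up here, the dictionary copies just a few of the pins from the first install line, and the reported torch version may or may not carry the "+cu121" suffix depending on where the wheel came from.

```python
from importlib.metadata import PackageNotFoundError, version

# A subset of the pins from the install commands above; extend as needed.
PINNED = {
    "torch": "2.4.1+cu121",
    "accelerate": "0.32.1",
    "transformers": "4.42.4",
    "diffusers": "0.18.2",
    "bitsandbytes": "0.40.0.post4",
}


def find_mismatches(pinned, get_version=version):
    """Return {package: (wanted, installed_or_None)} for every pin that is not met.

    Hypothetical helper; get_version is injectable so the check can be tested
    without installing anything.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(PINNED).items():
        print(f"{name}: wanted {wanted}, found {installed}")
```

If nothing is printed, every listed package matches its pin. Note that Colab may need a runtime restart after the installs before already-imported packages pick up the new versions.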

githubnoot commented 1 month ago

Thanks for the info!

XL is working for me, too (not Pony), but the normal Lora Trainer is still giving the same errors as in this thread.

uYouUs commented 1 month ago

Temp fix: https://colab.research.google.com/github/uYouUs/Hollowstrawberry-kohya-colab/blob/Experiments/Lora_Trainer.ipynb

pichelsteiner commented 1 month ago

Temp fix: https://colab.research.google.com/github/uYouUs/Hollowstrawberry-kohya-colab/blob/Experiments/Lora_Trainer.ipynb

Works perfectly. Thank you for this!

stadiffs commented 1 month ago

Thank you for the fix, I'm getting good results as always.

hollowstrawberry / kohya-colab

Starting trainer issue #215