hollowstrawberry / kohya-colab

Accessible Google Colab notebooks for Stable Diffusion Lora training, based on the work of kohya-ss and Linaqruf
GNU General Public License v3.0

Command died with SIGKILL #106

Open Bellatrix8 opened 3 months ago

Bellatrix8 commented 3 months ago

Traceback (most recent call last)

/usr/local/bin/accelerate:8 in <module>

    5 from accelerate.commands.accelerate_cli import main
    6 if __name__ == '__main__':
    7     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
  ❱ 8     sys.exit(main())
    9

/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py:45 in main

    42         exit(1)
    43
    44     # Run
  ❱ 45     args.func(args)
    46
    47
    48 if __name__ == "__main__":

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:918 in launch_command

    915     elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA
    916         sagemaker_launcher(defaults, args)
    917     else:
  ❱ 918         simple_launcher(args)
    919
    920
    921 def main():

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:580 in simple_launcher

    577     process.wait()
    578     if process.returncode != 0:
    579         if not args.quiet:
  ❱ 580             raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    581         else:
    582             sys.exit(1)
    583

CalledProcessError: Command '['/usr/bin/python3', 'sdxl_train_network.py', '--dataset_config=/content/drive/MyDrive/Loras/xl_test/dataset_config.toml', '--config_file=/content/drive/MyDrive/Loras/xl_test/training_config.toml']' died with <Signals.SIGKILL: 9>.

hollowstrawberry commented 3 months ago

SIGKILL here most likely means the training process ran out of memory and was killed by the system. You should tweak some settings so it uses less memory.
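For anyone reading a similar log, here is a minimal Python sketch of how a launcher's exit status maps to a signal name. On POSIX systems, `subprocess` reports a child killed by signal N as return code -N, so -9 corresponds to SIGKILL. The helper name `describe_returncode` is hypothetical and not part of accelerate or the notebook, and the out-of-memory hint is only a common cause, not something the return code itself proves.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable description.

    Hypothetical helper for illustration; not part of accelerate or kohya scripts.
    """
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a negative return code -N means "terminated by signal N".
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        # SIGKILL is frequently sent by the kernel out-of-memory killer,
        # but other senders are possible, so this is only a hint.
        hint = " (often the kernel out-of-memory killer)" if name == "SIGKILL" else ""
        return f"killed by {name}{hint}"
    return f"exited with error code {returncode}"


if __name__ == "__main__":
    # -9 is what a process killed with SIGKILL reports through subprocess.
    print(describe_returncode(-9))
```

A result of "killed by SIGKILL" points at something outside the script ending it, which is why the advice is to reduce memory use rather than to look for a bug in the traceback.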

Bellatrix8 commented 3 months ago

I've left all the settings at their defaults. I'm not sure what could be changed.

Bellatrix8 commented 3 months ago

Could it be due to the dataset? I use ~400 images.

hollowstrawberry commented 3 months ago

You should try with fewer images, but 400 isn't that many. If you're on the XL trainer, make sure you enable caching to drive.
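To check whether system RAM is really the limit before changing the dataset, one option is to look at how much memory is free on the Colab machine. The sketch below parses the Linux `/proc/meminfo` format, which reports values in kB. The function names `parse_meminfo` and `available_gib` are hypothetical and not part of the notebook; this only reports free RAM and does not change any trainer setting.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into a {field name: value in kB} dict."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def available_gib(meminfo_text: str) -> float:
    """Return the MemAvailable field converted from kB to GiB."""
    return parse_meminfo(meminfo_text)["MemAvailable"] / (1024 ** 2)


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux, such as a Colab runtime
        print(f"Available RAM: {available_gib(meminfo.read_text()):.1f} GiB")
```

Running this in a notebook cell shortly before training starts gives a rough idea of the headroom left.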