hollowstrawberry / kohya-colab

Accessible Google Colab notebooks for Stable Diffusion Lora training, based on the work of kohya-ss and Linaqruf
GNU General Public License v3.0
599 stars 87 forks

CUDA backend failed #82

Closed: gado01 closed this issue 7 months ago

gado01 commented 8 months ago

This error has been reported before, but so far I can't find a solution. Could someone please help me fix it? I already tried the updated Lora_Trainer and the problem persists.

CUDA backend failed to initialize: Found CUDA version 12010, but JAX was built against version 12020, which is newer. The copy of CUDA that is installed must be at least as new as the version against which JAX was built. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
Loading settings from /content/drive/MyDrive/Loras/hijab/training_config.toml...
/content/drive/MyDrive/Loras/hijab/training_config
prepare tokenizer
vocab.json: 100% 961k/961k [00:00<00:00, 4.31MB/s]
merges.txt: 100% 525k/525k [00:00<00:00, 3.13MB/s]
special_tokens_map.json: 100% 389/389 [00:00<00:00, 2.01MB/s]
tokenizer_config.json: 100% 905/905 [00:00<00:00, 5.12MB/s]
update token length: 225
Loading dataset config from /content/drive/MyDrive/Loras/hijab/dataset_config.toml
prepare images.
found directory /content/drive/MyDrive/Loras/hijab/dataset contains 682 image files
neither caption file nor class tokens are found. use empty caption for /content/drive/MyDrive/Loras/hijab/dataset/1704243855490.png.jpg / キャプションファイルもclass tokenも見つかりませんでした。空のキャプションを使用します: /content/drive/MyDrive/Loras/hijab/dataset/1704243855490.png.jpg
No caption file found for 1 images. Training will continue without captions for these images. If class token exists, it will be used. / 1枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。
/content/drive/MyDrive/Loras/hijab/dataset/1704243855490.png.jpg
2046 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (1024, 1024)
  enable_bucket: True
  min_bucket_reso: 320
  max_bucket_reso: 1280
  bucket_reso_steps: 64
  bucket_no_upscale: False

  [Subset 0 of Dataset 0]
    image_dir: "/content/drive/MyDrive/Loras/hijab/dataset"
    image_count: 682
    num_repeats: 3
    shuffle_caption: True
    keep_tokens: 2
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: None
    caption_extension: .txt

[Dataset 0]
loading image sizes.
1% 6/682 [00:00<00:07, 90.53it/s]
Traceback (most recent call last)

/content/kohya-trainer/train_network.py:873 in <module>

    870     args = parser.parse_args()
    871     args = train_util.read_config_from_file(args, parser)
    872
  ❱ 873     train(args)
    874

/content/kohya-trainer/train_network.py:135 in train

    132                 }
    133
    134         blueprint = blueprint_generator.generate(user_config, args, tokenizer=tokenizer)
  ❱ 135         train_dataset_group = config_util.generate_dataset_group_by_blueprint(blueprint.
    136     else:
    137         # use arbitrary dataset class
    138         train_dataset_group = train_util.load_arbitrary_dataset(args, tokenizer)

/content/kohya-trainer/library/config_util.py:436 in generate_dataset_group_by_blueprint

    433   seed = random.randint(0, 2**31) # actual seed is seed + epoch_no
    434   for i, dataset in enumerate(datasets):
    435     print(f"[Dataset {i}]")
  ❱ 436     dataset.make_buckets()
    437     dataset.set_seed(seed)
    438
    439   return DatasetGroup(datasets)

/content/kohya-trainer/library/train_util.py:607 in make_buckets

    604         print("loading image sizes.")
    605         for info in tqdm(self.image_data.values()):
    606             if info.image_size is None:
  ❱ 607                 info.image_size = self.get_image_size(info.absolute_path)
    608
    609         if self.enable_bucket:
    610             print("make buckets")

/content/kohya-trainer/library/train_util.py:833 in get_image_size

    830                         info.latents_flipped = latent
    831
    832     def get_image_size(self, image_path):
  ❱ 833         image = Image.open(image_path)
    834         return image.size
    835
    836     def load_image_with_face_info(self, subset: BaseSubset, image_path: str):

/usr/local/lib/python3.10/dist-packages/PIL/Image.py:3283 in open

    3280     for message in accept_warnings:
    3281         warnings.warn(message)
    3282     msg = "cannot identify image file %r" % (filename if filename else fp)
  ❱ 3283     raise UnidentifiedImageError(msg)
    3284
    3285
    3286 #

UnidentifiedImageError: cannot identify image file '/content/drive/MyDrive/Loras/hijab/dataset/1704243855490.png.jpg'

Traceback (most recent call last)

/usr/local/bin/accelerate:8 in <module>

    5 from accelerate.commands.accelerate_cli import main
    6 if __name__ == '__main__':
    7     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
  ❱ 8     sys.exit(main())
    9

/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py:45 in main

    42         exit(1)
    43
    44     # Run
  ❱ 45     args.func(args)
    46
    47
    48 if __name__ == "__main__":

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:1104 in launch_command

    1101     elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA
    1102         sagemaker_launcher(defaults, args)
    1103     else:
  ❱ 1104         simple_launcher(args)
    1105
    1106
    1107 def main():

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:567 in simple_launcher

    564     process = subprocess.Popen(cmd, env=current_env)
    565     process.wait()
    566     if process.returncode != 0:
  ❱ 567         raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    568
    569
    570 def multi_gpu_launcher(args):

CalledProcessError: Command '['/usr/bin/python3', 'train_network.py', '--dataset_config=/content/drive/MyDrive/Loras/hijab/dataset_config.toml', '--config_file=/content/drive/MyDrive/Loras/hijab/training_config.toml']' returned non-zero exit status 1.

hollowstrawberry commented 7 months ago

The error is:

UnidentifiedImageError: cannot identify image file '/content/drive/MyDrive/Loras/hijab/dataset/1704243855490.png.jpg'

So I think the problem is with your dataset: Pillow could not read that file as an image, so it is probably corrupted or not really an image file. Sorry I didn't respond earlier.
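
For anyone hitting the same UnidentifiedImageError, here is a rough sketch of how the dataset folder could be checked before training. It is not part of the notebook or of kohya-trainer. The function name find_unreadable_images and the list of suffixes are made up for illustration, and it assumes Pillow is installed (it is in the traceback above). It tries to open every image-like file and collects the ones Pillow rejects.

```python
from pathlib import Path

from PIL import Image

# Hypothetical list of suffixes to check; adjust to whatever the dataset contains.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def find_unreadable_images(dataset_dir):
    """Return a sorted list of image-like files that Pillow cannot open or verify."""
    bad = []
    for path in sorted(Path(dataset_dir).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        try:
            with Image.open(path) as img:
                # verify() checks the file for corruption without fully decoding it.
                img.verify()
        except Exception:
            # Image.open raises UnidentifiedImageError for unknown formats;
            # verify() may raise other exception types for damaged files.
            bad.append(path)
    return bad


# Example call with the folder from this issue (run it in a Colab cell):
# for p in find_unreadable_images("/content/drive/MyDrive/Loras/hijab/dataset"):
#     print(p)
```

Any file it lists, such as 1704243855490.png.jpg here, would need to be re-exported or removed from the dataset folder before starting the trainer again.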