hollowstrawberry / kohya-colab

Accessible Google Colab notebooks for Stable Diffusion Lora training, based on the work of kohya-ss and Linaqruf
GNU General Public License v3.0

Is the Lora Trainer 512 error happening again? CUDA backend failed to initialize: Found CUDA version 12010, #102

Closed: ridhoyp closed this issue 6 months ago

ridhoyp commented 6 months ago

Is it erroring again?


MyDrive/Loras/test_lora/dataset
📈 Found 95 images with 2 repeats, equaling 190 steps.
📉 Divide 190 steps by 2 batch size to get 95.0 steps per epoch.
🔮 There will be 10 epochs, for around 950 total training steps.
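For reference, the step counts the notebook prints above follow directly from the image count, repeats, batch size, and epoch count. A minimal sketch of that arithmetic (variable names are illustrative, not taken from the notebook):

```python
# Rough reproduction of the step math printed above (illustrative only).
num_images = 95
num_repeats = 2
batch_size = 2
num_epochs = 10

images_per_epoch = num_images * num_repeats        # 190 images seen per epoch
steps_per_epoch = images_per_epoch / batch_size    # 95.0 optimizer steps per epoch
total_steps = steps_per_epoch * num_epochs         # ~950 total training steps

print(images_per_epoch, steps_per_epoch, total_steps)  # 190 95.0 950.0
```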

✅ Dependencies already installed.

🔄 Model already downloaded.

📄 Config saved to /content/drive/MyDrive/Loras/test_lora/training_config.toml
📄 Dataset config saved to /content/drive/MyDrive/Loras/test_lora/dataset_config.toml

โญ Starting trainer...

CUDA backend failed to initialize: Found CUDA version 12010, but JAX was built against version 12020, which is newer. The copy of CUDA that is installed must be at least as new as the version against which JAX was built. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
Loading settings from /content/drive/MyDrive/Loras/test_lora/training_config.toml...
/content/drive/MyDrive/Loras/test_lora/training_config
prepare tokenizer
update token length: 225
Loading dataset config from /content/drive/MyDrive/Loras/testl_lora/dataset_config.toml
prepare images.
found directory /content/drive/MyDrive/Loras/test_lora/dataset contains 95 image files
190 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (1024, 1024)
  enable_bucket: True
  min_bucket_reso: 320
  max_bucket_reso: 1280
  bucket_reso_steps: 64
  bucket_no_upscale: False
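The CUDA message at the top is JAX warning that the CUDA installation it found (12010, i.e. 12.1) is older than the CUDA the installed JAX build targets (12020, i.e. 12.2). A quick way to see what the Colab runtime actually has (a sketch, not part of the notebook; assumes a GPU runtime with nvidia-smi available):

```python
# Quick version check in the Colab runtime (illustrative only).
import subprocess

import jax

print("jax:", jax.__version__)
# nvidia-smi reports the GPU driver and the CUDA version it supports.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
```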

[Subset 0 of Dataset 0]
  image_dir: "/content/drive/MyDrive/Loras/test_lora/dataset"
  image_count: 95
  num_repeats: 2
  shuffle_caption: True
  keep_tokens: 1
  caption_dropout_rate: 0.0
  caption_dropout_every_n_epoches: 0
  caption_tag_dropout_rate: 0.0
  color_aug: False
  flip_aug: True
  face_crop_aug_range: None
  random_crop: False
  token_warmup_min: 1,
  token_warmup_step: 0,
  is_reg: False
  class_tokens: None
  caption_extension: .txt

[Dataset 0]
loading image sizes.
100% 95/95 [00:00<00:00, 408.66it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）
bucket 0: resolution (704, 1280), count: 4
bucket 1: resolution (768, 1280), count: 2
bucket 2: resolution (832, 1216), count: 34
bucket 3: resolution (896, 1152), count: 18
bucket 4: resolution (1024, 1024), count: 100
bucket 5: resolution (1152, 896), count: 4
bucket 6: resolution (1216, 832), count: 24
bucket 7: resolution (1280, 704), count: 4
mean ar error (without repeats): 0.012814058038616015
preparing accelerator
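The bucket list above comes from aspect-ratio bucketing: roughly speaking, each image is resized into the bucket whose aspect ratio is closest to its own, so every batch contains images of a single resolution. A simplified sketch of that assignment (not the exact kohya-ss implementation; the bucket list is taken from the log above):

```python
# Simplified aspect-ratio bucket assignment (illustrative, not kohya-ss's exact code).
BUCKETS = [(704, 1280), (768, 1280), (832, 1216), (896, 1152),
           (1024, 1024), (1152, 896), (1216, 832), (1280, 704)]

def pick_bucket(width: int, height: int) -> tuple[int, int]:
    """Return the bucket resolution whose aspect ratio is closest to the image's."""
    ar = width / height
    return min(BUCKETS, key=lambda bucket: abs(bucket[0] / bucket[1] - ar))

print(pick_bucket(680, 1000))  # portrait image -> (832, 1216) with this bucket list
```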

/content/kohya-trainer/train_network.py:991 in <module>
    988   args = train_util.read_config_from_file(args, parser)
    989
    990   trainer = NetworkTrainer()
  ❱ 991   trainer.train(args)
    992

/content/kohya-trainer/train_network.py:205 in train
    202
    203       # acceleratorを準備する
    204       print("preparing accelerator")
  ❱ 205       accelerator = train_util.prepare_accelerator(args)
    206       is_main_process = accelerator.is_main_process
    207
    208       # mixed precisionに対応した型を用意しておき適宜castする

/content/kohya-trainer/library/train_util.py:3569 in prepare_accelerator
   3566       if args.wandb_api_key is not None:
   3567           wandb.login(key=args.wandb_api_key)
   3568
  ❱ 3569   accelerator = Accelerator(
   3570       gradient_accumulation_steps=args.gradient_accumulation_steps,
   3571       mixed_precision=args.mixed_precision,
   3572       log_with=log_with,

TypeError: Accelerator.__init__() got an unexpected keyword argument 'project_dir'

Traceback (most recent call last):

/usr/local/bin/accelerate:8 in <module>
    5   from accelerate.commands.accelerate_cli import main
    6   if __name__ == '__main__':
    7       sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
  ❱ 8       sys.exit(main())
    9

/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py:45 in main
    42           exit(1)
    43
    44   # Run
  ❱ 45   args.func(args)
    46
    47
    48   if __name__ == "__main__":

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:1104 in launch_command
  1101   elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA
  1102       sagemaker_launcher(defaults, args)
  1103   else:
 ❱ 1104       simple_launcher(args)
  1105
  1106
  1107   def main():

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:567 in simple_launcher
   564   process = subprocess.Popen(cmd, env=current_env)
   565   process.wait()
   566   if process.returncode != 0:
  ❱ 567       raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
   568
   569
   570   def multi_gpu_launcher(args):

CalledProcessError: Command '['/usr/bin/python3', 'train_network.py', '--dataset_config=/content/drive/MyDrive/Loras/test_lora/dataset_config.toml', '--config_file=/content/drive/MyDrive/Loras/test_lora/training_config.toml']' returned non-zero exit status 1.
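The actual failure is the TypeError above: train_network.py passes project_dir to Accelerator(), but the accelerate version installed in the runtime does not accept that argument. A minimal probe to confirm the mismatch (a sketch, assuming it is run in the same Colab runtime):

```python
# Minimal probe: does the installed accelerate accept the argument the script passes?
import inspect

import accelerate
from accelerate import Accelerator

print("accelerate version:", accelerate.__version__)
params = inspect.signature(Accelerator.__init__).parameters
print("accepts project_dir:", "project_dir" in params)  # False reproduces the TypeError above
```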

hollowstrawberry commented 6 months ago

I believe this happened while I was changing things, trying to find a fix. The current problem still appears to be #98.

ridhoyp commented 6 months ago

Alright, thank you for your hard work. :)