Problem with xformer and cuda

📂 Connecting to Google Drive...
Mounted at /content/drive

💿 Checking dataset...
📁MyDrive/Loras/Charles_Lora/dataset
📈 Found 12 images with 10 repeats, equaling 120 steps.
📉 Divide 120 steps by 2 batch size to get 60.0 steps per epoch.
🔮 There will be 10 epochs, for around 600 total training steps.
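The step count reported in the dataset check follows from simple arithmetic. Here is a minimal sketch, assuming the notebook multiplies images by repeats, divides by the batch size, and multiplies by the number of epochs. The function name `estimate_steps` is hypothetical, and the notebook's own code may round differently.

```python
def estimate_steps(images: int, repeats: int, batch_size: int, epochs: int):
    """Reproduce the step arithmetic printed by the dataset check (assumed formula)."""
    steps_per_pass = images * repeats              # 12 * 10 = 120
    steps_per_epoch = steps_per_pass / batch_size  # 120 / 2 = 60.0
    total_steps = int(steps_per_epoch * epochs)    # 60.0 * 10 = 600
    return steps_per_pass, steps_per_epoch, total_steps


# Values taken from the log: 12 images, 10 repeats, batch size 2, 10 epochs.
print(estimate_steps(12, 10, 2, 10))  # (120, 60.0, 600)
```

These numbers match the "total optimization steps: 600" line the trainer prints later, so the dataset setup itself does not appear to be the problem.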

🏭 Installing dependencies...

Cloning into '/content/kohya-trainer'...
remote: Enumerating objects: 5909, done.
remote: Counting objects: 100% (5909/5909), done.
remote: Compressing objects: 100% (1873/1873), done.
remote: Total 5909 (delta 4201), reused 5591 (delta 4024), pack-reused 0
Receiving objects: 100% (5909/5909), 9.23 MiB | 7.85 MiB/s, done.
Resolving deltas: 100% (4201/4201), done.
HEAD is now at e6ad3cb Merge pull request #478 from rockerBOO/patch-1
39 packages can be upgraded. Run 'apt list --upgradable' to see them.
The following additional packages will be installed:
  libaria2-0 libc-ares2
The following NEW packages will be installed:
  aria2 libaria2-0 libc-ares2
0 upgraded, 3 newly installed, 0 to remove and 39 not upgraded.
Need to get 1,513 kB of archives.
After this operation, 5,441 kB of additional disk space will be used.
Selecting previously unselected package libc-ares2:amd64.
(Reading database ... 121749 files and directories currently installed.)
Preparing to unpack .../libc-ares2_1.18.1-1ubuntu0.22.04.3_amd64.deb ...
Unpacking libc-ares2:amd64 (1.18.1-1ubuntu0.22.04.3) ...
Selecting previously unselected package libaria2-0:amd64.
Preparing to unpack .../libaria2-0_1.36.0-1_amd64.deb ...
Unpacking libaria2-0:amd64 (1.36.0-1) ...
Selecting previously unselected package aria2.
Preparing to unpack .../aria2_1.36.0-1_amd64.deb ...
Unpacking aria2 (1.36.0-1) ...
Setting up libc-ares2:amd64 (1.18.1-1ubuntu0.22.04.3) ...
Setting up libaria2-0:amd64 (1.36.0-1) ...
Setting up aria2 (1.36.0-1) ...
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.4) ...
/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_5.so.3 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc_proxy.so.2 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_0.so.3 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc.so.2 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbbind.so.3 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbb.so.12 is not a symbolic link

Preparing metadata (setup.py) ... done
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 191.5/191.5 kB 5.0 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 64.4 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.1/53.1 kB 6.9 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 41.6/41.6 kB 4.2 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 503.1/503.1 kB 41.7 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 825.8/825.8 kB 46.1 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 92.6/92.6 MB 9.1 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 489.8/489.8 MB 3.3 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 73.8 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Preparing metadata (setup.py) ... done
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 211.5/211.5 MB 6.2 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 99.9 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 840.4/840.4 kB 62.2 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.5/5.5 MB 103.5 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 440.7/440.7 kB 43.5 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 86.7 MB/s eta 0:00:00
Building wheel for dadaptation (pyproject.toml) ... done
Building wheel for lycoris_lora (setup.py) ... done
Building wheel for library (setup.py) ... done

✅ Installation finished in 149 seconds.

🔄 Downloading model...

Status Legend: (OK):download completed.

📄 Config saved to /content/drive/MyDrive/Loras/Charles_Lora/training_config.toml
📄 Dataset config saved to /content/drive/MyDrive/Loras/Charles_Lora/dataset_config.toml

⭐ Starting trainer...

WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.1.0+cu118 with CUDA 1108 (you have 2.1.0+cu121)
    Python  3.10.13 (you have 3.10.12)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
CUDA backend failed to initialize: Found CUDA version 12010, but JAX was built against version 12020, which is newer. The copy of CUDA that is installed must be at least as new as the version against which JAX was built. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
Loading settings from /content/drive/MyDrive/Loras/Charles_Lora/training_config.toml...
/content/drive/MyDrive/Loras/Charles_Lora/training_config
prepare tokenizer
vocab.json: 100% 961k/961k [00:00<00:00, 62.5MB/s]
merges.txt: 100% 525k/525k [00:00<00:00, 1.82MB/s]
special_tokens_map.json: 100% 389/389 [00:00<00:00, 2.24MB/s]
tokenizer_config.json: 100% 905/905 [00:00<00:00, 4.10MB/s]
update token length: 225
Load dataset config from /content/drive/MyDrive/Loras/Charles_Lora/dataset_config.toml
prepare images.
found directory /content/drive/MyDrive/Loras/Charles_Lora/dataset contains 12 image files
120 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 1024
  bucket_reso_steps: 64
  bucket_no_upscale: False

  [Subset 0 of Dataset 0]
    image_dir: "/content/drive/MyDrive/Loras/Charles_Lora/dataset"
    image_count: 12
    num_repeats: 10
    shuffle_caption: True
    keep_tokens: 1
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: True
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: None
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100% 12/12 [00:10<00:00, 1.10it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）
bucket 0: resolution (512, 512), count: 120
mean ar error (without repeats): 0.0
prepare accelerator
Using accelerator 0.15.0 or above.
loading model for process 0/1
load StableDiffusion checkpoint
loading u-net:
loading vae:
config.json: 100% 4.52k/4.52k [00:00<00:00, 16.4MB/s]
model.safetensors: 100% 1.71G/1.71G [00:11<00:00, 144MB/s]
loading text encoder:
Replace CrossAttention.forward to use xformers
[Dataset 0]
caching latents.
100% 12/12 [00:03<00:00, 3.43it/s]
import network module: networks.lora
create LoRA network. base dim (rank): 16, alpha: 8
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
use 8-bit AdamW optimizer | {}
override steps. steps for 10 epochs is / 指定エポックまでのステップ数: 600
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 120
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 60
  num epochs / epoch数: 10
  batch size per device / バッチサイズ: 2
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 600
steps: 0% 0/600 [00:00<?, ?it/s]
epoch 1/10
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
/content/kohya-trainer/train_network.py:773 in <module>

   770     args = parser.parse_args()
   771     args = train_util.read_config_from_file(args, parser)
   772
❱  773     train(args)
   774

/content/kohya-trainer/train_network.py:605 in train

   602
   603                 # Predict the noise residual
   604                 with accelerator.autocast():
❱  605                     noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).s
   606
   607                 if args.v_parameterization:
   608                     # v-parameterization training

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1518 in _wrapped_call_impl

  1515         if self._compiled_call_impl is not None:
  1516             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
  1517         else:
❱ 1518             return self._call_impl(*args, **kwargs)
  1519
  1520     def _call_impl(self, *args, **kwargs):
  1521         forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1527 in _call_impl

  1524         if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks
  1525                 or _global_backward_pre_hooks or _global_backward_hooks
  1526                 or _global_forward_hooks or _global_forward_pre_hooks):
❱ 1527             return forward_call(*args, **kwargs)
  1528
  1529         try:
  1530             result = None

/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py:490 in __call__

   487         update_wrapper(self, model_forward)
   488
   489     def __call__(self, *args, **kwargs):
❱  490         return convert_to_fp32(self.model_forward(*args, **kwargs))
   491
   492     def __getstate__(self):
   493         raise pickle.PicklingError(

/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py:16 in decorate_autocast

    13     @functools.wraps(func)
    14     def decorate_autocast(*args, **kwargs):
    15         with autocast_instance:
❱   16             return func(*args, **kwargs)
    17
    18     decorate_autocast.__script_unsupported = "@autocast() decorator is not supported in
    19     return decorate_autocast

/usr/local/lib/python3.10/dist-packages/diffusers/models/unet_2d_condition.py:381 in forward

   378         down_block_res_samples = (sample,)
   379         for downsample_block in self.down_blocks:
   380             if hasattr(downsample_block, "has_cross_attention") and downsample_block.has
❱  381                 sample, res_samples = downsample_block(
   382                     hidden_states=sample,
   383                     temb=emb,
   384                     encoder_hidden_states=encoder_hidden_states,

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1518 in _wrapped_call_impl

  1515         if self._compiled_call_impl is not None:
  1516             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
  1517         else:
❱ 1518             return self._call_impl(*args, **kwargs)
  1519
  1520     def _call_impl(self, *args, **kwargs):
  1521         forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1527 in _call_impl

  1524         if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks
  1525                 or _global_backward_pre_hooks or _global_backward_hooks
  1526                 or _global_forward_hooks or _global_forward_pre_hooks):
❱ 1527             return forward_call(*args, **kwargs)
  1528
  1529         try:
  1530             result = None

/usr/local/lib/python3.10/dist-packages/diffusers/models/unet_2d_blocks.py:612 in forward

   609                 )[0]
   610             else:
   611                 hidden_states = resnet(hidden_states, temb)
❱  612                 hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden
   613
   614             output_states += (hidden_states,)
   615

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1518 in _wrapped_call_impl

  1515         if self._compiled_call_impl is not None:
  1516             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
  1517         else:
❱ 1518             return self._call_impl(*args, **kwargs)
  1519
  1520     def _call_impl(self, *args, **kwargs):
  1521         forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1527 in _call_impl

  1524         if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks
  1525                 or _global_backward_pre_hooks or _global_backward_hooks
  1526                 or _global_forward_hooks or _global_forward_pre_hooks):
❱ 1527             return forward_call(*args, **kwargs)
  1528
  1529         try:
  1530             result = None

/usr/local/lib/python3.10/dist-packages/diffusers/models/attention.py:216 in forward

   213
   214         # 2. Blocks
   215         for block in self.transformer_blocks:
❱  216             hidden_states = block(hidden_states, context=encoder_hidden_states, timestep
   217
   218         # 3. Output
   219         if self.is_input_continuous:

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1518 in _wrapped_call_impl

  1515         if self._compiled_call_impl is not None:
  1516             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
  1517         else:
❱ 1518             return self._call_impl(*args, **kwargs)
  1519
  1520     def _call_impl(self, *args, **kwargs):
  1521         forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1527 in _call_impl

  1524         if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks
  1525                 or _global_backward_pre_hooks or _global_backward_hooks
  1526                 or _global_forward_hooks or _global_forward_pre_hooks):
❱ 1527             return forward_call(*args, **kwargs)
  1528
  1529         try:
  1530             result = None

/usr/local/lib/python3.10/dist-packages/diffusers/models/attention.py:484 in forward

   481         if self.only_cross_attention:
   482             hidden_states = self.attn1(norm_hidden_states, context) + hidden_states
   483         else:
❱  484             hidden_states = self.attn1(norm_hidden_states) + hidden_states
   485
   486         if self.attn2 is not None:
   487             # 2. Cross-Attention

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1518 in _wrapped_call_impl

  1515         if self._compiled_call_impl is not None:
  1516             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
  1517         else:
❱ 1518             return self._call_impl(*args, **kwargs)
  1519
  1520     def _call_impl(self, *args, **kwargs):
  1521         forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1527 in _call_impl

  1524         if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks
  1525                 or _global_backward_pre_hooks or _global_backward_hooks
  1526                 or _global_forward_hooks or _global_forward_pre_hooks):
❱ 1527             return forward_call(*args, **kwargs)
  1528
  1529         try:
  1530             result = None

/content/kohya-trainer/library/train_util.py:1846 in forward_xformers

  1843         q = q.contiguous()
  1844         k = k.contiguous()
  1845         v = v.contiguous()
❱ 1846         out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None)  # 最適な
  1847
  1848         out = rearrange(out, "b n h d -> b n (h d)", h=h)
  1849

/usr/local/lib/python3.10/dist-packages/xformers/ops/fmha/__init__.py:223 in memory_efficient_attention

   220         and options.
   221     :return: multi-head attention Tensor with shape [B, Mq, H, Kv]
   222     """
❱  223     return _memory_efficient_attention(
   224         Inputs(
   225             query=query, key=key, value=value, p=p, attn_bias=attn_bias, scale=scale
   226         ),

/usr/local/lib/python3.10/dist-packages/xformers/ops/fmha/__init__.py:326 in _memory_efficient_attention

   323         )
   324
   325     output_shape = inp.normalize_bmhk()
❱  326     return _fMHA.apply(
   327         op, inp.query, inp.key, inp.value, inp.attn_bias, inp.p, inp.scale
   328     ).reshape(output_shape)
   329

/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py:539 in apply

   536         if not torch._C._are_functorch_transforms_active():
   537             # See NOTE: [functorch vjp and autograd interaction]
   538             args = _functorch.utils.unwrap_dead_wrappers(args)
❱  539             return super().apply(*args, **kwargs)  # type: ignore[misc]
   540
   541         if cls.setup_context == _SingleLevelFunction.setup_context:
   542             raise RuntimeError(

/usr/local/lib/python3.10/dist-packages/xformers/ops/fmha/__init__.py:42 in forward

    39         op_fw = op[0] if op is not None else None
    40         op_bw = op[1] if op is not None else None
    41
❱   42         out, op_ctx = _memory_efficient_attention_forward_requires_grad(
    43             inp=inp, op=op_fw
    44         )
    45

/usr/local/lib/python3.10/dist-packages/xformers/ops/fmha/__init__.py:351 in _memory_efficient_attention_forward_requires_grad

   348     inp.validate_inputs()
   349     output_shape = inp.normalize_bmhk()
   350     if op is None:
❱  351         op = _dispatch_fw(inp, True)
   352     else:
   353         _ensure_op_supports_or_raise(ValueError, "memory_efficient_attention", op, inp)
   354     out = op.apply(inp, needs_gradient=True)

/usr/local/lib/python3.10/dist-packages/xformers/ops/fmha/dispatch.py:120 in _dispatch_fw

   117     Returns:
   118         AttentionOp: The best operator for the configuration
   119     """
❱  120     return _run_priority_list(
   121         "memory_efficient_attention_forward",
   122         _dispatch_fw_priority_list(inp, needs_gradient),
   123         inp,

/usr/local/lib/python3.10/dist-packages/xformers/ops/fmha/dispatch.py:63 in _run_priority_list

    60 {textwrap.indent(_format_inputs_description(inp), ' ')}"""
    61     for op, not_supported in zip(priority_list, not_supported_reasons):
    62         msg += "\n" + _format_not_supported_reasons(op, not_supported)
❱   63     raise NotImplementedError(msg)
    64
    65
    66 def _dispatch_fw_priority_list(
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
NotImplementedError: No operator found for memory_efficient_attention_forward with inputs:
    query     : shape=(2, 4096, 8, 40) (torch.float16)
    key       : shape=(2, 4096, 8, 40) (torch.float16)
    value     : shape=(2, 4096, 8, 40) (torch.float16)
    attn_bias : <class 'NoneType'>
    p         : 0.0
flshattF@0.0.0 is not supported because:
    xFormers wasn't build with CUDA support
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    operator wasn't built - see python -m xformers.info for more info
tritonflashattF is not supported because:
    xFormers wasn't build with CUDA support
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    operator wasn't built - see python -m xformers.info for more info
    triton is not available
    requires GPU with sm80 minimum compute capacity, e.g., A100/H100/L4
    Only work on pre-MLIR triton for now
cutlassF is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see python -m xformers.info for more info
smallkF is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support
    dtype=torch.float16 (supported: {torch.float32})
    operator wasn't built - see python -m xformers.info for more info
    unsupported embed per head: 40
steps: 0% 0/600 [00:03<?, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
/usr/local/bin/accelerate:8 in <module>

     5 from accelerate.commands.accelerate_cli import main
     6 if __name__ == '__main__':
     7     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
❱    8     sys.exit(main())
     9

/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py:45 in main

    42         exit(1)
    43
    44     # Run
❱   45     args.func(args)
    46
    47
    48 if __name__ == "__main__":

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:1104 in launch_command

  1101     elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA
  1102         sagemaker_launcher(defaults, args)
  1103     else:
❱ 1104         simple_launcher(args)
  1105
  1106
  1107 def main():

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:567 in simple_launcher

   564     process = subprocess.Popen(cmd, env=current_env)
   565     process.wait()
   566     if process.returncode != 0:
❱  567         raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
   568
   569
   570 def multi_gpu_launcher(args):
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['/usr/bin/python3', 'train_network.py', '--dataset_config=/content/drive/MyDrive/Loras/Charles_Lora/dataset_config.toml', '--config_file=/content/drive/MyDrive/Loras/Charles_Lora/training_config.toml']' returned non-zero exit status 1.
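The WARNING[XFORMERS] message at the start of the trainer output may explain the failure. The installed xFormers reports that it was built for PyTorch 2.1.0+cu118, while the runtime has 2.1.0+cu121. That would account for every attention operator being rejected with "xFormers wasn't build with CUDA support". The sketch below makes that comparison explicit, using only the version strings printed in the log. `split_build` and `same_build` are hypothetical helper names, not part of xFormers or PyTorch.

```python
def split_build(version: str) -> tuple[str, str]:
    """Split a version such as '2.1.0+cu118' into (release, build tag)."""
    release, _, build = version.partition("+")
    return release, build


def same_build(built_for: str, installed: str) -> bool:
    """Return True only if both the release and the CUDA build tag agree."""
    return split_build(built_for) == split_build(installed)


# Version strings copied from the WARNING[XFORMERS] message in the log.
built_for = "2.1.0+cu118"   # what the xFormers wheel was compiled against
installed = "2.1.0+cu121"   # what the Colab runtime actually has

print(split_build(built_for))            # ('2.1.0', 'cu118')
print(split_build(installed))            # ('2.1.0', 'cu121')
print(same_build(built_for, installed))  # False
```

The release numbers agree but the CUDA build tags do not, which is consistent with the warning. As the error text itself suggests, running python -m xformers.info in the same runtime should show which operators were actually built.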

hollowstrawberry / kohya-colab

Problem with xformer and cuda #89