hollowstrawberry / kohya-colab

Accessible Google Colab notebooks for Stable Diffusion Lora training, based on the work of kohya-ss and Linaqruf
GNU General Public License v3.0

Issue with xformers, and thus running into a CUDA memory issue #76

Closed: fuzzballb closed this issue 4 months ago

fuzzballb commented 6 months ago

I see that xformers can't be loaded, and eventually the notebook crashes. When searching for the 'CalledProcessError: Command '['/usr/bin/python3', 'train_network.py'' error, I found that this is probably a CUDA out-of-memory error.
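A `CalledProcessError` on its own only says that the child process exited with a non-zero status; the real cause is in the child's own output. The following is a minimal sketch of surfacing that output with the standard library's `subprocess` module. The failing command is a stand-in written for illustration, not the notebook's actual `train_network.py` invocation, and `run_and_report` is a hypothetical helper name.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (exit_code, stderr_text) instead of raising.

    Hypothetical helper for illustration; not part of kohya-colab.
    """
    try:
        subprocess.run(cmd, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as err:
        # err.stderr holds whatever the child wrote to stderr,
        # which is where a Python traceback would normally appear.
        return err.returncode, err.stderr
    return 0, ""


# Stand-in for a failing training script (assumption for the demo).
failing_child = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('simulated failure'); sys.exit(1)",
]

code, stderr_text = run_and_report(failing_child)
print("exit status:", code)
print("child stderr:", stderr_text)
```

With the real trainer, the captured text would be the place to look for the underlying exception rather than the `CalledProcessError` line itself.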

Inspecting the output, I saw this xFormers warning. I have tried cloning the repo and testing other xFormers versions, but that didn't fix the issue.

P.S. I am using purchased Colab credits and don't have a subscription.

Do you know how to fix this xformers issue and, hopefully, the related CUDA memory issue?

⭐ Starting trainer...

WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.1.0+cu118 with CUDA 1108 (you have 2.1.0+cu121)
    Python 3.10.13 (you have 3.10.12)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
CUDA backend failed to initialize: Found CUDA version 12010, but JAX was built against version 12020, which is newer. The copy of CUDA that is installed must be at least as new as the version against which JAX was built. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)

CalledProcessError: Command '['/usr/bin/python3', 'train_network.py', '--dataset_config=/content/drive/MyDrive/Loras/Jansen_Lora/dataset_config.toml', '--config_file=/content/drive/MyDrive/Loras/Jansen_Lora/training_config.toml']' returned non-zero exit status 1.
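The version mismatch in the xFormers warning above can be read off mechanically. This is a minimal sketch that assumes the warning keeps the wording quoted in this issue; `find_version_mismatches` is a hypothetical helper, not part of xFormers or the notebook.

```python
import re

# Matches lines such as
#   "PyTorch 2.1.0+cu118 with CUDA 1108 (you have 2.1.0+cu121)"
#   "Python 3.10.13 (you have 3.10.12)"
# The wording is taken from the warning quoted in this issue (assumption:
# other xFormers versions may phrase it differently).
_PATTERN = re.compile(
    r"(?P<name>PyTorch|Python)\s+(?P<built>[\w.+]+)"
    r"(?:\s+with CUDA \d+)?\s+\(you have (?P<have>[\w.+]+)\)"
)


def find_version_mismatches(warning):
    """Return {component: (built_for, installed)} where the two differ."""
    mismatches = {}
    for m in _PATTERN.finditer(warning):
        if m["built"] != m["have"]:
            mismatches[m["name"]] = (m["built"], m["have"])
    return mismatches


warning = (
    "xFormers was built for: PyTorch 2.1.0+cu118 with CUDA 1108 "
    "(you have 2.1.0+cu121) Python 3.10.13 (you have 3.10.12)"
)
mismatches = find_version_mismatches(warning)
for component, (built_for, installed) in mismatches.items():
    print(f"{component}: built for {built_for}, installed {installed}")
```

Here the PyTorch entry is the significant one: the xFormers build targets the `cu118` build of PyTorch, while the runtime has the `cu121` build, which is what the warning reports as the reason the C++/CUDA extensions cannot load.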

If I disable xFormers completely, I get the following error:

OutOfMemoryError: CUDA out of memory. Tried to allocate 1024.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 63.06 MiB is free. ...
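To put the numbers in that message side by side, here is a minimal sketch that parses the quoted text and computes how far the request exceeds the free memory. It assumes the message wording shown above (including the "capacty" spelling, which the pattern tolerates); `summarize_oom` is a hypothetical helper, not a PyTorch API.

```python
import re

_UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

# Pattern follows the wording of the message quoted in this issue
# (assumption: other PyTorch versions may phrase it differently).
_OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<req>[\d.]+) (?P<req_unit>[MG]iB)\. "
    r"GPU \d+ has a total capac\w* of (?P<total>[\d.]+) (?P<total_unit>[MG]iB) "
    r"of which (?P<free>[\d.]+) (?P<free_unit>[MG]iB) is free"
)


def summarize_oom(message):
    """Return requested, total, free and shortfall, all in MiB."""
    m = _OOM_PATTERN.search(message)
    if m is None:
        raise ValueError("message does not match the expected OOM wording")

    def to_mib(key):
        return float(m[key]) * _UNIT_TO_MIB[m[key + "_unit"]]

    requested, total, free = to_mib("req"), to_mib("total"), to_mib("free")
    return {
        "requested_mib": requested,
        "total_mib": total,
        "free_mib": free,
        "shortfall_mib": round(requested - free, 2),
    }


message = (
    "CUDA out of memory. Tried to allocate 1024.00 MiB. "
    "GPU 0 has a total capacty of 14.75 GiB of which 63.06 MiB is free."
)
summary = summarize_oom(message)
print(summary)
```

In this case the GPU is almost entirely occupied before the 1024 MiB request is made, so the failing allocation is only the last step; the bulk of the memory was already in use.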

hollowstrawberry commented 6 months ago

I have already fixed the issue you mentioned. I believe you're running an older copy of the colab; please go to the original and make a new copy if necessary (or ideally just use the original): https://colab.research.google.com/github/hollowstrawberry/kohya-colab/blob/main/Lora_Trainer.ipynb

gado01 commented 5 months ago

hollowstrawberry, I tried the link you mentioned, but the error persists. Is there another updated link, or another way to solve that error? Thank you.

hollowstrawberry commented 4 months ago

Cannot reproduce this error as of yesterday.