nickkolok opened this issue 8 months ago
Well, it seems to be a HuggingFace issue though: `merges.txt` is absent. Placing `merges.txt` into its place did not help.
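For reference, "placing it into its place" was roughly along these lines (a sketch only, not my exact commands: SD 1.x models use the `openai/clip-vit-large-patch14` tokenizer, but the target path is an assumption about the model layout):

```
# Hedged sketch: fetch the CLIP tokenizer's merges.txt and drop it where the
# diffusers-format model expects it; MODEL_DIR is a hypothetical placeholder.
!wget https://huggingface.co/openai/clip-vit-large-patch14/resolve/main/merges.txt -O $MODEL_DIR/tokenizer/merges.txt
```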
Downgrading `diffusers` (I tried `0.15.1` and `0.17.1`) did not solve the issue. Any help is highly appreciated!
It's a messy, hacky job based on Linaqruf's Google Colab for SDXL LoRA training, modified to install into a venv. I've not had issues running different versions of sd-scripts installed this way so far, apart from needing to manually pin specific versions of torch and bitsandbytes to keep them compatible, and a slightly longer requirements installation, since more packages have to be installed instead of relying on the Colab environment.
If the issue is related to the install (which I think might be the root cause), then installing this way may resolve your problem. Just make sure to start the correct script, running from the `content` folder (the install script changes the install dir to `kohya-trainer` instead of the default `sd-scripts`):
```
!source venv/bin/activate;accelerate launch kohya-trainer/train_db.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_config=repo_concept/toml/rv6-allres-noregs.toml \
--output_dir=$OUTPUT_DIR \
--save_model_as=diffusers \
--prior_loss_weight=1.0 \
--max_train_steps=$SAVE_MAX_STEPS \
--save_every_n_steps=$SAVE_INTERVAL \
--learning_rate=14e-7 \
--lr_scheduler="cosine" \
--stop_text_encoder_training=20000 \
--optimizer_type="AdamW8bit" \
--mixed_precision="fp16" \
--xformers \
--gradient_checkpointing \
--sample_every_n_steps=$SAVE_INTERVAL \
--sample_sampler="euler_a" \
--sample_prompts=$repo_path_conc/prompts/prompts-rv6-initial.txt
```
The install script is in the attached txt file: kohya_install_script.txt
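For orientation only, the gist of the install is something like this (a heavily simplified sketch, not the actual attached script, which also pins torch/bitsandbytes versions and installs extra packages; repo URL and folder names are assumptions):

```
# Rough outline of a venv-based sd-scripts install in Colab
%cd /content
!git clone https://github.com/kohya-ss/sd-scripts kohya-trainer
!python3 -m venv venv
!source venv/bin/activate; cd kohya-trainer; pip install -r requirements.txt
```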
@DKnight54, thank you very much for your quick response! Using a venv looks like a really sane idea!
I'm going to try the suggested solution in a few days, when my Colab quota respawns (I've just burnt it all away today and yesterday). For now, I managed to launch training on sd-scripts `v0.6.6` (after multiple attempts to guess the correct versions of the dependencies, which also have to match the CUDA version...).
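For the record, the guessing was about pins of roughly this kind (an illustration only, not necessarily the exact versions I ended up with):

```
# One torch/xformers combination that is mutually compatible on a cu118 runtime
!pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
!pip install xformers==0.0.20
```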
@DKnight54 Thank you very much! That time, I solved the issue by downgrading to `v0.6.6`. But today that version just broke for no reason (as always happens with Stable Diffusion and Google Colab: they always break something somewhere, and we just have to endure it). So I am trying your solution. At least it is doing something; the GPU load is changing...
So, it is still doing something... somewhere... somehow... I hope for the best :) No progress, no steps-with-percents, no logs about the dataset.
UPD: It has successfully saved intermediate weights!! Hooray!
... and then, after a couple of minutes, the process terminates. I launch the training by the following command:
As `MODEL_NAME`, I've tried `NickKolok/lametta-v2012-fp16-conv`, `SG161222/Realistic_Vision_V6.0_B1_noVAE` and `admruul/anything-v3.0`.
I'm using Google Colab. Not a ready notebook prepared by someone else, but my own :) I just need access to a free GPU.
I install `sd-scripts` with these two commands.
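Roughly, they boil down to something like this (a reconstruction from the output below; only the `v0.8.5` tag and the `/content/sd-scripts` path appear in the log, so the repo URL and the requirements step are assumptions, not verbatim commands):

```
# First cell: clone the repo and check out the tagged release
!git clone https://github.com/kohya-ss/sd-scripts
%cd /content/sd-scripts
!git checkout v0.8.5

# Second cell: install the requirements
!pip install -r requirements.txt
```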
The result of the first command:
```
Cloning into 'sd-scripts'...
remote: Enumerating objects: 6262, done.
remote: Counting objects: 100% (3063/3063), done.
remote: Compressing objects: 100% (485/485), done.
remote: Total 6262 (delta 2819), reused 2666 (delta 2577), pack-reused 3199
Receiving objects: 100% (6262/6262), 9.38 MiB | 4.65 MiB/s, done.
Resolving deltas: 100% (4455/4455), done.
Updating files: 100% (103/103), done.
/content/sd-scripts
Note: switching to 'v0.8.5'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>
```
The result of the second command:
```
Collecting protobuf==3.20.3
  Downloading protobuf-3.20.3-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 6.5 MB/s eta 0:00:00
Collecting google-auth-oauthlib>=0.7.0
  Downloading google_auth_oauthlib-1.2.0-py2.py3-none-any.whl (24 kB)
Collecting tensorboard<2.16,>=2.15
  Downloading tensorboard-2.15.2-py3-none-any.whl (5.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.5/5.5 MB 21.6 MB/s eta 0:00:00
Requirement already satisfied: google-auth>=2.15.0 in /usr/local/lib/python3.10/dist-packages (from google-auth-oauthlib>=0.7.0) (2.27.0)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from google-auth-oauthlib>=0.7.0) (1.4.1)
Requirement already satisfied: absl-py>=0.4 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.16,>=2.15) (1.4.0)
Requirement already satisfied: grpcio>=1.48.2 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.16,>=2.15) (1.62.1)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.16,>=2.15) (3.6)
Requirement already satisfied: numpy>=1.12.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.16,>=2.15) (1.25.2)
Requirement already satisfied: requests<3,>=2.21.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.16,>=2.15) (2.31.0)
Requirement already satisfied: setuptools>=41.0.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.16,>=2.15) (67.7.2)
Requirement already satisfied: six>1.9 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.16,>=2.15) (1.16.0)
Collecting tensorboard-data-server<0.8.0,>=0.7.0 (from tensorboard<2.16,>=2.15)
  Downloading tensorboard_data_server-0.7.2-py3-none-manylinux_2_31_x86_64.whl (6.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 46.9 MB/s eta 0:00:00
Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.16,>=2.15) (3.0.1)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from google-auth>=2.15.0->google-auth-oauthlib>=0.7.0) (5.3.3)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from google-auth>=2.15.0->google-auth-oauthlib>=0.7.0) (0.4.0)
Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.10/dist-packages (from google-auth>=2.15.0->google-auth-oauthlib>=0.7.0) (4.9)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15) (2024.2.2)
Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib>=0.7.0) (3.2.2)
Requirement already satisfied: MarkupSafe>=2.1.1 in /usr/local/lib/python3.10/dist-packages (from werkzeug>=1.0.1->tensorboard<2.16,>=2.15) (2.1.5)
Requirement already satisfied: pyasn1<0.7.0,>=0.4.6 in /usr/local/lib/python3.10/dist-packages (from pyasn1-modules>=0.2.1->google-auth>=2.15.0->google-auth-oauthlib>=0.7.0) (0.6.0)
Installing collected packages: tensorboard-data-server, protobuf, google-auth-oauthlib, tensorboard
  Attempting uninstall: tensorboard-data-server
    Found existing installation: tensorboard-data-server 0.6.1
    Uninstalling tensorboard-data-server-0.6.1:
      Successfully uninstalled tensorboard-data-server-0.6.1
  Attempting uninstall: protobuf
    Found existing installation: protobuf 3.19.6
    Uninstalling protobuf-3.19.6:
      Successfully uninstalled protobuf-3.19.6
  Attempting uninstall: google-auth-oauthlib
    Found existing installation: google-auth-oauthlib 0.4.6
    Uninstalling google-auth-oauthlib-0.4.6:
      Successfully uninstalled google-auth-oauthlib-0.4.6
  Attempting uninstall: tensorboard
    Found existing installation: tensorboard 2.10.1
    Uninstalling tensorboard-2.10.1:
      Successfully uninstalled tensorboard-2.10.1
Successfully installed google-auth-oauthlib-1.2.0 protobuf-3.20.3 tensorboard-2.15.2 tensorboard-data-server-0.7.2
WARNING: The following packages were previously imported in this runtime:
  [google]
You must restart the runtime in order to use newly installed versions.
```
(it might be a good idea to update the requirements though... just to fix the error)
Restarting the runtime and using `python` instead of `accelerate launch` does not help, nor does switching to `v0.8.3` or `v0.8.4` instead of `v0.8.5`. I do not use earlier versions because of bug #966. Yesterday I hit the same problem multiple times, so it does not seem to be a HuggingFace issue, as suggested in https://github.com/bmaltais/kohya_ss/issues/548#issuecomment-1499527251. However, deleting the model folder (see the sketch below) does not help either. Could you please help me?
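For anyone trying to reproduce this: with the default Hugging Face hub cache layout, the cached model can be removed along these lines (the repo name here is just one of those listed above, and the path is the standard cache location, not necessarily the exact folder I deleted):

```
# Delete the cached model so it gets re-downloaded; adjust the path if HF_HOME is set
!rm -rf ~/.cache/huggingface/hub/models--SG161222--Realistic_Vision_V6.0_B1_noVAE
```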