Failed to start DreamBooth training: `INFO prepare tokenizer`

nickkolok commented 8 months ago

/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_processes` was set to a value of `1`
    `--num_machines` was set to a value of `1`
    `--mixed_precision` was set to a value of `'no'`
    `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
2024-04-02 11:31:44.035836: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-02 11:31:44.035892: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-02 11:31:44.037362: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-02 11:31:45.899812: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.10/dist-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
2024-04-02 11:31:47 INFO     prepare tokenizer

... and then, after a copule of minutes, the process terminates. I launch the training by the following command:

accelerate launch sd-scripts/train_db.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --dataset_config=repo_concept/toml/rv6-allres-noregs.toml \
    --output_dir=$OUTPUT_DIR  \
    --save_model_as=diffusers \
    --prior_loss_weight=1.0 \
    --max_train_steps=$SAVE_MAX_STEPS \
    --save_every_n_steps=$SAVE_INTERVAL \
    --learning_rate=14e-7 \
    --lr_scheduler="cosine" \
    --stop_text_encoder_training=20000 \
    --optimizer_type="AdamW8bit" \
    --mixed_precision="fp16" \
    --xformers \
    --gradient_checkpointing \
    --sample_every_n_steps=$SAVE_INTERVAL \
    --sample_sampler="euler_a" \
    --sample_prompts=$repo_path_conc/prompts/prompts-rv6-initial.txt

As MODEL_NAME, I've tried NickKolok/lametta-v2012-fp16-conv, SG161222/Realistic_Vision_V6.0_B1_noVAE and admruul/anything-v3.0.

I'm using a Google Colab. Not a ready notebook prepared by someone, but rather my own :) I just need to access to free GPU.

I install sd-scripts by these two commands:

!git clone https://github.com/kohya-ss/sd-scripts.git
%cd sd-scripts
!git checkout v0.8.5
!pip install --use-pep517 --upgrade -r requirements.txt
%cd ..

The result

``` Cloning into 'sd-scripts'... remote: Enumerating objects: 6262, done. remote: Counting objects: 100% (3063/3063), done. remote: Compressing objects: 100% (485/485), done. remote: Total 6262 (delta 2819), reused 2666 (delta 2577), pack-reused 3199 Receiving objects: 100% (6262/6262), 9.38 MiB | 4.65 MiB/s, done. Resolving deltas: 100% (4455/4455), done. Updating files: 100% (103/103), done. /content/sd-scripts Note: switching to 'v0.8.5'. You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by switching back to a branch. If you want to create a new branch to retain commits you create, you may do so (now or later) by using -c with the switch command. Example: git switch -c Or undo this operation with: git switch - Turn off this advice by setting config variable advice.detachedHead to false HEAD is now at 6b1520a Merge pull request #1187 from kohya-ss/fix-timeemb Obtaining file:///content/sd-scripts (from -r requirements.txt (line 35)) Installing build dependencies ... done Checking if build backend supports build_editable ... done Getting requirements to build editable ... done Preparing editable metadata (pyproject.toml) ... done Collecting accelerate==0.25.0 (from -r requirements.txt (line 1)) Downloading accelerate-0.25.0-py3-none-any.whl (265 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 265.7/265.7 kB 2.6 MB/s eta 0:00:00 Collecting transformers==4.36.2 (from -r requirements.txt (line 2)) Downloading transformers-4.36.2-py3-none-any.whl (8.2 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.2/8.2 MB 29.9 MB/s eta 0:00:00 Collecting diffusers[torch]==0.25.0 (from -r requirements.txt (line 3)) Downloading diffusers-0.25.0-py3-none-any.whl (1.8 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 50.9 MB/s eta 0:00:00 Collecting ftfy==6.1.1 (from -r requirements.txt (line 4)) Downloading ftfy-6.1.1-py3-none-any.whl (53 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.1/53.1 kB 7.6 MB/s eta 0:00:00 Collecting opencv-python==4.7.0.68 (from -r requirements.txt (line 6)) Downloading opencv_python-4.7.0.68-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (61.8 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.8/61.8 MB 9.7 MB/s eta 0:00:00 Collecting einops==0.7.0 (from -r requirements.txt (line 7)) Downloading einops-0.7.0-py3-none-any.whl (44 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.6/44.6 kB 5.7 MB/s eta 0:00:00 Collecting pytorch-lightning==1.9.0 (from -r requirements.txt (line 8)) Downloading pytorch_lightning-1.9.0-py3-none-any.whl (825 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 825.8/825.8 kB 67.1 MB/s eta 0:00:00 Collecting tensorboard==2.10.1 (from -r requirements.txt (line 10)) Downloading tensorboard-2.10.1-py3-none-any.whl (5.9 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.9/5.9 MB 101.7 MB/s eta 0:00:00 Requirement already satisfied: safetensors==0.4.2 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 11)) (0.4.2) Requirement already satisfied: altair==4.2.2 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 13)) (4.2.2) Collecting easygui==0.98.3 (from -r requirements.txt (line 14)) Downloading easygui-0.98.3-py2.py3-none-any.whl (92 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 92.7/92.7 kB 12.0 MB/s eta 0:00:00 Requirement already satisfied: toml==0.10.2 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 15)) (0.10.2) Collecting voluptuous==0.13.1 (from -r requirements.txt (line 16)) Downloading voluptuous-0.13.1-py3-none-any.whl (29 kB) Collecting huggingface-hub==0.20.1 (from -r requirements.txt (line 17)) Downloading huggingface_hub-0.20.1-py3-none-any.whl (330 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 330.1/330.1 kB 37.5 MB/s eta 0:00:00 Collecting open-clip-torch==2.20.0 (from -r requirements.txt (line 31)) Downloading open_clip_torch-2.20.0-py3-none-any.whl (1.5 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 81.1 MB/s eta 0:00:00 Collecting rich==13.7.0 (from -r requirements.txt (line 33)) Downloading rich-13.7.0-py3-none-any.whl (240 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 240.6/240.6 kB 27.1 MB/s eta 0:00:00 Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from accelerate==0.25.0->-r requirements.txt (line 1)) (1.25.2) Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from accelerate==0.25.0->-r requirements.txt (line 1)) (24.0) Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate==0.25.0->-r requirements.txt (line 1)) (5.9.5) Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from accelerate==0.25.0->-r requirements.txt (line 1)) (6.0.1) Requirement already satisfied: torch>=1.10.0 in /usr/local/lib/python3.10/dist-packages (from accelerate==0.25.0->-r requirements.txt (line 1)) (2.2.1+cu121) Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers==4.36.2->-r requirements.txt (line 2)) (3.13.3) Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.36.2->-r requirements.txt (line 2)) (2023.12.25) Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers==4.36.2->-r requirements.txt (line 2)) (2.31.0) Requirement already satisfied: tokenizers<0.19,>=0.14 in /usr/local/lib/python3.10/dist-packages (from transformers==4.36.2->-r requirements.txt (line 2)) (0.15.2) Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers==4.36.2->-r requirements.txt (line 2)) (4.66.2) Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.10/dist-packages (from diffusers[torch]==0.25.0->-r requirements.txt (line 3)) (7.1.0) Requirement already satisfied: Pillow in /usr/local/lib/python3.10/dist-packages (from diffusers[torch]==0.25.0->-r requirements.txt (line 3)) (9.4.0) Requirement already satisfied: wcwidth>=0.2.5 in /usr/local/lib/python3.10/dist-packages (from ftfy==6.1.1->-r requirements.txt (line 4)) (0.2.13) Requirement already satisfied: fsspec[http]>2021.06.0 in /usr/local/lib/python3.10/dist-packages (from pytorch-lightning==1.9.0->-r requirements.txt (line 8)) (2023.6.0) Collecting torchmetrics>=0.7.0 (from pytorch-lightning==1.9.0->-r requirements.txt (line 8)) Downloading torchmetrics-1.3.2-py3-none-any.whl (841 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 841.5/841.5 kB 66.8 MB/s eta 0:00:00 Requirement already satisfied: typing-extensions>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from pytorch-lightning==1.9.0->-r requirements.txt (line 8)) (4.10.0) Collecting lightning-utilities>=0.4.2 (from pytorch-lightning==1.9.0->-r requirements.txt (line 8)) Downloading lightning_utilities-0.11.2-py3-none-any.whl (26 kB) Requirement already satisfied: absl-py>=0.4 in /usr/local/lib/python3.10/dist-packages (from tensorboard==2.10.1->-r requirements.txt (line 10)) (1.4.0) Requirement already satisfied: grpcio>=1.24.3 in /usr/local/lib/python3.10/dist-packages (from tensorboard==2.10.1->-r requirements.txt (line 10)) (1.62.1) Requirement already satisfied: google-auth<3,>=1.6.3 in /usr/local/lib/python3.10/dist-packages (from tensorboard==2.10.1->-r requirements.txt (line 10)) (2.27.0) Collecting google-auth-oauthlib<0.5,>=0.4.1 (from tensorboard==2.10.1->-r requirements.txt (line 10)) Downloading google_auth_oauthlib-0.4.6-py2.py3-none-any.whl (18 kB) Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.10/dist-packages (from tensorboard==2.10.1->-r requirements.txt (line 10)) (3.6) Collecting protobuf<3.20,>=3.9.2 (from tensorboard==2.10.1->-r requirements.txt (line 10)) Downloading protobuf-3.19.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 71.2 MB/s eta 0:00:00 Requirement already satisfied: setuptools>=41.0.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard==2.10.1->-r requirements.txt (line 10)) (67.7.2) Collecting tensorboard-data-server<0.7.0,>=0.6.0 (from tensorboard==2.10.1->-r requirements.txt (line 10)) Downloading tensorboard_data_server-0.6.1-py3-none-manylinux2010_x86_64.whl (4.9 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.9/4.9 MB 109.9 MB/s eta 0:00:00 Collecting tensorboard-plugin-wit>=1.6.0 (from tensorboard==2.10.1->-r requirements.txt (line 10)) Downloading tensorboard_plugin_wit-1.8.1-py3-none-any.whl (781 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 781.3/781.3 kB 64.5 MB/s eta 0:00:00 Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from tensorboard==2.10.1->-r requirements.txt (line 10)) (3.0.1) Requirement already satisfied: wheel>=0.26 in /usr/local/lib/python3.10/dist-packages (from tensorboard==2.10.1->-r requirements.txt (line 10)) (0.43.0) Requirement already satisfied: entrypoints in /usr/local/lib/python3.10/dist-packages (from altair==4.2.2->-r requirements.txt (line 13)) (0.4) Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from altair==4.2.2->-r requirements.txt (line 13)) (3.1.3) Requirement already satisfied: jsonschema>=3.0 in /usr/local/lib/python3.10/dist-packages (from altair==4.2.2->-r requirements.txt (line 13)) (4.19.2) Requirement already satisfied: pandas>=0.18 in /usr/local/lib/python3.10/dist-packages (from altair==4.2.2->-r requirements.txt (line 13)) (1.5.3) Requirement already satisfied: toolz in /usr/local/lib/python3.10/dist-packages (from altair==4.2.2->-r requirements.txt (line 13)) (0.12.1) Requirement already satisfied: torchvision in /usr/local/lib/python3.10/dist-packages (from open-clip-torch==2.20.0->-r requirements.txt (line 31)) (0.17.1+cu121) Requirement already satisfied: sentencepiece in /usr/local/lib/python3.10/dist-packages (from open-clip-torch==2.20.0->-r requirements.txt (line 31)) (0.1.99) Collecting timm (from open-clip-torch==2.20.0->-r requirements.txt (line 31)) Downloading timm-0.9.16-py3-none-any.whl (2.2 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.2/2.2 MB 80.8 MB/s eta 0:00:00 Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.10/dist-packages (from rich==13.7.0->-r requirements.txt (line 33)) (3.0.0) Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from rich==13.7.0->-r requirements.txt (line 33)) (2.16.1) Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /usr/local/lib/python3.10/dist-packages (from fsspec[http]>2021.06.0->pytorch-lightning==1.9.0->-r requirements.txt (line 8)) (3.9.3) Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard==2.10.1->-r requirements.txt (line 10)) (5.3.3) Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard==2.10.1->-r requirements.txt (line 10)) (0.4.0) Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard==2.10.1->-r requirements.txt (line 10)) (4.9) Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard==2.10.1->-r requirements.txt (line 10)) (1.4.1) Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair==4.2.2->-r requirements.txt (line 13)) (23.2.0) Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair==4.2.2->-r requirements.txt (line 13)) (2023.12.1) Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair==4.2.2->-r requirements.txt (line 13)) (0.34.0) Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair==4.2.2->-r requirements.txt (line 13)) (0.18.0) Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.10/dist-packages (from markdown-it-py>=2.2.0->rich==13.7.0->-r requirements.txt (line 33)) (0.1.2) Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=0.18->altair==4.2.2->-r requirements.txt (line 13)) (2.8.2) Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=0.18->altair==4.2.2->-r requirements.txt (line 13)) (2023.4) Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.36.2->-r requirements.txt (line 2)) (3.3.2) Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.36.2->-r requirements.txt (line 2)) (3.6) Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.36.2->-r requirements.txt (line 2)) (2.0.7) Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.36.2->-r requirements.txt (line 2)) (2024.2.2) Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate==0.25.0->-r requirements.txt (line 1)) (1.12) Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate==0.25.0->-r requirements.txt (line 1)) (3.2.1) Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.10.0->accelerate==0.25.0->-r requirements.txt (line 1)) Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 65.9 MB/s eta 0:00:00 Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch>=1.10.0->accelerate==0.25.0->-r requirements.txt (line 1)) Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 kB 69.4 MB/s eta 0:00:00 Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch>=1.10.0->accelerate==0.25.0->-r requirements.txt (line 1)) Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 62.7 MB/s eta 0:00:00 Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch>=1.10.0->accelerate==0.25.0->-r requirements.txt (line 1)) Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.7/731.7 MB 2.3 MB/s eta 0:00:00 Collecting nvidia-cublas-cu12==12.1.3.1 (from torch>=1.10.0->accelerate==0.25.0->-r requirements.txt (line 1)) Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 1.9 MB/s eta 0:00:00 Collecting nvidia-cufft-cu12==11.0.2.54 (from torch>=1.10.0->accelerate==0.25.0->-r requirements.txt (line 1)) Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 8.3 MB/s eta 0:00:00 Collecting nvidia-curand-cu12==10.3.2.106 (from torch>=1.10.0->accelerate==0.25.0->-r requirements.txt (line 1)) Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 15.2 MB/s eta 0:00:00 Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch>=1.10.0->accelerate==0.25.0->-r requirements.txt (line 1)) Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 8.4 MB/s eta 0:00:00 Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch>=1.10.0->accelerate==0.25.0->-r requirements.txt (line 1)) Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 6.3 MB/s eta 0:00:00 Collecting nvidia-nccl-cu12==2.19.3 (from torch>=1.10.0->accelerate==0.25.0->-r requirements.txt (line 1)) Downloading nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (166.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 166.0/166.0 MB 6.0 MB/s eta 0:00:00 Collecting nvidia-nvtx-cu12==12.1.105 (from torch>=1.10.0->accelerate==0.25.0->-r requirements.txt (line 1)) Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 kB 13.8 MB/s eta 0:00:00 Requirement already satisfied: triton==2.2.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate==0.25.0->-r requirements.txt (line 1)) (2.2.0) Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch>=1.10.0->accelerate==0.25.0->-r requirements.txt (line 1)) Downloading nvidia_nvjitlink_cu12-12.4.99-py3-none-manylinux2014_x86_64.whl (21.1 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.1/21.1 MB 80.6 MB/s eta 0:00:00 Requirement already satisfied: MarkupSafe>=2.1.1 in /usr/local/lib/python3.10/dist-packages (from werkzeug>=1.0.1->tensorboard==2.10.1->-r requirements.txt (line 10)) (2.1.5) Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.10/dist-packages (from importlib-metadata->diffusers[torch]==0.25.0->-r requirements.txt (line 3)) (3.18.1) Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]>2021.06.0->pytorch-lightning==1.9.0->-r requirements.txt (line 8)) (1.3.1) Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]>2021.06.0->pytorch-lightning==1.9.0->-r requirements.txt (line 8)) (1.4.1) Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]>2021.06.0->pytorch-lightning==1.9.0->-r requirements.txt (line 8)) (6.0.5) Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]>2021.06.0->pytorch-lightning==1.9.0->-r requirements.txt (line 8)) (1.9.4) Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]>2021.06.0->pytorch-lightning==1.9.0->-r requirements.txt (line 8)) (4.0.3) Requirement already satisfied: pyasn1<0.7.0,>=0.4.6 in /usr/local/lib/python3.10/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard==2.10.1->-r requirements.txt (line 10)) (0.6.0) Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas>=0.18->altair==4.2.2->-r requirements.txt (line 13)) (1.16.0) Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard==2.10.1->-r requirements.txt (line 10)) (3.2.2) Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.10.0->accelerate==0.25.0->-r requirements.txt (line 1)) (1.3.0) Building wheels for collected packages: library Building editable for library (pyproject.toml) ... done Created wheel for library: filename=library-0.0.0-0.editable-py3-none-any.whl size=6603 sha256=448d33b9bb012b9283a6752a6cc4159fc01942e00b942e316fa7f209c0bca63a Stored in directory: /tmp/pip-ephem-wheel-cache-hg9u044h/wheels/aa/ec/1c/603d2be168c98844a99288d554a1e0bbd6073e086d79578e30 Successfully built library Installing collected packages: voluptuous, tensorboard-plugin-wit, library, easygui, tensorboard-data-server, protobuf, opencv-python, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, lightning-utilities, ftfy, einops, rich, nvidia-cusparse-cu12, nvidia-cudnn-cu12, huggingface-hub, nvidia-cusolver-cu12, google-auth-oauthlib, diffusers, transformers, tensorboard, torchmetrics, accelerate, timm, pytorch-lightning, open-clip-torch Attempting uninstall: tensorboard-data-server Found existing installation: tensorboard-data-server 0.7.2 Uninstalling tensorboard-data-server-0.7.2: Successfully uninstalled tensorboard-data-server-0.7.2 Attempting uninstall: protobuf Found existing installation: protobuf 3.20.3 Uninstalling protobuf-3.20.3: Successfully uninstalled protobuf-3.20.3 Attempting uninstall: opencv-python Found existing installation: opencv-python 4.8.0.76 Uninstalling opencv-python-4.8.0.76: Successfully uninstalled opencv-python-4.8.0.76 Attempting uninstall: rich Found existing installation: rich 13.7.1 Uninstalling rich-13.7.1: Successfully uninstalled rich-13.7.1 Attempting uninstall: huggingface-hub Found existing installation: huggingface-hub 0.20.3 Uninstalling huggingface-hub-0.20.3: Successfully uninstalled huggingface-hub-0.20.3 Attempting uninstall: google-auth-oauthlib Found existing installation: google-auth-oauthlib 1.2.0 Uninstalling google-auth-oauthlib-1.2.0: Successfully uninstalled google-auth-oauthlib-1.2.0 Attempting uninstall: transformers Found existing installation: transformers 4.38.2 Uninstalling transformers-4.38.2: Successfully uninstalled transformers-4.38.2 Attempting uninstall: tensorboard Found existing installation: tensorboard 2.15.2 Uninstalling tensorboard-2.15.2: Successfully uninstalled tensorboard-2.15.2 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. pandas-gbq 0.19.2 requires google-auth-oauthlib>=0.7.0, but you have google-auth-oauthlib 0.4.6 which is incompatible. tensorflow 2.15.0 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3, but you have protobuf 3.19.6 which is incompatible. tensorflow 2.15.0 requires tensorboard<2.16,>=2.15, but you have tensorboard 2.10.1 which is incompatible. tensorflow-datasets 4.9.4 requires protobuf>=3.20, but you have protobuf 3.19.6 which is incompatible. tensorflow-metadata 1.14.0 requires protobuf<4.21,>=3.20.3, but you have protobuf 3.19.6 which is incompatible. Successfully installed accelerate-0.25.0 diffusers-0.25.0 easygui-0.98.3 einops-0.7.0 ftfy-6.1.1 google-auth-oauthlib-0.4.6 huggingface-hub-0.20.1 library-0.0.0 lightning-utilities-0.11.2 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.19.3 nvidia-nvjitlink-cu12-12.4.99 nvidia-nvtx-cu12-12.1.105 open-clip-torch-2.20.0 opencv-python-4.7.0.68 protobuf-3.19.6 pytorch-lightning-1.9.0 rich-13.7.0 tensorboard-2.10.1 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 timm-0.9.16 torchmetrics-1.3.2 transformers-4.36.2 voluptuous-0.13.1 WARNING: The following packages were previously imported in this runtime: [google,huggingface_hub] You must restart the runtime in order to use newly installed versions. ```

%pip install 'protobuf==3.20.3' 'google-auth-oauthlib>=0.7.0' 'tensorboard<2.16,>=2.15'

The result

``` Collecting protobuf==3.20.3 Downloading protobuf-3.20.3-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 6.5 MB/s eta 0:00:00 Collecting google-auth-oauthlib>=0.7.0 Downloading google_auth_oauthlib-1.2.0-py2.py3-none-any.whl (24 kB) Collecting tensorboard<2.16,>=2.15 Downloading tensorboard-2.15.2-py3-none-any.whl (5.5 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.5/5.5 MB 21.6 MB/s eta 0:00:00 Requirement already satisfied: google-auth>=2.15.0 in /usr/local/lib/python3.10/dist-packages (from google-auth-oauthlib>=0.7.0) (2.27.0) Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from google-auth-oauthlib>=0.7.0) (1.4.1) Requirement already satisfied: absl-py>=0.4 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.16,>=2.15) (1.4.0) Requirement already satisfied: grpcio>=1.48.2 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.16,>=2.15) (1.62.1) Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.16,>=2.15) (3.6) Requirement already satisfied: numpy>=1.12.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.16,>=2.15) (1.25.2) Requirement already satisfied: requests<3,>=2.21.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.16,>=2.15) (2.31.0) Requirement already satisfied: setuptools>=41.0.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.16,>=2.15) (67.7.2) Requirement already satisfied: six>1.9 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.16,>=2.15) (1.16.0) Collecting tensorboard-data-server<0.8.0,>=0.7.0 (from tensorboard<2.16,>=2.15) Downloading tensorboard_data_server-0.7.2-py3-none-manylinux_2_31_x86_64.whl (6.6 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 46.9 MB/s eta 0:00:00 Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.16,>=2.15) (3.0.1) Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from google-auth>=2.15.0->google-auth-oauthlib>=0.7.0) (5.3.3) Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from google-auth>=2.15.0->google-auth-oauthlib>=0.7.0) (0.4.0) Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.10/dist-packages (from google-auth>=2.15.0->google-auth-oauthlib>=0.7.0) (4.9) Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15) (3.3.2) Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15) (3.6) Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15) (2.0.7) Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15) (2024.2.2) Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib>=0.7.0) (3.2.2) Requirement already satisfied: MarkupSafe>=2.1.1 in /usr/local/lib/python3.10/dist-packages (from werkzeug>=1.0.1->tensorboard<2.16,>=2.15) (2.1.5) Requirement already satisfied: pyasn1<0.7.0,>=0.4.6 in /usr/local/lib/python3.10/dist-packages (from pyasn1-modules>=0.2.1->google-auth>=2.15.0->google-auth-oauthlib>=0.7.0) (0.6.0) Installing collected packages: tensorboard-data-server, protobuf, google-auth-oauthlib, tensorboard Attempting uninstall: tensorboard-data-server Found existing installation: tensorboard-data-server 0.6.1 Uninstalling tensorboard-data-server-0.6.1: Successfully uninstalled tensorboard-data-server-0.6.1 Attempting uninstall: protobuf Found existing installation: protobuf 3.19.6 Uninstalling protobuf-3.19.6: Successfully uninstalled protobuf-3.19.6 Attempting uninstall: google-auth-oauthlib Found existing installation: google-auth-oauthlib 0.4.6 Uninstalling google-auth-oauthlib-0.4.6: Successfully uninstalled google-auth-oauthlib-0.4.6 Attempting uninstall: tensorboard Found existing installation: tensorboard 2.10.1 Uninstalling tensorboard-2.10.1: Successfully uninstalled tensorboard-2.10.1 Successfully installed google-auth-oauthlib-1.2.0 protobuf-3.20.3 tensorboard-2.15.2 tensorboard-data-server-0.7.2 WARNING: The following packages were previously imported in this runtime: [google] You must restart the runtime in order to use newly installed versions. ```

(it might be a good idea to update the requirements though... just to fix the error)

Restarting the runtime and using python instead of accelerate launch do not help, as well as switching to v0.8.3 and v0.8.4 instead of v0.8.5. I do not use the earlier versions because of the bug #966 . Yesterday I had the same problem multiple times, so it does not seem to be a HuggingFace issue, as suggested in https://github.com/bmaltais/kohya_ss/issues/548#issuecomment-1499527251 However, deletion of the model folder does not help too.

Could you please help me?

nickkolok commented 8 months ago

Well, it seems to be a HuggingFace issue though.

merges.txt is absent

nickkolok commented 8 months ago

Placing merges.txt to its place did not help.

nickkolok commented 8 months ago

Downgrading diffusers from 0.15.1, 0.17.1 did not solve the issue. Any help is highly appreciated!

DKnight54 commented 8 months ago

It's a messy hacky job based on Linaqruf's Google Colab for SDXL LoRA training, but modified to install into a venv, and I've not had issues running different versions of SD-Scripts installed with it so far, except for needing to manually specify specific versions of torch and bitsandbytes to prevent breaking compatibility, and slightly longer requirements installation due to needing to install more packages instead of relying on the colab's enviroment

If the issue is related to the install (Which I think might be the root cause), then installing this way may resolve your problem. Just make sure to start the correct script by running in the content folder. (the install script will change the install dir to kohya-trainer instead of the default sd-scripts)

!source venv/bin/activate;accelerate launch kohya-trainer/train_db.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --dataset_config=repo_concept/toml/rv6-allres-noregs.toml \
    --output_dir=$OUTPUT_DIR  \
    --save_model_as=diffusers \
    --prior_loss_weight=1.0 \
    --max_train_steps=$SAVE_MAX_STEPS \
    --save_every_n_steps=$SAVE_INTERVAL \
    --learning_rate=14e-7 \
    --lr_scheduler="cosine" \
    --stop_text_encoder_training=20000 \
    --optimizer_type="AdamW8bit" \
    --mixed_precision="fp16" \
    --xformers \
    --gradient_checkpointing \
    --sample_every_n_steps=$SAVE_INTERVAL \
    --sample_sampler="euler_a" \
    --sample_prompts=$repo_path_conc/prompts/prompts-rv6-initial.txt

script in attached txt file. kohya_install_script.txt

nickkolok commented 8 months ago

@DKnight54 , thank you very much for your quick response! Using venv looks like a really sane idea!

I'm going to try the suggested solution in a few days, when my colab quota respawns (I've just burnt it all away today and yesterday). As for now, I managed to launch training on sd-scripts v0.6.6 (after multiple attempts to guess the correct versions of the dependencies - that also interferes with CUDA version...)

nickkolok commented 3 weeks ago

@DKnight54 Thank you very much! That time, I solved the issue by downgrading to v0.6.6. But today that version just broke without any reasons (as always happens with StableDiffusion and Google Colab - they always break something somewhere, and we have just endure it). So, I am trying your solution. At least it is doing something, GPU load is changing...

nickkolok commented 3 weeks ago

So, it is still doing something... somewhere... somehow... I hope for the best :) No progress, no steps-with-percents, no logs about the dataset.

UPD: It has successfully saved intermediate weights!! Hooray!

kohya-ss / sd-scripts

Failed to start DreamBooth training: `INFO prepare tokenizer` #1235