kohya-ss / sd-scripts


I get an error when trying to train using Google Colab #854

Closed. StupidGame closed this issue 1 year ago.

StupidGame commented 1 year ago

I get an error when trying to train using Google Colab. Error description:


  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1086, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/clip/image_processing_clip.py", line 22, in <module>
    from ...image_transforms import (
  File "/usr/local/lib/python3.10/dist-packages/transformers/image_transforms.py", line 48, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/__init__.py", line 38, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/__init__.py", line 37, in <module>
    from tensorflow.python.eager import context
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/context.py", line 29, in <module>
    from tensorflow.core.framework import function_pb2
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/core/framework/function_pb2.py", line 5, in <module>
    from google.protobuf.internal import builder as _builder
ImportError: cannot import name 'builder' from 'google.protobuf.internal' (/usr/local/lib/python3.10/dist-packages/google/protobuf/internal/__init__.py)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/diffusers/utils/import_utils.py", line 684, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 20, in <module>
    from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1077, in __getattr__
    value = getattr(module, name)
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1076, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1088, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.clip.image_processing_clip because of the following error (look up to see its traceback):
cannot import name 'builder' from 'google.protobuf.internal' (/usr/local/lib/python3.10/dist-packages/google/protobuf/internal/__init__.py)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/content/sd-scripts/train_network.py", line 27, in <module>
    from library import model_util
  File "/content/sd-scripts/library/model_util.py", line 16, in <module>
    from diffusers import AutoencoderKL, DDIMScheduler, StableDiffusionPipeline  # , UNet2DConditionModel
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/usr/local/lib/python3.10/dist-packages/diffusers/utils/import_utils.py", line 675, in __getattr__
    value = getattr(module, name)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/utils/import_utils.py", line 675, in __getattr__
    value = getattr(module, name)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/utils/import_utils.py", line 674, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/usr/local/lib/python3.10/dist-packages/diffusers/utils/import_utils.py", line 686, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion because of the following error (look up to see its traceback):
Failed to import transformers.models.clip.image_processing_clip because of the following error (look up to see its traceback):
cannot import name 'builder' from 'google.protobuf.internal' (/usr/local/lib/python3.10/dist-packages/google/protobuf/internal/__init__.py)
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 986, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
```
sunnytomy commented 1 year ago

Same here. I didn't have this issue before; it seems to have shown up recently, maybe due to a system update?

DKnight54 commented 1 year ago

Looks like it couldn't find the protobuf library when trying to run. Try installing it by adding the line "!pip install protobuf" somewhere, ideally before installing the requirements (the line !pip install -r requirements.txt or something similar).

Note: without knowing which Colab notebook you are trying to run, it is harder to test and provide an exact fix.
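
For example, an install cell ordered that way might look like the sketch below; apart from the sd-scripts clone URL, the lines are illustrative and depend on your notebook:

```
!pip install protobuf
!git clone https://github.com/kohya-ss/sd-scripts.git
%cd sd-scripts
!pip install -r requirements.txt
%cd ..
```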

StupidGame commented 1 year ago

@DKnight54 The settings and other values below are provisional.


```
from google.colab import drive
drive.mount('/content/drive')

git_repo_path = "test/test" #@param {type:"string"}
token = "test" #@param {type:"string"}

%env TOKEN=$token
#@title
# Install the remaining packages

git_repo_url = "https://$TOKEN@github.com/" + git_repo_path + ".git"
#@markdown * URL of the repository containing the dataset

!git clone $git_repo_url
!git clone https://github.com/kohya-ss/sd-scripts.git
!pip install lycoris_lora
!pip install wandb
!pip install xformers
!pip install protobuf
!pip install lion_pytorch
%cd sd-scripts
!pip install -r requirements.txt
%cd ..

from accelerate.utils import write_basic_config

write_basic_config()

pretrained_model_name_or_path = "StupidGame/AnyLoRA" #@param ["enryu43/anifusion_sd_unet_768", "hakurei/waifu-diffusion", "Nilaier/Waifu-Diffusers","CompVis/stable-diffusion-v1-4", "naclbit/trinart_stable_diffusion_v2,diffusers-115k", "naclbit/trinart_stable_diffusion_v2,diffusers-95k", "naclbit/trinart_stable_diffusion_v2,diffusers-60k"] {allow-input: true}
#@markdown * Enter the location of the source model, either a Diffusers model or a ckpt.

pretrained_model_is_v2 = False #@param {type:"boolean"}
#@markdown * Specify whether the source model is an SDv2 derivative.

pretrained_model_resolution = "512x512" #@param ["512x512", "768x768"]
#@markdown * Select the training resolution of the source model
#@markdown ----

datasets_path = "/content/kohya-mydatasets/datasets/test.toml" #@param {type:"string"}

prompts_path = "/content/kohya-mydatasets/datasets/test.txt" #@param {type:"string"}

dream_booth_epochs = 20 #@param {type:"integer"}
#@markdown * Number of epochs to train for
#@markdown * **To train almost the same way as the original Diffusers version or XavierXiao's StableDiffusion version, double the number of steps.**
#@markdown ----

learning_late = 3e-5 #@param {type:"number"}
#@markdown * Learning rate
#@markdown ----

dream_booth_model_ext = "safetensors" #@param ["pt", "ckpt", "safetensors"]
#@markdown * Specify the format to save the model in.

dream_booth_new_model = "test" #@param {type:"string"}
#@markdown * Specify the name of the file / folder to save.
#@markdown ----
output_dir = "/content/drive/MyDrive/loras" #@param {type:"string"}
#@markdown * Specify the location to save the file / folder.
#@markdown ----

network_dim = 8 #@param {type:"integer"}
#@markdown * Number of dimensions
#@markdown ----

network_alpha = 32 #@param {type:"integer"}
#@markdown * Threshold
#@markdown ----

te_coef = 0.5 #@param {type:"number"}
#@markdown * Coefficient applied to the text encoder learning rate
#@markdown ----

unet_coef = 1 #@param {type:"number"}
#@markdown * Coefficient applied to the U-Net learning rate
#@markdown ----

shutdown = True #@param {type:"boolean"}
#@markdown ----

dream_booth_new_model = dream_booth_new_model + "_" + str(learning_late)

output_dir = output_dir + "/" + dream_booth_new_model

conv_dim = "conv_dim=" + str(network_dim)

conv_alpha = "conv_alpha=" + str(network_alpha)

import os
import glob
import shutil
os.makedirs("output", exist_ok=True)

text_lr = learning_late * te_coef
unet_lr = learning_late * unet_coef

!accelerate launch --num_cpu_threads_per_process 12 sd-scripts/train_network.py \
  --pretrained_model_name_or_path=$pretrained_model_name_or_path \
  --dataset_config=$datasets_path \
  --network_dim=$network_dim \
  --network_alpha=$network_alpha \
  --output_dir=$output_dir \
  --lr_scheduler="cosine_with_restarts" \
  --lr_scheduler_num_cycles=2 \
  --text_encoder_lr=$text_lr \
  --unet_lr=$unet_lr \
  --output_name=$dream_booth_new_model \
  --prior_loss_weight=1.0 \
  --seed=42 \
  --max_train_epochs=$dream_booth_epochs \
  --optimizer_type="Lion"  \
  --optimizer_args "weight_decay=1e-1" "betas=0.9,0.99" \
  --max_grad_norm=1.0 \
  --mixed_precision='fp16' \
  --xformers \
  --gradient_checkpointing \
  --save_precision='fp16' \
  --sample_every_n_epochs=1 \
  --sample_prompts=$prompts_path \
  --save_model_as=$dream_booth_model_ext \
  --cache_latents \
  --bucket_no_upscale \
  --log_with="wandb" \
  --wandb_api_key="test" \
  --network_module=lycoris.kohya \
  --network_args $conv_dim $conv_alpha "algo=loha" \
  --max_token_length=150 \
  --logging_dir=logs \
  --noise_offset=0.05 \
  --scale_weight_norms=1.3 \
  --adaptive_noise_scale=0.05 \
  --clip_skip=2 \
  --min_snr_gamma=5

if shutdown:
  from google.colab import runtime
  runtime.unassign()
```
ruu2 commented 1 year ago

Recently, the sdxl branch was merged into the main branch. Perhaps that causes this error.

You can avoid this problem by using an old version of sd-scripts, or you can run the commands below after installing sd-scripts and its requirements. In my case, the error disappeared.

```
!pip install --upgrade protobuf
!cp /usr/local/lib/python3.10/dist-packages/google/protobuf/internal/builder.py /content/
!pip install protobuf==3.19.6
!cp /content/builder.py /usr/local/lib/python3.10/dist-packages/google/protobuf/internal/
```
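
As a hypothetical sanity check (not from the original post), the import that originally failed should now succeed in a fresh cell:

```
# Assumed check: this is the exact import that raised ImportError before
from google.protobuf.internal import builder
import google.protobuf
print(google.protobuf.__version__)  # expected 3.19.6 after the downgrade
```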

I'm not sure whether this causes undesirable side effects or not.

I referred to this web site: https://stackoverflow.com/questions/71759248/importerror-cannot-import-name-builder-from-google-protobuf-internal

DKnight54 commented 1 year ago

@kohya-ss Oh dear. requirements.txt is installing protobuf version 3.19.6, but when I check the conflict error messages, tensorflow needs protobuf>=3.20.3.

```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.13.0 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3, but you have protobuf 3.19.6 which is incompatible.
tensorflow 2.13.0 requires tensorboard<2.14,>=2.13, but you have tensorboard 2.10.1 which is incompatible.
tensorflow-datasets 4.9.3 requires protobuf>=3.20, but you have protobuf 3.19.6 which is incompatible.
tensorflow-metadata 1.14.0 requires protobuf<4.21,>=3.20.3, but you have protobuf 3.19.6 which is incompatible.
```

I think it may be a dependency conflict issue, as apparently tensorboard 2.10.1 is installed, but tensorflow 2.13.0 requires tensorboard<2.14,>=2.13.
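
For reference, pip itself can surface these conflicts; these are standard pip commands, nothing specific to this notebook:

```
!pip check
!pip show protobuf tensorboard tensorflow
```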

@StupidGame, a temporary workaround until the requirements conflict is fixed is to have this in the install section of your notebook:

```
!git clone $git_repo_url
!git clone https://github.com/kohya-ss/sd-scripts.git
!pip install lycoris_lora
!pip install wandb
!pip install xformers
!pip install protobuf==3.20.3
!pip install lion_pytorch
%cd sd-scripts
!pip install -r requirements.txt
!pip install tensorboard==2.13
!pip install protobuf==3.20.3
%cd ..
```

Put the tensorboard and protobuf installs with the correct versions after the line !pip install -r requirements.txt.

That at least gets your script to the part where it fails because I didn't put in any training data, instead of the error you had.

kohya-ss commented 1 year ago

The requirements.txt assumes tensorflow==2.10.1, because that is the last version which supports GPU on Windows. So please install tensorflow==2.10.1 if there is no other issue.
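
On Colab that would be, for example:

```
!pip install tensorflow==2.10.1
```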

If you need to install another version of TensorFlow, the workaround above will work.

I'd like to investigate how we can remove the TensorFlow dependency in the wd14 tagger.

Isotr0py commented 1 year ago

@kohya-ss It seems that the wd14 tagger's repo provides an ONNX model for inference. Maybe we can replace the TensorFlow dependency with onnxruntime in the wd14 tagger.

However, all the wd14 taggers' ONNX models use inputs with a fixed shape [1, 448, 448, 3], which means batch_size is locked to 1... Re-exporting the wd14 tagger with a dynamic shape would be needed if we want a larger batch size with the ONNX model...
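
For illustration, a minimal onnxruntime inference sketch under that fixed-shape assumption; the model file name and the dummy input are placeholders, not the repo's actual code:

```
# Sketch: run a wd14-tagger-style ONNX model with onnxruntime.
# "wd14_tagger.onnx" is a placeholder path, not a file shipped by the repo.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("wd14_tagger.onnx", providers=["CPUExecutionProvider"])
inp = session.get_inputs()[0]
print(inp.shape)  # [1, 448, 448, 3] -> the batch dimension is fixed to 1

# Because the batch dimension is fixed, images must be fed one at a time (NHWC float32)
image = np.zeros((1, 448, 448, 3), dtype=np.float32)
probs = session.run(None, {inp.name: image})[0]
print(probs.shape)
```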

kohya-ss commented 1 year ago

> It seems that the wd14 tagger's repo provides an ONNX model for inference. Maybe we can replace the TensorFlow dependency with onnxruntime in the wd14 tagger.

Thank you for letting me know. It is nice! And thank you also for the PR about this. I will check it soon :)

StupidGame commented 1 year ago

@DKnight54 Thanks! It works!