hollowstrawberry / kohya-colab

Accessible Google Colab notebooks for Stable Diffusion Lora training, based on the work of kohya-ss and Linaqruf
GNU General Public License v3.0

Tagger not working in Dataset Maker #216

Open theWitchR opened 1 week ago

theWitchR commented 1 week ago
Launching program...

env: PYTHONPATH=/content/kohya-trainer
2024-09-24 19:52:48.243038: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-24 19:52:48.243093: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-24 19:52:48.244448: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
matcordero commented 1 week ago

Same problem. It happens when running the cell that tags the images: it seems to download the dependencies, but then execution stops without doing anything.

theWitchR commented 6 days ago

@hollowstrawberry any idea what we can do?

Miuna88 commented 6 days ago

Yup, seems to be an issue when trying to tag images. I kind of wish there was an offline GUI tool that worked the same way: you could feed it a hundred images and tag them (and refine the tags/activation tags, etc.), since it seems that slight changes on Google's side make problems appear.

Ahem, but yeah, same issue.

Psyga315 commented 6 days ago

Same issue as well.

ErichEisner commented 6 days ago

Same. Curating my images also gives an error message, before I even try tagging them. Might be a related bug, or am I the only one experiencing it?

gwhitez commented 6 days ago

I think the error came from the commit I was using. I tried a newer commit (dd9763be31805f24255ca722f30bc5f6d99c73f5), but got a new error.

(screenshot of the new error)

gwhitez commented 6 days ago

I made a mix with another notebook whose tagger works, but it doesn't have the options for custom tags, etc. I hope it works as a temporary solution: https://colab.research.google.com/drive/1OHZVoMPuq_rREWN1saLW37TPu9VUnpeI?usp=sharing

Psyga315 commented 5 days ago

> I made a mix of another notebook which works with the tagger but does not have the options for custom tags, etc... I hope it works as a temporary solution https://colab.research.google.com/drive/1OHZVoMPuq_rREWN1saLW37TPu9VUnpeI?usp=sharing

Got this error:

loading onnx model: /content/tagger_models/wd-swinv2-tagger-v3/model.onnx
EP Error: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void]
/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void]
CUDA failure 35: CUDA driver version is insufficient for CUDA runtime version ; GPU=32702 ; hostname=626a7cef0f3a ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=245 ; expr=cudaSetDevice(info.device_id);
when using ['CUDAExecutionProvider']
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.


Traceback (most recent call last):
  File "/content/trainer/sd_scripts/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/content/trainer/sd_scripts/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void]
/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void]
CUDA failure 35: CUDA driver version is insufficient for CUDA runtime version ; GPU=32702 ; hostname=626a7cef0f3a ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=245 ; expr=cudaSetDevice(info.device_id);

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/content/trainer/sd_scripts/finetune/tag_images_by_wd14_tagger.py", line 350, in <module>
    main(args)
  File "/content/trainer/sd_scripts/finetune/tag_images_by_wd14_tagger.py", line 102, in main
    ort_sess = ort.InferenceSession(
  File "/content/trainer/sd_scripts/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 432, in __init__
    raise fallback_error from e
  File "/content/trainer/sd_scripts/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 427, in __init__
    self._create_inference_session(self._fallback_providers, None)
  File "/content/trainer/sd_scripts/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void]
/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void]
CUDA failure 35: CUDA driver version is insufficient for CUDA runtime version ; GPU=32702 ; hostname=626a7cef0f3a ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=245 ; expr=cudaSetDevice(info.device_id);

Tagging complete!
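(Editor's note: the "CUDA driver version is insufficient" failure above happens while onnxruntime initializes the CUDA execution provider; requesting only the CPU provider avoids it, at the cost of slower tagging. A minimal sketch under that assumption; the helper name is ours, the model path comes from the log, and the actual `ort.InferenceSession` call is shown as a comment since it needs the model file present.)

```python
# Sketch of a CPU-only fallback for the tagger's ONNX session.
# The provider names are the standard onnxruntime identifiers.

def provider_list(force_cpu: bool) -> list[str]:
    """Provider order to pass to onnxruntime: CUDA first unless we force
    CPU because the runtime's CUDA driver is too old for the CUDA runtime."""
    if force_cpu:
        return ["CPUExecutionProvider"]
    return ["CUDAExecutionProvider", "CPUExecutionProvider"]

# Usage inside the notebook would look roughly like:
#   import onnxruntime as ort
#   sess = ort.InferenceSession(
#       "/content/tagger_models/wd-swinv2-tagger-v3/model.onnx",
#       providers=provider_list(force_cpu=True),
#   )

print(provider_list(True))   # ['CPUExecutionProvider']
print(provider_list(False))  # ['CUDAExecutionProvider', 'CPUExecutionProvider']
```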

gwhitez commented 5 days ago

Use the SwinV2 tagger v2 instead.

byungtaekyu commented 3 days ago

env: PYTHONPATH=/content/kohya-trainer
2024-09-28 05:07:14.171514: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-28 05:07:14.192775: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-28 05:07:14.199212: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-28 05:07:14.213501: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-28 05:07:15.798236: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

hollowstrawberry commented 2 days ago

The dependency installation appears to finish without errors, which causes its output to be erased, but in fact pip complains about broken dependencies:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
flax 0.8.5 requires jax>=0.4.27, but you have jax 0.4.23 which is incompatible.
optax 0.2.3 requires jax>=0.4.27, but you have jax 0.4.23 which is incompatible.
optax 0.2.3 requires jaxlib>=0.4.27, but you have jaxlib 0.4.23 which is incompatible.
orbax-checkpoint 0.6.4 requires jax>=0.4.26, but you have jax 0.4.23 which is incompatible.
tensorstore 0.1.65 requires ml-dtypes>=0.3.1, but you have ml-dtypes 0.2.0 which is incompatible.
tf-keras 2.17.0 requires tensorflow<2.18,>=2.17, but you have tensorflow 2.15.0 which is incompatible.
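(Editor's note: all six complaints are lower-bound violations against the pinned jax 0.4.23 / jaxlib 0.4.23 / ml-dtypes 0.2.0 / tensorflow 2.15.0. A purely illustrative sketch, not notebook code, that reproduces the resolver's report from the version numbers pip printed:)

```python
# Reproduce pip's conflict report above from the versions it printed.

def parse(version: str) -> tuple[int, ...]:
    """Turn '0.4.27' into (0, 4, 27) for tuple comparison."""
    return tuple(int(part) for part in version.split("."))

def find_conflicts(installed, constraints):
    """Return (pkg, dep, minimum, found) for every lower-bound violation."""
    return [
        (pkg, dep, minimum, installed[dep])
        for pkg, dep, minimum in constraints
        if dep in installed and parse(installed[dep]) < parse(minimum)
    ]

installed = {"jax": "0.4.23", "jaxlib": "0.4.23",
             "ml-dtypes": "0.2.0", "tensorflow": "2.15.0"}
constraints = [  # (declaring package, dependency, required minimum)
    ("flax 0.8.5", "jax", "0.4.27"),
    ("optax 0.2.3", "jax", "0.4.27"),
    ("optax 0.2.3", "jaxlib", "0.4.27"),
    ("orbax-checkpoint 0.6.4", "jax", "0.4.26"),
    ("tensorstore 0.1.65", "ml-dtypes", "0.3.1"),
    ("tf-keras 2.17.0", "tensorflow", "2.17"),
]

for pkg, dep, minimum, found in find_conflicts(installed, constraints):
    print(f"{pkg} requires {dep}>={minimum}, but you have {dep} {found}")
```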

This is similar to the trainer error I tried to fix earlier; in that case, fixing the dependencies didn't work either. Not sure how to proceed from here.