Google Colab - Unable to register cuDNN factory

SkelegonDK commented 9 months ago

Describe the problem

Worked just a few hours ago, and now I get this error. It runs the whole script, then times out.

Full Console Log

Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py', '--preset', 'realistic', '--share']
Loaded preset: /content/Fooocus/presets/realistic.json
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
Fooocus version: 2.1.864
Error checking version for torchsde: No package metadata was found for torchsde
Installing requirements
Downloading: "https://huggingface.co/lllyasviel/misc/resolve/main/xlvaeapp.pth" to /content/Fooocus/models/vae_approx/xlvaeapp.pth
100% 209k/209k [00:00<00:00, 21.7MB/s]
Downloading: "https://huggingface.co/lllyasviel/misc/resolve/main/vaeapp_sd15.pt" to /content/Fooocus/models/vae_approx/vaeapp_sd15.pth
100% 209k/209k [00:00<00:00, 20.0MB/s]
Downloading: "https://huggingface.co/lllyasviel/misc/resolve/main/xl-to-v1_interposer-v3.1.safetensors" to /content/Fooocus/models/vae_approx/xl-to-v1_interposer-v3.1.safetensors
100% 6.25M/6.25M [00:00<00:00, 188MB/s]
Downloading: "https://huggingface.co/lllyasviel/misc/resolve/main/fooocus_expansion.bin" to /content/Fooocus/models/prompt_expansion/fooocus_expansion/pytorch_model.bin
100% 335M/335M [00:00<00:00, 377MB/s]
Downloading: "https://huggingface.co/lllyasviel/fav_models/resolve/main/fav/realisticStockPhoto_v20.safetensors" to /content/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors
100% 6.46G/6.46G [00:19<00:00, 349MB/s]
Downloading: "https://huggingface.co/lllyasviel/fav_models/resolve/main/fav/SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors" to /content/Fooocus/models/loras/SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors
100% 222M/222M [00:00<00:00, 333MB/s]
Running on local URL: http://127.0.0.1:7865/
Total VRAM 16151 MB, total RAM 52218 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 Tesla V100-SXM2-16GB : native
VAE dtype: torch.float32
Using pytorch cross attention
2024-02-06 11:33:04.456769: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-06 11:33:04.456817: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-06 11:33:04.458215: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-06 11:33:05.725750: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Running on public URL: https://9439b28b3b831ec8c0.gradio.live/

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run gradio deploy from Terminal to deploy to Spaces (https://huggingface.co/spaces)
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: /content/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors
Request to load LoRAs [['SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors', 0.25], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/content/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors].
Loaded LoRA [/content/Fooocus/models/loras/SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for UNet [/content/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors] with 788 keys at weight 0.25.
Loaded LoRA [/content/Fooocus/models/loras/SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for CLIP [/content/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors] with 264 keys at weight 0.25.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.87 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865 or https://9439b28b3b831ec8c0.gradio.live/

SkelegonDK commented 9 months ago

I tried both my own modified notebook and the vanilla notebook provided. Both gave the same result.

sudip550 commented 9 months ago

Same error. Did anyone find a solution?

SkelegonDK commented 9 months ago

Tried updating the drivers and restarting the Colab OS as per GPT's suggestions. Didn't work, though.

`The messages you're encountering indicate there are conflicts and warnings related to the use of CUDA libraries and TensorFlow-TensorRT (TF-TRT) integration in your environment. These issues often arise due to multiple installations or registrations of CUDA-related libraries (cuDNN, cuFFT, cuBLAS) and problems with finding TensorRT. Here's a step-by-step approach to address these issues in a Google Colab environment:

1. Resolve Multiple Registrations

The errors suggest that there are attempts to register CUDA libraries more than once. This can happen if the environment is improperly configured or if there are conflicting versions of libraries. To resolve this, ensure a clean setup of your CUDA environment. In Google Colab, this typically isn't an issue due to its managed environment, but restarting the runtime can help reset the state:

# Restart the runtime (this will clear all your variables)
import os
os.kill(os.getpid(), 9)

2. Ensure Proper TensorFlow and CUDA Compatibility

Ensure that your TensorFlow version is compatible with the CUDA and cuDNN versions provided by Google Colab. You can check the TensorFlow website for compatibility matrices. As of the last update, Google Colab usually comes with a compatible setup. You can verify your versions using:

import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("CUDA version:")
!nvcc --version
print("cuDNN version:")
!apt list --installed | grep cudnn

3. Install or Update TensorRT

The warning about TensorRT suggests that TensorFlow-TensorRT (TF-TRT) integration can't find the TensorRT library. Installing or updating TensorRT might be necessary. However, managing TensorRT installations in Colab can be tricky due to the pre-configured environment. Check for the availability of TensorRT in Colab and consider using TensorFlow's GPU version that might not rely on TensorRT for your specific needs:

# Check TensorRT installation
!dpkg -l | grep nvinfer

If it's missing or you need a different version, handling this in Colab might not be straightforward due to permissions and compatibility. You might have to work within the limitations of the provided environment or look for workarounds specific to your use case.

4. Handling TensorFlow Warnings and Errors

If the above steps don't resolve the warnings or errors, consider isolating the operations causing them. Sometimes, using specific TensorFlow or Keras functions in a certain way can trigger these issues. Ensure your code is updated to the latest TensorFlow API standards and practices.

5. Gradio Deployment

The initial message about gradio deploy suggests you're trying to deploy a model or application. Make sure all library dependencies, including TensorFlow, are correctly specified in your requirements.txt file for deployment on Hugging Face Spaces. Also, ensure your application is tested and works locally in Colab before deploying.

# Example command to deploy to Gradio, ensure you're in the project directory
# !gradio deploy

Remember, direct manipulation of CUDA libraries and complex installations in Google Colab can be limited. If you're working on a project that requires specific versions or configurations not supported by Colab, consider using a local setup or a cloud service that offers more control over the environment.`
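For anyone trying to read the timestamped lines in the logs above, here is a minimal sketch that splits them into severity, source file, and message. The regular expression is an assumption inferred only from the lines pasted in this thread (a date, a time, a one-letter level such as `E` or `W`, then `file:line]`), and `parse_log_line` is a hypothetical helper name, not part of Fooocus or TensorFlow.

```python
import re

# Assumed layout, inferred from the log lines pasted in this thread:
# "<date> <time>: <LEVEL> <source file>:<line number>] <message>"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<level>[IWEF]) (?P<source>\S+):(?P<lineno>\d+)\] (?P<message>.*)$"
)

# One-letter severity codes used by this logging style.
LEVEL_NAMES = {"I": "info", "W": "warning", "E": "error", "F": "fatal"}


def parse_log_line(line):
    """Return a dict of fields for a timestamped log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["level_name"] = LEVEL_NAMES[entry["level"]]
    return entry


# Lines copied from the first console log in this thread.
sample_lines = [
    "2024-02-06 11:33:04.456769: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] "
    "Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN "
    "when one has already been registered",
    "2024-02-06 11:33:05.725750: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] "
    "TF-TRT Warning: Could not find TensorRT",
    "Fooocus V2 Expansion: Vocab with 642 words.",
]

for raw in sample_lines:
    entry = parse_log_line(raw)
    if entry is None:
        continue  # Plain application output, not a timestamped log line.
    source_file = entry["source"].rsplit("/", 1)[-1]
    print(entry["level_name"], source_file, entry["message"][:50])
```

Lines without the timestamp prefix (the Fooocus and Gradio output) are skipped, so only the cuDNN, cuFFT, cuBLAS, and TF-TRT entries are listed.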

GradusGadi commented 9 months ago

Please help me solve this problem. Everything worked yesterday, but today I get the log below. The gradio.live page loads endlessly, and my session time is running out.

Requirement already satisfied: pygit2==1.12.2 in /usr/local/lib/python3.10/dist-packages (1.12.2)
Requirement already satisfied: cffi>=1.9.1 in /usr/local/lib/python3.10/dist-packages (from pygit2==1.12.2) (1.16.0)
Requirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from cffi>=1.9.1->pygit2==1.12.2) (2.21)
/content
fatal: destination path 'Fooocus' already exists and is not an empty directory.
/content/Fooocus
Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py', '--share']
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
Fooocus version: 2.1.864
Running on local URL: http://127.0.0.1:7865/
Running on public URL: https://80d1c885b2147191e9.gradio.live/

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run gradio deploy from Terminal to deploy to Spaces (https://huggingface.co/spaces)
Total VRAM 15102 MB, total RAM 12979 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 Tesla T4 : native
VAE dtype: torch.float32
Using pytorch cross attention
2024-02-06 12:01:34.369005: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-06 12:01:34.369059: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-06 12:01:34.376734: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-06 12:01:36.330791: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
Base model loaded: /content/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/content/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors].
Loaded LoRA [/content/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/content/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.70 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865 or https://80d1c885b2147191e9.gradio.live/

GradusGadi commented 9 months ago

I am an ordinary user without high knowledge of the code, if there is a solution, please write detailed instructions on how to fix it

poor7 commented 9 months ago

Wait for the gradio.live site to start working.

sudip550 commented 9 months ago

> Wait for the gradio.live site to start working.

It gives a 504 error after some time.

jatinpreeet commented 9 months ago

The link generated by the code doesn't work. I waited 5-6 minutes for it to load, but then I received a 504 error in the tab where the link was opened. Please fix this issue.

GradusGadi commented 9 months ago

Please unsubscribe as soon as it works

mashb1t commented 9 months ago

See https://github.com/lllyasviel/Fooocus/issues/2186; gradio.live seems to be down in general.
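For anyone hitting this later, here is a rough sketch for checking from a notebook cell whether a URL answers at all, and whether the answer is a 504 like the one reported above. It uses only the Python standard library. `describe_status` and `probe` are hypothetical helper names, and the wording of the returned descriptions is my own, not anything Gradio or Fooocus reports.

```python
import urllib.error
import urllib.request


def describe_status(code):
    """Map an HTTP status code to a short, human-readable diagnosis."""
    if 200 <= code < 400:
        return "reachable"
    if code == 504:
        # 504 Gateway Timeout: a proxy or gateway got no timely answer from the server behind it.
        return "gateway timeout (504)"
    if 500 <= code < 600:
        return f"server-side error ({code})"
    return f"client-side error or unexpected status ({code})"


def probe(url, timeout=10.0):
    """Send one GET request to `url` and describe the outcome without raising."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return describe_status(response.status)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 504.
        return describe_status(exc.code)
    except OSError as exc:
        # Covers URLError, refused connections, and timeouts: no HTTP answer at all.
        return f"no HTTP response ({exc})"


# Usage in the same runtime that launched the app (URLs taken from your own log):
# print(probe("http://127.0.0.1:7865/"))   # local URL printed by the app
# print(probe("<your gradio.live URL>"))   # public share link printed by the app
```

If the local URL comes back as reachable while the share link reports a gateway timeout, that would be consistent with the problem being on the share-link side rather than in the notebook, though this sketch cannot prove where the fault lies.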

mashb1t commented 9 months ago

Closing this issue, works again!
