nicknochnack / TFODCourse


cuDNN failed to initialize on Colab #29

Open fassuni opened 2 years ago

fassuni commented 2 years ago

2021-11-10 11:34:35.330749: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-11-10 11:34:35.333269: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.

tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[node ssd_mobile_net_v2fpn_keras_feature_extractor/model/Conv1/Conv2D (defined at /usr/local/lib/python3.7/dist-packages/object_detection/models/ssd_mobilenet_v2_fpn_keras_feature_extractor.py:219) ]] [Op:inferencedummy_computation_fn_15068]

How can I fix this?

nguaki commented 2 years ago

I got the same problem. In his video, everything ran smoothly on Colab. If the file is the same, why is this a problem?

nguaki commented 2 years ago

Looks like this code will not work in Colab. It may have worked in January 2021, but the cuDNN and CUDA libraries have changed since.
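The version rule quoted in the error log (matching major version, and a loaded minor version equal to or higher than the compiled one) can be checked with a small sketch. This is only an illustration: the helper names `parse_cudnn_versions` and `is_cudnn_compatible` are hypothetical, not part of TensorFlow, and the code assumes the log message keeps the wording shown above and that versions have at least a major and a minor component.

```python
import re

# Matches the wording of the TensorFlow message quoted in this thread.
# Assumption: other TensorFlow releases may phrase the message differently.
_LOG_PATTERN = re.compile(
    r"Loaded runtime CuDNN library: (\d+(?:\.\d+)*) "
    r"but source was compiled with: (\d+(?:\.\d+)*)"
)


def parse_cudnn_versions(log_text):
    """Return (loaded, compiled) version strings, or None if not found."""
    match = _LOG_PATTERN.search(log_text)
    return match.groups() if match else None


def is_cudnn_compatible(loaded, compiled):
    """Apply the rule from the log: same major, loaded minor >= compiled minor."""
    loaded_major, loaded_minor = (int(p) for p in loaded.split(".")[:2])
    compiled_major, compiled_minor = (int(p) for p in compiled.split(".")[:2])
    return loaded_major == compiled_major and loaded_minor >= compiled_minor


log_line = (
    "Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0."
)
loaded, compiled = parse_cudnn_versions(log_line)
print(loaded, compiled, is_cudnn_compatible(loaded, compiled))
# prints: 8.0.5 8.1.0 False
```

For the log in this issue the check fails because the loaded minor version (0) is lower than the compiled one (1), which is consistent with the runtime's preinstalled cuDNN being older than the one the installed TensorFlow build expects.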

Shivam7Sharma commented 2 years ago

> Looks like this code will not work in Colab. It may have worked in January 2021 but the CUDNN and CUDA libraries have changed since.

Okay, thanks! I got the same error. Do you know roughly how long it will take to train on 15 images using a CPU?

nguaki commented 2 years ago

I trained using a TPU, and that worked.

I read somewhere that a GPU is 10x faster. Unfortunately, this code does not work with the current Colab environment.

I didn't time the TPU run, but it definitely took less than 1 hour. I have not tested on a CPU; my assumption is that the TPU is faster than the CPU, but I may be wrong.

I think the Colab setup is what we need to experiment with. It may have worked when Nick ran it, but the GPU runtime changes all the time, and its cuDNN and other CUDA libraries come preinstalled.

I am thinking about removing setup.py altogether and running without it.

fassuni commented 2 years ago

I stopped using Colab, but from what I've seen the problem can be solved by using TensorFlow 2.6, because the newer one uses cuDNN 8.3...

nguaki commented 2 years ago

The setup.py file that gets generated uses 2.6, I believe. These versions are very sensitive and hard to debug. For training on images using a GPU in Colab, I have simply given up. Google support was not much help. Training on a TPU worked. I will have to cross my fingers and hope to find a tutorial where I can train on images with a Colab GPU. Nick's tutorial was fantastic except for the Colab training part. Everything else was an amazing learning experience.

gustavsma commented 2 years ago

You could try using an older version of the TensorFlow Object Detection API (TFOD):

```
!git clone https://github.com/tensorflow/models.git {paths['APIMODEL_PATH']}
%cd /content/Tensorflow/models/
!git checkout -f e04dafd04d69053d3733bb91d47d0d95bc2c8199

%cd /content
!apt-get install protobuf-compiler
!cd Tensorflow/models/research && protoc object_detection/protos/*.proto --python_out=. && cp object_detection/packages/tf2/setup.py . && python -m pip install .
```