sglvladi / TensorFlowObjectDetectionTutorial

A tutorial on object detection using TensorFlow

CUDNN Failed (but sometimes, rarely not) #65

Closed nvbogu closed 3 years ago

nvbogu commented 3 years ago

For everyone who had trouble running a model and almost always got a message indicating that cuDNN (or something similar) failed right after the cudnn/cuda libraries were loaded: I might have a fix.

Go add

physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

to your model_main_tf2.py file inside the main function.

There is a nice explanation at https://github.com/tensorflow/tensorflow/issues/6698 (comment by strickon from 26 Apr 2017) indicating that this has nothing to do with CUDA or cuDNN, but with the way TensorFlow handles memory allocation.

I just took his solution and adapted it to TF 2.3 (see: https://www.tensorflow.org/api_docs/python/tf/config/experimental/set_memory_growth?hl=de).

I also needed to add

physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

at the top of my test/evaluation file (e.g. test_from_model.ipynb) and restart my kernel. Otherwise these two lines will themselves run into an error, because the kernel still holds the state from the earlier failure.
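The two lines above assume that at least one GPU is visible and only configure the first one. A slightly more defensive variant might look like the sketch below. The helper name `enable_gpu_memory_growth` and its optional `config` parameter are my own additions for illustration (the parameter only exists so the function can be tried without a GPU); the TensorFlow calls are the same ones used above.

```python
def enable_gpu_memory_growth(config=None):
    """Enable memory growth on every visible GPU.

    Returns the number of GPUs that were configured successfully.
    `config` is a hypothetical hook for testing; by default tf.config is used.
    """
    if config is None:
        # Imported lazily so the helper can also be called with a stand-in object.
        import tensorflow as tf
        config = tf.config

    configured = 0
    for gpu in config.list_physical_devices("GPU"):
        try:
            config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except RuntimeError as err:
            # Raised when the GPU has already been initialized, e.g. in a
            # notebook kernel that was not restarted.
            print(f"Could not set memory growth for {gpu}: {err}")
    return configured
```

Calling `enable_gpu_memory_growth()` before any model or tensor is created (at the top of the notebook, or at the start of the main function in model_main_tf2.py) should have the same effect as the original two lines, but it does not fail with an IndexError when no GPU is found.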

Greetings Niklas

sglvladi commented 3 years ago

@nvbogu just a note: This is happening because you have other processes using up GPU memory.
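To check whether other processes are in fact holding GPU memory, something along these lines could help. This is only a sketch: it assumes `nvidia-smi` is on the PATH, and the function names `parse_compute_apps` and `list_gpu_processes` are made up for this example.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' rows (no header, no units)."""
    rows = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split off the first and last field so a comma inside the
        # process name does not break the parsing.
        pid, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)
        mem = mem.strip()
        rows.append({
            "pid": int(pid),
            "name": name.strip(),
            # Some platforms report "[N/A]" instead of a number.
            "used_mib": int(mem) if mem.isdigit() else None,
        })
    return rows


def list_gpu_processes():
    """Ask nvidia-smi which compute processes currently use the GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_apps(result.stdout)
```

Running `print(list_gpu_processes())` before starting training shows whether, for example, a stale Python process or another notebook kernel is still occupying memory.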

Still a valid workaround though 👍