tensorflow / models

Models and examples built with TensorFlow

Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. #8312

Open MarioSegallaMoreira opened 4 years ago

MarioSegallaMoreira commented 4 years ago

Hello! I'm working through object_detection_tutorial.ipynb, and when I run the command below I get this error. I have done the protoc install and added all three PYTHONPATH entries: models, models\slim, and models\research.
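For reference, a minimal sketch of adding those three entries from inside the notebook instead of via the PYTHONPATH environment variable; MODELS_ROOT is an assumption about where the models repository is cloned:

import os
import sys

# Assumed location of the cloned tensorflow/models repository -- adjust as needed.
MODELS_ROOT = r"C:\path\to\models"

# Mirror the three PYTHONPATH entries mentioned above:
# models, models\slim, and models\research.
for sub in ("", "slim", "research"):
    sys.path.append(os.path.join(MODELS_ROOT, sub))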

CUDA: 10.1, cuDNN: 7.6.5

for image_path in TEST_IMAGE_PATHS:
  show_inference(detection_model, image_path)

UnknownError Traceback (most recent call last)

in
      1 for image_path in TEST_IMAGE_PATHS:
----> 2   show_inference(detection_model, image_path)

in show_inference(model, image_path)
      4   image_np = np.array(Image.open(image_path))
      5   # Actual detection.
----> 6   output_dict = run_inference_for_single_image(model, image_np)
      7   # Visualization of the results of a detection.
      8   vis_util.visualize_boxes_and_labels_on_image_array(

in run_inference_for_single_image(model, image)
      7
      8   # Run inference
----> 9   output_dict = model(input_tensor)
     10
     11   # All outputs are batches tensors.

~\Anaconda3\lib\site-packages\tensorflow_core\python\eager\function.py in __call__(self, *args, **kwargs)
   1549         TypeError: For invalid positional/keyword argument combinations.
   1550     """
-> 1551     return self._call_impl(args, kwargs)
   1552
   1553   def _call_impl(self, args, kwargs, cancellation_manager=None):

~\Anaconda3\lib\site-packages\tensorflow_core\python\eager\function.py in _call_impl(self, args, kwargs, cancellation_manager)
   1589       raise TypeError("Keyword arguments {} unknown. Expected {}.".format(
   1590           list(kwargs.keys()), list(self._arg_keywords)))
-> 1591     return self._call_flat(args, self.captured_inputs, cancellation_manager)
   1592
   1593   def _filtered_call(self, args, kwargs):

~\Anaconda3\lib\site-packages\tensorflow_core\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1690     # No tape is watching; skip to running the function.
   1691     return self._build_call_outputs(self._inference_function.call(
-> 1692         ctx, args, cancellation_manager=cancellation_manager))
   1693     forward_backward = self._select_forward_and_backward_functions(
   1694         args,

~\Anaconda3\lib\site-packages\tensorflow_core\python\eager\function.py in call(self, ctx, args, cancellation_manager)
    543             inputs=args,
    544             attrs=("executor_type", executor_type, "config_proto", config),
--> 545             ctx=ctx)
    546       else:
    547         outputs = execute.execute_with_cancellation(

~\Anaconda3\lib\site-packages\tensorflow_core\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     65   else:
     66     message = e.message
---> 67   six.raise_from(core._status_to_exception(e.code, message), None)
     68 except TypeError as e:
     69   keras_symbolic_tensors = [

~\Anaconda3\lib\site-packages\six.py in raise_from(value, from_value)

UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul_1 (defined at :11) ]]
     [[Postprocessor/BatchMultiClassNonMaxSuppression/map/TensorArrayUnstack_3/range/_52]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul_1 (defined at :11) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_pruned_16578]

Function call stack:
pruned -> pruned
divyanshusharma1709 commented 4 years ago

Hi! Did you solve the issue yet? If not, please check whether your GPU is available using tf.test.is_gpu_available() (a minimal check is sketched after this list). If it returns False, then:

  1. Check whether some other process is using your GPU.
  2. Check your BIOS or settings to see whether you have somehow switched to integrated graphics. Let us know if you solve the issue.
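For reference, a minimal sketch of that availability check (tf.test.is_gpu_available() is the call named above; on recent TF releases tf.config.experimental.list_physical_devices gives a bit more detail):

import tensorflow as tf

# True only if TensorFlow can actually initialize a CUDA-capable GPU.
print(tf.test.is_gpu_available())

# Lists the GPUs TensorFlow can see; an empty list means it only sees the CPU.
print(tf.config.experimental.list_physical_devices('GPU'))
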
BenoCharlo commented 4 years ago

Hi! Did you solve the issue yet? If not, please check whether your GPU is available using tf.test.is_gpu_available(). If it returns False, then:

  1. Check whether some other process is using your GPU.
  2. Check your BIOS or settings to see whether you have somehow switched to integrated graphics. Let us know if you solve the issue.

Hello @divyanshusharma1709, I'm facing the same issue. I'm working on AWS EC2; I have checked whether the GPU is available, and it is.

What might be the problem?
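One thing that is sometimes worth trying when the GPU is detected but cuDNN still fails to initialize is letting TensorFlow allocate GPU memory on demand. This is only a hedged sketch (tf.config.experimental API), not a confirmed fix for this setup:

import tensorflow as tf

# Must run before anything touches the GPU. Growing memory on demand instead
# of pre-allocating the whole GPU can avoid cuDNN initialization failures.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)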

pryadchenko commented 4 years ago

I have the same issue using the tensorflow/tensorflow:1.15.2-gpu-py3-jupyter Docker image.

TheEternalToast commented 4 years ago

I have the same issue using the tensorflow/tensorflow:1.15.2-gpu-py3-jupyter Docker image.

Not sure if this should be a new issue, but I found a possible reason for this: https://www.tensorflow.org/install/gpu?hl=nb reports that TensorFlow 1.15 requires CUDA 10.1. However, I've looked through the Dockerfile for tensorflow/tensorflow:1.15.2-gpu-py3-jupyter at https://hub.docker.com/layers/tensorflow/tensorflow/1.15.2-gpu-py3-jupyter/images/sha256-2c2ddc9780724ee528757f44beb16dac302a09ee7eb4e333b7dd85404597fdd9?context=explore and it seems to install CUDA 10.0, not CUDA 10.1.

If I'm mistaken, can someone please explain how? I have the same issue and have yet to test whether uninstalling CUDA 10.0 and installing 10.1 fixes my image. If I'm not mistaken, it would be great if someone with the authority to do so replaced 10.0 with 10.1 in the Dockerfile.
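For reference, a rough sketch of asking the TensorFlow build inside the image what it was compiled with and what it can see (TF 1.15-era APIs); enumerating devices also makes TF log the CUDA/cuDNN libraries it loads, so a mismatch with the installed toolkit tends to show up there:

import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)
# True if this wheel was compiled against CUDA at all.
print(tf.test.is_built_with_cuda())
# Forces TensorFlow to load (and log) the CUDA/cuDNN libraries it depends on.
print(device_lib.list_local_devices())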