tensorflow / models

Models and examples built with TensorFlow

Error when executing with gpu tutorial object detection. #9410

Open alondra-76 opened 4 years ago

alondra-76 commented 4 years ago

GPU: GeForce GTX 1650, NVIDIA driver: 450, CUDA: 10.1, cuDNN: 7.6, TensorFlow: 2.2.0, Python: 3.8.3, OS: Ubuntu 18.04

When I run the object detection tutorial, the GPU is detected correctly and the model loads. The problem appears when running inference.

in
      1 for image_path in TEST_IMAGE_PATHS:
----> 2 show_inference(detection_model, image_path)

in show_inference(model, image_path)
      4 image_np = np.array(Image.open(image_path))
      5 # Actual detection.
----> 6 output_dict = run_inference_for_single_image(model, image_np)
      7 # Visualization of the results of a detection.
      8 vis_util.visualize_boxes_and_labels_on_image_array(

in run_inference_for_single_image(model, image)
      8 # Run inference
      9 model_fn = model.signatures['serving_default']
---> 10 output_dict = model_fn(input_tensor)
     11
     12 # All outputs are batches tensors.

~/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py in __call__(self, *args, **kwargs)
   1603 TypeError: For invalid positional/keyword argument combinations.
   1604 """
-> 1605 return self._call_impl(args, kwargs)
   1606
   1607 def _call_impl(self, args, kwargs, cancellation_manager=None):

~/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py in _call_impl(self, args, kwargs, cancellation_manager)
   1643 raise TypeError("Keyword arguments {} unknown. Expected {}.".format(
   1644 list(kwargs.keys()), list(self._arg_keywords)))
-> 1645 return self._call_flat(args, self.captured_inputs, cancellation_manager)
   1646
   1647 def _filtered_call(self, args, kwargs):

~/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1743 and executing_eagerly):
   1744 # No tape is watching; skip to running the function.
-> 1745 return self._build_call_outputs(self._inference_function.call(
   1746 ctx, args, cancellation_manager=cancellation_manager))
   1747 forward_backward = self._select_forward_and_backward_functions(

~/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py in call(self, ctx, args, cancellation_manager)
    591 with _InterpolateFunctionError(self):
    592 if cancellation_manager is None:
--> 593 outputs = execute.execute(
    594 str(self.signature.name),
    595 num_outputs=self._num_outputs,

~/.local/lib/python3.8/site-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     57 try:
     58 ctx.ensure_initialized()
---> 59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     60 inputs, attrs, num_outputs)
     61 except core._NotOkStatusException as e:

UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul_1 (defined at :11) ]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul_1 (defined at :11) ]]
     [[Postprocessor/BatchMultiClassNonMaxSuppression/map/while/Identity/_43]]
0 successful operations.
0 derived errors ignored. [Op:__inference_pruned_17182]

Function call stack:
pruned -> pruned

-------------------------------------------------------------------------------------------------------------------------------------------------------------------

2020-10-25 10:58:27.887776: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-10-25 10:58:27.932095: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-25 10:58:27.932441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:06:00.0 name: GeForce GTX 1650 computeCapability: 7.5
coreClock: 1.695GHz coreCount: 14 deviceMemorySize: 3.82GiB deviceMemoryBandwidth: 119.24GiB/s
2020-10-25 10:58:27.934866: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-10-25 10:58:27.978859: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-10-25 10:58:28.004490: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-10-25 10:58:28.011528: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-10-25 10:58:28.057270: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-10-25 10:58:28.064023: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-10-25 10:58:28.145338: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-25 10:58:28.145777: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-25 10:58:28.146731: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-25 10:58:28.147446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-10-25 10:58:28.148134: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-10-25 10:58:28.160140: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3393235000 Hz
2020-10-25 10:58:28.160818: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fba14000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-25 10:58:28.160864: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-10-25 10:58:28.256111: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-25 10:58:28.256611: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55e5fa6c3a10 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-10-25 10:58:28.256626: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1650, Compute Capability 7.5
2020-10-25 10:58:28.257194: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-25 10:58:28.257478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:06:00.0 name: GeForce GTX 1650 computeCapability: 7.5
coreClock: 1.695GHz coreCount: 14 deviceMemorySize: 3.82GiB deviceMemoryBandwidth: 119.24GiB/s
2020-10-25 10:58:28.257520: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-10-25 10:58:28.257534: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-10-25 10:58:28.257545: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-10-25 10:58:28.257557: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-10-25 10:58:28.257567: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-10-25 10:58:28.257578: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-10-25 10:58:28.257589: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-25 10:58:28.257655: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-25 10:58:28.257978: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-25 10:58:28.258241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-10-25 10:58:28.258853: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-10-25 10:58:28.260062: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-25 10:58:28.260074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-10-25 10:58:28.260080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-10-25 10:58:28.260730: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-25 10:58:28.261065: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-25 10:58:28.261355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3357 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650, pci bus id: 0000:06:00.0, compute capability: 7.5)
2020-10-25 10:59:03.502628: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-25 10:59:04.206827: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-10-25 10:59:04.215675: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

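
The error text advises looking for a warning message earlier in the log. Since most of the log is informational, a small filter can make the relevant lines easier to spot. The following is only a sketch: `filter_tf_log` is a hypothetical helper (not part of TensorFlow), and it assumes each log entry starts on its own line with a timestamp followed by a severity letter (I, W, E or F), as the entries above do.

```python
import re

# Matches entries such as:
# 2020-10-25 10:59:04.206827: E tensorflow/.../cuda_dnn.cc:329] message
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<level>[IWEF]) (?P<source>\S+)\] (?P<msg>.*)$"
)


def filter_tf_log(text, levels=("W", "E", "F")):
    """Return (timestamp, level, source, message) tuples for the chosen severities."""
    hits = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            hits.append(
                (match.group("ts"), match.group("level"),
                 match.group("source"), match.group("msg"))
            )
    return hits


# Two entries copied from the log above: one informational, one error.
sample = """\
2020-10-25 10:59:03.502628: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-25 10:59:04.206827: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
"""

for ts, level, source, msg in filter_tf_log(sample):
    print(level, source, msg)
```

Applied to the full log, this leaves only the two `Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR` entries, which are logged just before the traceback is raised.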
saikumarchalla commented 4 years ago

@alondra-76 Could you please fill in the issue template? Please also provide simple standalone code or a Colab link so we can reproduce the issue on our end. What is the top-level directory of the model you are using? Thanks!
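
A standalone reproduction of the kind requested here could be as small as a single convolution, since the failure is reported when cuDNN is first initialized. The sketch below is an assumption-laden example, not code from this thread: `run_conv_check` is a hypothetical helper, and the optional memory-growth setting is a commonly suggested thing to try for `CUDNN_STATUS_INTERNAL_ERROR` that nobody in this thread has confirmed as the cause or the fix.

```python
def run_conv_check(allow_growth=True):
    """Run one small convolution and report what happened.

    Hypothetical helper for illustration. Returns a dict with the keys
    'tensorflow', 'gpus', 'memory_growth' and 'conv_ok' (plus 'error'
    if the convolution raised a TensorFlow op error).
    """
    report = {"tensorflow": None, "gpus": [], "memory_growth": False, "conv_ok": False}
    try:
        import tensorflow as tf
    except ImportError:
        return report  # TensorFlow is not installed in this environment.

    report["tensorflow"] = tf.__version__
    gpus = tf.config.list_physical_devices("GPU")
    report["gpus"] = [gpu.name for gpu in gpus]

    if allow_growth:
        # Assumption: letting TensorFlow allocate GPU memory on demand may
        # help on small cards; this must be set before the GPU is first used.
        try:
            for gpu in gpus:
                tf.config.experimental.set_memory_growth(gpu, True)
            report["memory_growth"] = bool(gpus)
        except RuntimeError:
            pass  # The GPU was already initialized; leave the setting as is.

    try:
        images = tf.random.normal([1, 64, 64, 3])   # one 64x64 RGB image
        kernel = tf.random.normal([3, 3, 3, 8])     # 3x3 kernel, 8 output channels
        out = tf.nn.conv2d(images, kernel, strides=1, padding="SAME")
        report["conv_ok"] = tuple(out.shape) == (1, 64, 64, 8)
    except tf.errors.OpError as err:
        report["error"] = str(err)
    return report


if __name__ == "__main__":
    print(run_conv_check())
```

Running it once with `allow_growth=True` and once (in a fresh process) with `allow_growth=False` would show whether the setting makes any difference on the affected card.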

saikumarchalla commented 4 years ago

@alondra-76 Please have a look at this issue. Hope it helps.

alondra-76 commented 4 years ago

I am running the TensorFlow model zoo sample notebook: https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/object_detection_tutorial.ipynb. My working directory is models/research.

saikumarchalla commented 4 years ago

@alondra-76 I tried to reproduce the issue but didn't get any error with the MobileNet model. Please find the gist here. Thanks!

alondra-76 commented 4 years ago

I have a 1050 Ti and a 1650. With the 1050 Ti I have no problem.