openvinotoolkit / openvino_tensorflow

OpenVINO™ integration with TensorFlow

Object Detection Example SegFault on 12th Gen Core Processor with GPU Backend #342

Open alexlamfromhome opened 2 years ago

alexlamfromhome commented 2 years ago

I was able to run the object detection example to completion on a 12th Gen Core processor using the CPU backend, but when I switch to the GPU backend, the example crashes with a segmentation fault.

I have verified that OpenVINO 2022.1 is correctly installed by running the OpenVINO benchmark_app against the GPU device.

2022-07-21 15:47:12.926336: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-07-21 15:47:12.929013: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/openvino_2022/extras/opencv/lib:/opt/intel/openvino_2022/tools/compile_tool:/opt/intel/openvino_2022/runtime/3rdparty/tbb/lib::/opt/intel/openvino_2022/runtime/3rdparty/hddl/lib:/opt/intel/openvino_2022/runtime/lib/intel64
2022-07-21 15:47:12.929026: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-07-21 15:47:14.114683: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/opt/intel/openvino_2022.1.0.643/extras/opencv/python/cv2/../../bin:/opt/intel/openvino_2022/extras/opencv/lib:/opt/intel/openvino_2022/tools/compile_tool:/opt/intel/openvino_2022/runtime/3rdparty/tbb/lib::/opt/intel/openvino_2022/runtime/3rdparty/hddl/lib:/opt/intel/openvino_2022/runtime/lib/intel64
2022-07-21 15:47:14.114703: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-07-21 15:47:14.114715: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (my-adl-s): /proc/driver/nvidia/version does not exist
2022-07-21 15:47:14.114839: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Available Backends:
CPU
GPU
2022-07-21 15:47:20.447067: OVTF Summary -> 59 out of 1326 nodes in the graph (4%) are now running with OpenVINO™ backend
2022-07-21 15:47:21.487357: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
Inference time in ms: 205.22
person 0.98
tie 0.81
Output image is saved in detections.jpg
2022-07-21 15:47:23.855600: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-07-21 15:47:23.858267: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/openvino_2022/extras/opencv/lib:/opt/intel/openvino_2022/tools/compile_tool:/opt/intel/openvino_2022/runtime/3rdparty/tbb/lib::/opt/intel/openvino_2022/runtime/3rdparty/hddl/lib:/opt/intel/openvino_2022/runtime/lib/intel64
2022-07-21 15:47:23.858281: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-07-21 15:47:25.036728: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/opt/intel/openvino_2022.1.0.643/extras/opencv/python/cv2/../../bin:/opt/intel/openvino_2022/extras/opencv/lib:/opt/intel/openvino_2022/tools/compile_tool:/opt/intel/openvino_2022/runtime/3rdparty/tbb/lib::/opt/intel/openvino_2022/runtime/3rdparty/hddl/lib:/opt/intel/openvino_2022/runtime/lib/intel64
2022-07-21 15:47:25.036750: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-07-21 15:47:25.036761: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (my-adl-s): /proc/driver/nvidia/version does not exist
2022-07-21 15:47:25.036889: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Available Backends:
CPU
GPU
2022-07-21 15:47:31.407075: OVTF Summary -> 59 out of 1326 nodes in the graph (4%) are now running with OpenVINO™ backend
Segmentation fault (core dumped)
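To repeat the CPU-versus-GPU comparison and capture a little more information at the crash, a minimal harness along these lines could be used. It is a sketch under assumptions: it is run from the repository root, and `examples/classification_sample.py` accepts the `--no_show` and `--backend` flags used in the command shown in this thread (swap in the object detection script as needed). It launches the sample once per backend with Python's `-X faulthandler` option, so a SIGSEGV should also dump the Python traceback of the crashing thread, and it reports whether each run exited normally or was killed by a signal. `SCRIPT`, `describe_exit`, and `run_backend` are illustrative names only.

```python
import signal
import subprocess
import sys

# Hypothetical path: the sample script from the command in this thread, relative to the repo root.
SCRIPT = "examples/classification_sample.py"


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a readable outcome."""
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = str(-returncode)
        return f"killed by signal {name}"
    return f"exited with status {returncode}"


def run_backend(backend: str) -> str:
    # -X faulthandler makes Python dump the faulting thread's traceback on SIGSEGV.
    cmd = [sys.executable, "-X", "faulthandler", SCRIPT, "--no_show", "--backend", backend]
    result = subprocess.run(cmd)
    return describe_exit(result.returncode)


if __name__ == "__main__":
    for backend in ("CPU", "GPU"):
        print(backend, "->", run_backend(backend))
```

If the GPU run reports `killed by signal SIGSEGV`, the faulthandler output printed just before it shows which Python call was executing when the native crash happened.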
alexlamfromhome commented 2 years ago

I have upgraded to OpenVINO 2022.2, but the problem persists. The classification sample crashes the same way with the GPU backend:

(ovtf) alexlam@vct-adl-p:~/projects/openvino_tensorflow$ python3 examples/classification_sample.py --no_show --backend GPU
2022-11-23 15:57:37.335409: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-11-23 15:57:37.338252: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/openvino_2022/tools/compile_tool:/opt/intel/openvino_2022/runtime/3rdparty/tbb/lib:/opt/intel/openvino_2022/runtime/3rdparty/hddl/lib:/opt/intel/openvino_2022/runtime/lib/intel64
2022-11-23 15:57:37.338266: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-11-23 15:57:38.395115: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/alexlam/projects/openvino_tensorflow/venv/lib/python3.8/site-packages/cv2/../../lib64:/opt/intel/openvino_2022/tools/compile_tool:/opt/intel/openvino_2022/runtime/3rdparty/tbb/lib:/opt/intel/openvino_2022/runtime/3rdparty/hddl/lib:/opt/intel/openvino_2022/runtime/lib/intel64
2022-11-23 15:57:38.395136: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-11-23 15:57:38.395147: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (vct-adl-p): /proc/driver/nvidia/version does not exist
2022-11-23 15:57:38.395293: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Available Backends:
CPU
GPU
VAD-M
2022-11-23 15:57:39.733766: OVTF Summary -> 138 out of 899 nodes in the graph (15%) are now running with OpenVINO™ backend
Segmentation fault (core dumped)

When I run the same test on an 11th Gen processor, however, everything works:

(ovtf) alexlam@vct-tgl2:~/projects/openvino_tensorflow$ python3 examples/classification_sample.py --no_show --backend GPU
2022-11-23 16:04:12.426052: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-11-23 16:04:12.428993: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-11-23 16:04:12.429008: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-11-23 16:04:13.590567: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/alexlam/projects/openvino_tensorflow/venv/lib/python3.8/site-packages/cv2/../../lib64:
2022-11-23 16:04:13.590591: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-11-23 16:04:13.590605: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (alexlam-vss-tgl2): /proc/driver/nvidia/version does not exist
2022-11-23 16:04:13.590963: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Available Backends:
CPU
GPU
2022-11-23 16:04:15.407457: OVTF Summary -> 138 out of 899 nodes in the graph (15%) are now running with OpenVINO™ backend
Inference time in ms: 20.30
military uniform 0.79601717
mortarboard 0.020910148
academic gown 0.014557097
suit 0.009166191
comic book 0.007978324

The segfault left the following entry in dmesg:

[2005709.437153] python3[1646147]: segfault at 2f2a30303b ip 00007f52351a7d8b sp 00007f50ab7fb800 error 4 in libc-2.31.so[7f5235132000+178000]
[2005709.437173] Code: 0f b7 02 83 c0 01 66 89 02 0f b7 c0 48 3b 05 6c 45 15 00 0f 83 86 00 00 00 48 8b 56 10 48 85 d2 74 7d 64 8b 04 25 18 00 00 00 <4c> 8b 52 10 85 c0 74 9d eb 4a 0f 1f 00 48 8b 50 10 49 89 c0 4c 89
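The dmesg entry can be unpacked by hand: the instruction pointer minus the libc load base gives the offset of the faulting instruction inside libc-2.31.so, and `error 4` is the x86 page-fault error code, which the kernel prints in hex. A minimal Python sketch that does this, assuming the x86-64 Linux message format shown above; `decode_segfault`, `PF_BITS`, and the returned field names are illustrative only.

```python
import re

# x86 page-fault error-code bits as printed by the Linux kernel (in hex).
# Each entry maps a bit position to (meaning if set, meaning if clear).
PF_BITS = {
    0: ("protection violation", "page not present"),
    1: ("write", "read"),
    2: ("user mode", "kernel mode"),
    4: ("instruction fetch", "data access"),
}

# Matches "segfault at ADDR ip IP sp SP error ERR in LIB[BASE+SIZE]".
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str) -> dict:
    """Extract the faulting library offset and decode the page-fault error code."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    err = int(m["err"], 16)
    offset = ip - base  # position of the faulting instruction inside the library
    return {
        "library": m["lib"],
        "fault_addr": int(m["addr"], 16),
        "offset": offset,
        "in_mapping": 0 <= offset < size,
        "access": [on if err & (1 << bit) else off for bit, (on, off) in PF_BITS.items()],
    }


if __name__ == "__main__":
    entry = ("python3[1646147]: segfault at 2f2a30303b ip 00007f52351a7d8b "
             "sp 00007f50ab7fb800 error 4 in libc-2.31.so[7f5235132000+178000]")
    info = decode_segfault(entry)
    # prints: libc-2.31.so 0x75d8b page not present, read, user mode, data access
    print(info["library"], hex(info["offset"]), ", ".join(info["access"]))
```

For the entry above this gives offset 0x75d8b into libc-2.31.so and a user-mode read of a non-present page, meaning code inside libc read through a bad pointer (fault address 0x2f2a30303b) rather than jumping to an invalid address. With matching libc debug symbols installed, that offset can be resolved to a function name with a tool such as `addr2line` or `gdb`.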