nicknochnack / TFODCourse

Verification Script cannot find cusolver64_11.dll or cusparse64.dll (CUDA files) even though they are present #8

Closed wiseOsprey1 closed 3 years ago

wiseOsprey1 commented 3 years ago

The verification script cannot find the following files, even though they are intact and sit in the same CUDA bin directory as other DLLs that load successfully.

2021-06-02 14:16:50.299737: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusolver64_11.dll'; dlerror: cusolver64_11.dll not found
2021-06-02 14:17:15.380081: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found

I reinstalled CUDA v11.3 hoping that would fix the problem, but it did not; the same error still occurs.

Other steps work fine, such as:

2021-06-02 14:15:46.449806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: NVIDIA GeForce MX250 computeCapability: 6.1
coreClock: 1.582GHz coreCount: 3 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 44.76GiB/s
2021-06-02 14:15:46.471334: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll

But the verification fails, as indicated by:

2021-06-02 14:17:47.953824: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at cwise_ops_common.h:128 : Resource exhausted: OOM when allocating tensor with shape[1,1,1152,320] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
INFO:tensorflow:time(__main__.ModelBuilderTF2Test.test_create_ssd_models_from_config): 4.73s
I0602 14:17:48.105210 21272 test_util.py:2102] time(__main__.ModelBuilderTF2Test.test_create_ssd_models_from_config): 4.73s
[ FAILED ] ModelBuilderTF2Test.test_create_ssd_models_from_config

Ran 24 tests in 125.189s

FAILED (errors=1, skipped=1)

I assume the failure occurs because the DLL files cannot be found.
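One way to test that assumption is to check which directories on `PATH` actually contain the DLLs named in the log. The sketch below is only a rough diagnostic: the helper name `find_on_path` is hypothetical, and it inspects `PATH` alone, whereas the Windows loader also consults other locations, so an empty result is a hint rather than proof.

```python
import os


def find_on_path(filename, search_path=None):
    """Return every directory in search_path that contains filename.

    search_path is an os.pathsep-separated string; it defaults to the
    PATH environment variable of the current process.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries produced by stray separators in PATH.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # DLL names taken from the log messages in this issue.
    for dll in ("cudart64_110.dll", "cusolver64_11.dll", "cusparse64_11.dll"):
        print(dll, "->", find_on_path(dll) or "not found on PATH")
```

If a DLL that loads successfully and one that fails both resolve to the same directory, a missing `PATH` entry is probably not the cause.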

wiseOsprey1 commented 3 years ago

Is this an incompatibility between the latest version of TF2 and CUDA v11.3? What version of TensorFlow was used in the YouTube implementation? Can you force pip to install that version?

Here is the output of pip list:

(tfod) C:\Users\jonsc\OneDrive\Documents\PYTHON\MachineLearning\TFODCourse-main\tfod>pip list
WARNING: Ignoring invalid distribution -ix (c:\users\jonsc\onedrive\documents\python\machinelearning\tfodcourse-main\tfod\lib\site-packages)

Package                 Version             Location
----------------------- ------------------- --------
absl-py                 0.12.0
astunparse              1.6.3
backcall                0.2.0
cachetools              4.2.2
certifi                 2021.5.30
chardet                 4.0.0
colorama                0.4.4
cycler                  0.10.0
decorator               5.0.9
flatbuffers             1.12
gast                    0.4.0
google-auth             1.30.1
google-auth-oauthlib    0.4.4
google-pasta            0.2.0
grpcio                  1.34.1
h5py                    3.1.0
idna                    2.10
ipykernel               5.5.5
ipython                 7.24.0
ipython-genutils        0.2.0
jedi                    0.18.0
jupyter-client          6.1.12
jupyter-core            4.7.1
keras-nightly           2.5.0.dev2021032900
Keras-Preprocessing     1.1.2
kiwisolver              1.3.1
lvis                    0.5.3
lxml                    4.6.3
Markdown                3.3.4
matplotlib              3.4.2
matplotlib-inline       0.1.2
numpy                   1.19.5
oauthlib                3.1.1
object-detection        0.1
opencv-python           4.5.2.52
opt-einsum              3.3.0
pandas                  1.2.4
parso                   0.8.2
pickleshare             0.7.5
Pillow                  8.2.0
pip                     21.1.2
prompt-toolkit          3.0.18
protobuf                3.17.1
pyasn1                  0.4.8
pyasn1-modules          0.2.8
pycocotools             2.0.2
Pygments                2.9.0
pyparsing               2.4.7
PyQt5                   5.15.4
PyQt5-Qt5               5.15.2
PyQt5-sip               12.9.0
python-dateutil         2.8.1
pywin32                 301
PyYAML                  5.4.1
pyzmq                   22.1.0
requests                2.25.1
requests-oauthlib       1.3.0
rsa                     4.7.2
scipy                   1.6.3
setuptools              49.2.1
six                     1.15.0
slim                    0.1                 c:\users\jonsc\onedrive\documents\python\machinelearning\tfodcourse-main\tensorflow\models\research\slim
tensorboard             2.5.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit  1.8.0
tensorflow              2.5.0
tensorflow-estimator    2.5.0
termcolor               1.1.0
tf-models-official      2.5.0
tf-slim                 1.1.0
tornado                 6.1
traitlets               5.0.5
typing-extensions       3.7.4.3
urllib3                 1.26.5
wcwidth                 0.2.5
Werkzeug                2.0.1
wget                    3.2
wheel                   0.36.2
wrapt                   1.12.1
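For anyone comparing environments, the versions relevant to this question can also be read from inside the active virtual environment instead of scanning the full package list. This is a minimal sketch using the standard-library `importlib.metadata` (Python 3.8+); the helper name `installed_version` is hypothetical, and the package names are the ones that appear in the list above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names copied from the pip list output in this issue.
    for name in ("tensorflow", "tf-models-official", "object-detection"):
        print(name, installed_version(name))
```

A result of `None` means the package is not visible to the interpreter running the script, which usually indicates that the wrong environment is active.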

wiseOsprey1 commented 3 years ago

Related issue discussed here: https://github.com/tensorflow/tensorflow/issues/44381

Fix: I applied the following based on that discussion:

cd "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin"
mklink cusolver64_10.dll cusolver64_11.dll
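For reference, `mklink LINK TARGET` creates a new entry named `LINK` that points at the existing `TARGET`, so the command above adds `cusolver64_10.dll` as an alias for `cusolver64_11.dll`. A rough Python equivalent is sketched below; the helper name `add_dll_alias` is hypothetical, and on Windows creating symbolic links normally requires an elevated prompt or Developer Mode, just as `mklink` does.

```python
import os


def add_dll_alias(bin_dir, existing_name, alias_name):
    """Create alias_name in bin_dir as a symlink to existing_name.

    Returns True if the link was created and False if something with the
    alias name already exists. Raises FileNotFoundError if the file that
    the alias should point to is missing.
    """
    target = os.path.join(bin_dir, existing_name)
    alias = os.path.join(bin_dir, alias_name)
    if not os.path.isfile(target):
        raise FileNotFoundError(target)
    if os.path.lexists(alias):
        # Never overwrite a real DLL or an existing link.
        return False
    # A relative target is resolved against the directory holding the link.
    os.symlink(existing_name, alias)
    return True
```

Calling `add_dll_alias(bin_dir, "cusolver64_11.dll", "cusolver64_10.dll")` with `bin_dir` set to the CUDA bin directory mirrors the command above. Refusing to overwrite an existing file is a deliberate choice, so a genuine DLL is never replaced by a link.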

Results:

2021-06-02 17:38:41.785211: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2021-06-02 17:38:41.844450: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll

INFO:tensorflow:time(__main__.ModelBuilderTF2Test.test_create_ssd_models_from_config): 22.16s
I0602 17:39:13.074880 1288 test_util.py:2102] time(__main__.ModelBuilderTF2Test.test_create_ssd_models_from_config): 22.16s
[ OK ] ModelBuilderTF2Test.test_create_ssd_models_from_config

Ran 24 tests in 32.413s

OK (skipped=1)