late347 commented 3 years ago

Prerequisites

Please answer the following question for yourself before submitting an issue.

[x] I checked to make sure that this issue has not been filed already.

1. The entire URL of the documentation with the issue

https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/install.html#install-the-object-detection-api

2. Describe the issue

From within TensorFlow/models/research/:

    cp object_detection/packages/tf2/setup.py .
    python -m pip install .

I followed the install guide exactly. Initially I had pip-installed tensorflow 2.2.0 together with the matching cudnn 7.6.5 and cudatoolkit 10.1 from NVIDIA, and the simple tests I ran indicated that TensorFlow was working with both. But after the forced update to tensorflow 2.4.1, TensorFlow can no longer find the CUDA and cuDNN libraries. The problem appeared directly with the update to tensorflow 2.4.1. I need a stable version of the TensorFlow Object Detection API and would like to use transfer learning on some of the official models from the model zoo. It's possible I did something wrong, but I just followed the official instructions, and I'm getting really frustrated.

The attached text file shows the terminal output of "python -m pip install .", which I ran from the Anaconda env prompt.

(tf_new) C:\Users\lauri\Documents\tf_new\Tensorflow\models\research>python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
2021-04-01 00:43:57.993189: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-04-01 00:43:57.993325: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-04-01 00:43:59.972466: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-01 00:44:00.004275: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-04-01 00:44:00.213029: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 SUPER computeCapability: 7.5
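The warning in the log above names the exact library TensorFlow failed to load. As a rough sketch (not part of the original report), the missing library names can be pulled out of a saved log with a regular expression. The function name `missing_libraries` is made up for illustration, and the pattern only assumes the `Could not load dynamic library '...'` wording seen in this log.

```python
import re

# Matches the dso_loader warning wording seen in the log above (assumed stable).
MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return library names reported as not loadable, in order, without duplicates."""
    seen = []
    for name in MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


# One warning line copied from the log in this issue.
sample = (
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:60] "
    "Could not load dynamic library 'cudart64_110.dll'; "
    "dlerror: cudart64_110.dll not found"
)
print(missing_libraries(sample))
```

Here the only missing library is `cudart64_110.dll`, which suggests this TensorFlow build is looking for a CUDA 11.0 runtime rather than the 10.1 one that was installed.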

tensorflow_badinstall_terminal_output.txt

luojueling commented 3 years ago

Did you figure it out?

tocaloshi commented 3 years ago

We are also having the same problem. Any ideas?

late347 commented 3 years ago

I will try more things tomorrow. I can't promise it will work, but I'm cautiously optimistic. I will try more model training and hopefully get GPU-based training working tomorrow.

I just let tensorflow update to 2.4.1. I had another, unrelated problem with a wrongly installed pycocotools 2.0, so I pip-uninstalled it and pip-installed the newest version of pycocotools. I'm now running tensorflow 2.4.1. I installed cudatoolkit 11.0 and cudnn 8.0.4 from NVIDIA; those have to be the exact versions for the new tensorflow. Simple tests like the "import tensorflow as tf..." one-liner seem to indicate that CUDA is being found. But I will have to try GPU-based training from a checkpoint tomorrow to see whether my model can still train properly with GPU support. Earlier I was able to train with CPU only in this same environment. The GPU is an RTX 2070 SUPER and the OS is Windows.
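A small sketch of how the version pairings mentioned in this thread could be checked on a Windows machine. The table below only records what was reported here (it is not an official compatibility matrix), the helper names are hypothetical, and the DLL naming rule is an assumption that happens to match `cudart64_110.dll` from the log for CUDA 11.0.

```python
import os

# Pairings reported in this thread, not an official compatibility table.
REPORTED_PAIRINGS = {
    "2.2.0": {"cuda": "10.1", "cudnn": "7.6.5"},
    "2.4.1": {"cuda": "11.0", "cudnn": "8.0.4"},
}


def cudart_dll_name(cuda_version):
    """Build the assumed Windows CUDA runtime DLL name, e.g. '11.0' -> 'cudart64_110.dll'."""
    major, minor = cuda_version.split(".")[:2]
    return f"cudart64_{major}{minor}.dll"


def find_on_path(filename, path=None):
    """Return the first PATH directory that contains filename, or None if there is none."""
    path = os.environ.get("PATH", "") if path is None else path
    for directory in path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None


for tf_version, libs in REPORTED_PAIRINGS.items():
    dll = cudart_dll_name(libs["cuda"])
    # Prints None in the last column when the DLL is not on PATH.
    print(tf_version, dll, find_on_path(dll))
```

If the DLL for the installed TensorFlow version is not found on PATH, that would be consistent with the "Could not load dynamic library" warning from the original report.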

late347 commented 3 years ago

Well, with tensorflow 2.4.1 I had cudatoolkit 11 and cudnn 8.0.4, and the same training file that worked with CPU training didn't work with GPU training. TensorFlow did attempt to start the training and found a bunch of the CUDA libraries, but the job exited just before it was supposed to start training, with no other indication of an error. The last thing I will try is updating the graphics card drivers. Other than that, I'm at a loss setting up the env.

late347 commented 3 years ago

Actually, I managed to start re-training with the GPU. I THINK it works because I moved my old checkpoints out of the model folder and started again from the pre-trained model's ckpt-0; the old checkpoints seemed to be the issue. Picture of terminal.
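For anyone who wants to script the "move the old checkpoints away" step, here is a minimal sketch. It assumes the usual TensorFlow checkpoint file naming (a `checkpoint` index file plus `ckpt-N.*` files); the function name and the folder names are hypothetical, and the demo runs in a temporary directory so it touches nothing real.

```python
import shutil
import tempfile
from pathlib import Path


def set_aside_checkpoints(model_dir, backup_dir):
    """Move checkpoint files out of model_dir into backup_dir; return the moved file names."""
    model_dir, backup_dir = Path(model_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # Assumed naming: a 'checkpoint' index file plus 'ckpt-N.index' / 'ckpt-N.data-*' files.
    for path in sorted(model_dir.iterdir()):
        if path.is_file() and (path.name == "checkpoint" or path.name.startswith("ckpt-")):
            shutil.move(str(path), str(backup_dir / path.name))
            moved.append(path.name)
    return moved


# Demo in a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp) / "my_model"
    model.mkdir()
    for name in ("checkpoint", "ckpt-3.index", "pipeline.config"):
        (model / name).write_text("")
    demo_moved = set_aside_checkpoints(model, Path(tmp) / "old_checkpoints")
    demo_left = sorted(p.name for p in model.iterdir())

print(demo_moved, demo_left)
```

Moving the files instead of deleting them keeps the old training state around in case it is needed later.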

It seems I also had a wrongly installed pycocotools. I pip-uninstalled it, put the Visual C++ 2015 Build Tools exe into the folder I was in at the Anaconda env prompt, and then re-installed it correctly with the commands below. Note that you have to see the build-successful message from the pycocotools install to know it installed cleanly.

    pip install cython
    pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
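Besides watching for the build-successful message, a quick import check can hint at whether the install is usable. This is only a sketch: `can_import` is a made-up helper, and the idea that importing `pycocotools.mask` exercises the compiled extension is an assumption about how the package is laid out.

```python
import importlib


def can_import(module_name):
    """Return True if the module imports cleanly, False if the import fails."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


# Assumption: pycocotools.mask pulls in the compiled part of the package,
# so a broken build should show up here as False.
for name in ("pycocotools", "pycocotools.mask"):
    print(name, can_import(name))
```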

tensorflow gpu started training

late347 commented 3 years ago

However, downgrading from tensorflow 2.4.1 to tensorflow 2.2.0 was not successful for me. I'm still running tensorflow 2.4.1, as installed by the official install instructions.

azdobylak commented 3 years ago

I had the same problem. The downgrade is caused by the tf-models-official dependency. As a temporary hotfix, you might consider explicitly setting the tf-models-official==4.2.0 version here.
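To see what actually ended up in the environment after running the install (with or without a pin), the installed versions can be printed with the standard library. This is a sketch only: `installed_version` is a hypothetical helper, it needs Python 3.8 or newer, and the package names are the ones mentioned in this thread.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names taken from this thread; None means "not installed here".
for package in ("tensorflow", "tf-models-official", "pycocotools"):
    print(package, installed_version(package))
```

Comparing the printed tensorflow version before and after `python -m pip install .` shows whether a dependency pulled in a different release.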

tensorflow / models

tensorflow object detection API setup.py script installs wrong version of tensorflow 2.4.1 (previously stable tensorflow 2.2.0) #9862
