edwardHujber opened this issue 5 years ago
With Windows 10, TF 1.15 / CUDA 10.0 / cuDNN 7.6.4.38 I also get this ptx warning, eventually followed by a CUDA OOM error in a cross-validation loop (my own code, not model_main.py). This did not occur with TF 1.12.0 / CUDA 9.0 / cuDNN 7.3.1.20.
System information What is the top-level directory of the model you are using: \models\research\object_detection\
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): NO, trying to use object_detection_tutorial.ipynb
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
TensorFlow installed from (source or binary): installed using pip (pip install tensorflow-gpu)
TensorFlow version (use command below): v2.0.0
Bazel version (if compiling from source): N/A
CUDA/cuDNN version: CUDA Version 10.0.130 cuDNN: 7.6.4
GPU model and memory: GeForce GTX 1050 4 GB dedicated, 3.9 GB shared
Exact command to reproduce: run the object_detection_tutorial.ipynb file
Describe the problem
It got stuck at the loop where the image results were meant to be shown, i.e.:
for image_path in TEST_IMAGE_PATHS: show_inference(detection_model, image_path)
It stayed there until the Jupyter notebook displayed a message saying the kernel had died. When I tried to run it in the Anaconda prompt, the following was displayed at the end, after which no images were shown and the process ended.
W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows Relying on driver to perform ptx compilation. This message will be only logged once.
Please look into this matter.
I got the same error; Keras doesn't use the GPU.
I also got the same error. After the error appears in the console, the kernel is dead and must be restarted. Please provide some information on this issue!
The following line is causing the issue:
output_dict = model(input_tensor)
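For anyone who also hits the OOM crash that several later comments describe right after this warning, here is a minimal, hedged sketch (assuming the TF 2.0 tf.config API; not a confirmed fix from this thread) of enabling GPU memory growth before the first call into the model:

```python
import tensorflow as tf

# Ask TensorFlow to grow GPU memory on demand instead of grabbing it all
# up front; this sometimes avoids the OOM that follows the
# "Invoking ptxas not supported on Windows" warning.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

# ...then build the detection model and call output_dict = model(input_tensor) as usual.
```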
Same problem here.
It's weird: I get this error and then model.predict is super slow, but fitting the model is just as fast as normal.
Is this issue due to CUDA 10? I'm having this issue as well.
I could resolve this problem by using tensorflow version 1.9. Works as expected now!
@Keyrainn Tensorflow 1.14 with CUDA 10.0 works for me
Same problem here on Windows 10 with Keras 2.3.1 and TensorFlow 2.0. Could this somehow be related to this issue?
I'm also having the same issue with TensorFlow 2.0 and Windows 10 while trying to run object_detection_tutorial.ipynb, specifically failing on output_dict = model(input_tensor). I'd prefer not to roll back to v1.9 if possible.
Same problem, any solution? :) object_detection_tutorial.ipynb doesn't run.
I had the same ptx hang-up occasionally, in addition to freezing at basic_session_run_hooks.py step = 0. I'm running with TF 1.15 and CUDA 10. I managed to get things up and running again by downgrading my NVIDIA drivers to 431.60.
I fixed it by downgrading TensorFlow. Not the best solution, but it works.
I had the same ptx hang-up occasionally, in addition to freezing at basic_session_run_hooks.py step = 0. I'm running with TF 1.15 and CUDA 10. I managed to get things up and running again by downgrading my NVIDIA drivers to 431.60.
Which cuDNN did you use?
Please solve the issue quickly
I fixed it by downgrading TensorFlow. Not the best solution, but it works.
Downgrading to which TensorFlow version?
TF 1.15
Installed TF 1.5 but getting an error @akoutsoukis:
In [7]:
model_name = 'ssd_mobilenet_v1_coco_2017_11_17'
detection_model = load_model(model_name)
WARNING:tensorflow:From <ipython-input>:11: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.

TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>
      1 model_name = 'ssd_mobilenet_v1_coco_2017_11_17'
----> 2 detection_model = load_model(model_name)

<ipython-input> in load_model(model_name)
      9     model_dir = pathlib.Path(model_dir)/"saved_model"
     10
---> 11     model = tf.saved_model.load(str(model_dir))
     12     model = model.signatures['serving_default']
     13

c:\users\hemant ghuge\anaconda3\envs\tensorflow1bg\lib\site-packages\tensorflow_core\python\util\deprecation.py in new_func(*args, **kwargs)
    322           'in a future version' if date is None else ('after %s' % date),
    323           instructions)
--> 324       return func(*args, **kwargs)
    325     return tf_decorator.make_decorator(
    326         func, new_func, 'deprecated',

TypeError: load() missing 2 required positional arguments: 'tags' and 'export_dir'
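For context, this TypeError appears when the notebook runs on a TF 1.x install: there, tf.saved_model.load is the v1 loader with signature load(sess, tags, export_dir), which is why it complains about the missing 'tags' and 'export_dir'. A minimal sketch of a loader that works under TF 1.15 (the model_dir path is an assumption; point it at wherever the model archive was extracted):

```python
import pathlib
import tensorflow as tf

def load_model(model_name):
    # Under TF 1.x, tf.saved_model.load is the session-based v1 loader, so use
    # the v2-style loader exposed through tf.compat.v2 instead.
    model_dir = pathlib.Path(model_name) / "saved_model"  # assumed local extraction path
    model = tf.compat.v2.saved_model.load(str(model_dir))
    return model.signatures['serving_default']

detection_model = load_model('ssd_mobilenet_v1_coco_2017_11_17')
```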
Getting this same issue. I would really hate to downgrade the driver or TensorFlow, especially since I just upgraded to TensorFlow 2.0 and modified my code accordingly. Any solution?
Hey, just wanted to ask one thing: is this problem happening only for Windows users, or are Linux and Mac users facing it too? Please reply.
@Acejoy For me it was in Linux.
I'm having this same problem with Windows 10; installed tensorflow-gpu with conda: TF 2.0.0 / CUDA 10.0.
Same issue here, hangs on
2020-01-31 18:29:05.919027: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
Then crashes with no error.
Same issue Windows 10, TF 1.13.1/1.14/1.15.2 CUDA 10
Ok it’s clear that many have experienced the problem and for many, many months. Can we know where the solution resides? In a fixed NVidia driver? In tensorflow? Thanks
Hi team, I just had the exact same problem on the configuration listed below, and solved it by reinstalling (copying, to be more accurate) the correct cuDNN file versions.
For some reason I first tried to install the very latest CUDA (10.1), cuDNN (for CUDA 10.1), and TensorFlow (2.1) versions, then fell back to the versions mentioned at the beginning of the post because of many problems, but I forgot to also downgrade cuDNN.
Now everything works fine.
Hope this helps, Dan.
@dmoreyes Hi :) So what are your current versions of TF, CUDA, and cuDNN? I have the same issue as you. GTX 1050 Ti, TF 2.0.0, CUDA 10.2, cuDNN for CUDA 10.2.
Hi, Here are the versions I'm using for my Windows 10 Pro x64 OS
- CUDA:10.0 : https://developer.nvidia.com/cuda-10.0-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exelocal
- cuDNN: version for CUDA 10.0 (v7.6.5.32) : https://developer.nvidia.com/compute/machine-learning/cudnn/secure/7.6.5.32/Production/10.0_20191031/cudnn-10.0-windows10-x64-v7.6.5.32.zip
- TensorFlow: 2.0.0 : Installed using pip install tensorflow-gpu==2.0
- NVidia driver version for GeForce RTX 2080 Ti : 432.00
Dan
Do you still get the error Invoking ptxas not supported on Windows?
Any recommendation or solution for the problem? I am experiencing the same issue. Here is my setup:
Windows 10 CUDA 10.1 TensorFlow 2.0.1 NVidia RTX 2080 Ti
Thanks!
I don't know if this is related, but around the same time this error started appearing (I didn't get the freeze issue though), training on a Titan X (Pascal) became about 10x slower for a simple two-layer network. TensorFlow 1.13.1 worked fine; every TF version after that was slow.
I just updated drivers (to 442.19) and while the ptx error is still there, training has resumed normal speed! This is Windows 10, CUDA 10.0, TensorFlow 1.15.2, Titan X (pascal).
Windows 10 Tensorflow 2.1.0 Cuda 10.1 cuDNN for CUDA 10.1 (v. 7.6.5.32) GeForce RTX 1060
[INFO] training network...
Epoch 1/75
2020-02-15 14:45:46.794388: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-02-15 14:45:47.071668: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-02-15 14:45:47.998708: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. This message will be only logged once.
Then it crashed without any errors. Updated the driver to 442.19. The warning remains, but training started working.
Windows 10
Tensorflow 2.0
Cuda 10.0
Cudnn 7.6.5 for cuda 10.0
GeForce GTX 1050 ti
Driver, latest to this date 442.19
I'm still getting this error after having tried many configurations of tensorflow and cuda versions.
I'm starting to think it might be an error in the data pipeline, as explained here: https://stackoverflow.com/questions/58455765/keras-sees-my-gpu-but-doesnt-use-it-when-training-a-neural-network, but I'm not really sure how to use TFRecords to solve this (see the sketch after the log below). Here's my code: https://github.com/JuanDRC/AlzheimerProj/blob/master/FreezeNone.py
Epoch 1/100
2020-02-17 11:15:18.577785: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-02-17 11:15:19.784597: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-02-17 11:15:21.886392: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-02-17 11:15:22.281623: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.53GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
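Since the comment above asks how TFRecords would fit into such an input pipeline, here is a minimal reading-side sketch (not taken from the linked repo; the feature names 'image' and 'label' and the file train.tfrecord are assumptions, adjust them to however the records were written):

```python
import tensorflow as tf

# Assumed schema: each record holds a JPEG-encoded image and an integer label.
feature_spec = {
    'image': tf.io.FixedLenFeature([], tf.string),
    'label': tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(serialized):
    parsed = tf.io.parse_single_example(serialized, feature_spec)
    image = tf.io.decode_jpeg(parsed['image'], channels=3)
    image = tf.image.resize(image, (224, 224)) / 255.0
    return image, parsed['label']

dataset = (tf.data.TFRecordDataset('train.tfrecord')
           .map(parse_example, num_parallel_calls=tf.data.experimental.AUTOTUNE)
           .batch(32)
           .prefetch(tf.data.experimental.AUTOTUNE))

# model.fit(dataset, epochs=...) can then consume the records directly,
# which keeps the GPU fed instead of waiting on Python-side image loading.
```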
According to https://www.tensorflow.org/install/source#tested_source_configurations cuDNN for CUDA 10.0 should be 7.4.
Windows 10, TensorFlow 2.1.0, CUDA 10.1, cuDNN 7.6.5 for CUDA 10.1, GeForce RTX 2070, driver 442.19.
Any idea how to fix this, please? I've also tried TensorFlow 2.0 / CUDA 10 / cuDNN 7.4 for CUDA 10, and TensorFlow 2.1.0 / CUDA 10.2 / cuDNN 7.6.5 for CUDA 10.2.
2020-02-23 23:32:55.488931: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Loading the Tensorflow model into memory
2020-02-23 23:33:02.694777: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-02-23 23:33:02.706990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.725GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-02-23 23:33:02.709086: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-02-23 23:33:02.713757: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-02-23 23:33:02.717086: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-02-23 23:33:02.718813: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-02-23 23:33:02.722356: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-02-23 23:33:02.724771: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-02-23 23:33:02.736184: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-02-23 23:33:02.737495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-23 23:33:02.738601: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-02-23 23:33:02.741882: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.725GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-02-23 23:33:02.743964: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-02-23 23:33:02.745035: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-02-23 23:33:02.746099: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-02-23 23:33:02.747154: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-02-23 23:33:02.748218: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-02-23 23:33:02.749306: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-02-23 23:33:02.750383: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-02-23 23:33:02.751586: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-23 23:33:03.104342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-23 23:33:03.105515: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-02-23 23:33:03.106198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-02-23 23:33:03.107188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6304 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
Loading label map
Starting capture
2020-02-23 23:33:15.129453: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-02-23 23:33:15.913596: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-02-23 23:33:15.929984: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
I am having the same error but the program runs. Keras 2.3.1, TF 1.15 (GPU version from pip install), CUDA 10.0.
I was trying to use the prebuilt ResNet model; the output comes as expected from variable j.
I would like to know whether the GPU is actually utilized by Keras, since some people above mention that the GPU is not utilized with this error.
j = resnet_model.predict(image_batch)
WARNING:tensorflow:From C:\Users\joehr\Anaconda3\envs\ml-agents\lib\site-packages\keras\backend\tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
2020-04-06 17:02:12.132870: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-06 17:02:13.473220: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-04-06 17:02:13.517059: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
The beginning pile of logs looks fine
Using TensorFlow backend.
2020-04-06 16:55:19.036335: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
PIL image size (480, 640)
numpy array size (640, 480, 3)
image batch size (1, 640, 480, 3)
WARNING:tensorflow:From C:\Users\joehr\Anaconda3\envs\ml-agents\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
2020-04-06 16:55:21.590580: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-04-06 16:55:21.622826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
2020-04-06 16:55:21.623141: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-04-06 16:55:21.628250: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-04-06 16:55:21.631091: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-04-06 16:55:21.632380: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-04-06 16:55:21.636182: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-04-06 16:55:21.639007: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-04-06 16:55:21.655641: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-06 16:55:21.656514: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-04-06 16:55:21.656961: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-04-06 16:55:21.658475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
2020-04-06 16:55:21.658762: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-04-06 16:55:21.658948: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-04-06 16:55:21.659137: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-04-06 16:55:21.659369: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-04-06 16:55:21.659558: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-04-06 16:55:21.659751: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-04-06 16:55:21.659944: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-06 16:55:21.660801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-04-06 16:55:22.319308: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-06 16:55:22.319530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-04-06 16:55:22.319678: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-04-06 16:55:22.321037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8685 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From C:\Users\joehr\Anaconda3\envs\ml-agents\lib\site-packages\keras\backend\tensorflow_backend.py:4070: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.
Same problem here with exactly the same configuration as @aminemayouf.
@greenbarrow Using TF 1.14 with Keras 2.3.1 and Python 3.6.7 works for me now
Same issue Windows 10, TF 1.13.1/1.14/1.15.2 CUDA 10
Thanks for your reply. However, downgrading TF version is not an option for me in this context...
Yeah, I still have issues again. My project required me to upgrade to TensorFlow 2.0. When I did that, the error came up again. Config: TF 2.0, CUDA 10.1, cuDNN 7.6.4.38.
I have the same issue. Config: TF 2.0, CUDA 10.1, cuDNN 7.6.4.38.
Guys, if you want a simple object detection process that can be easily installed and run on a video feed:
Hope it helps 😃
Same problem with TF 1.15. Could anyone fix the problem? Downgrading TF to 1.14 solved the problem for me.
I have a similar problem. I was using TensorFlow 2.1 with CUDA 10.1 and cuDNN 7.6, and it was working fine besides a few cases where it was painfully slow. I was getting the "relying on driver to perform ptx compilation" message, and GPU usage sat at 0% while GPU memory was full. I tried downgrading to TensorFlow 2.0 and CUDA 10.0, as that config seems to work as @dmoreyes suggested. I'm still getting the same message and performance is still awful in the same places as before. I'm going to double-check that I have the correct versions of everything; if that doesn't help, I don't know what's left.
So I checked the GPU usage in Windows; apparently the CUDA section runs at 97% during runtime for me. I'm showing the section for clarity (sorry in advance for the bad markup).
I am also experiencing this same error under Windows 10 and TF 2.
2020-05-06 10:33:05.368044: W tensorflow/core/common_runtime/shape_refiner.cc:89] Function instantiation has undefined input shape at index: 1211 in the outer inference context.
2020-05-06 10:33:06.357323: W tensorflow/core/common_runtime/shape_refiner.cc:89] Function instantiation has undefined input shape at index: 1211 in the outer inference context.
2020-05-06 10:33:08.729475: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-05-06 10:33:16.719080: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-06 10:33:18.201877: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
19043/Unknown - 1045s 55ms/step - loss: 0.3200 - accuracy: 0.8637
Also experiencing this issue. Windows 10, TF 2.2.0
GPU memory gets used, but it looks like all calculation is running on the CPU, with only occasional spikes on the GPU core.
Windows 10, TensorFlow 2.2.0, CUDA 10.2, cuDNN for CUDA 10.2, GeForce RTX 1050
2020-05-24 00:13:11.327144: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only Relying on driver to perform ptx compilation. Modify $PATH to customize ptxas location. This message will be only logged once.
2020-05-24 00:13:34.932036: F tensorflow/stream_executor/cuda/cudadnn.cc:534] Check failed: cudnnSetTensorNdDescriptor(handle.get(), elem_type, nd, dims.data(), strides.data()) == CUDNN_STATUS_SUCCESS (3 vs. 0)batch_descriptor: {count: 1 feature_map_count: 288 spatial: 0 7 value_min: 0.000000 value_max: 0.000000 layout: BatchYXDepth}
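For the reports above of full GPU memory but near-zero GPU utilization, a quick, hedged way to check device placement under TF 2.x (this only diagnoses, it does not fix the ptx warning):

```python
import tensorflow as tf

# Is the GPU visible to TensorFlow at all?
print(tf.config.list_physical_devices('GPU'))

# Log which device each op runs on; the output should mention /device:GPU:0.
tf.debugging.set_log_device_placement(True)

a = tf.random.uniform((1024, 1024))
b = tf.linalg.matmul(a, a)
print(b.device)  # ends with 'device:GPU:0' when the matmul actually ran on the GPU
```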
Requesting help.
System information
What is the top-level directory of the model you are using:
\models\research\object_detection\
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): NO, trying to use model_main.py
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
TensorFlow installed from (source or binary): Binary
TensorFlow version (use command below): v1.15.0-rc2-10-g38ea9bbfea 1.15.0-rc3
Bazel version (if compiling from source): N/A
CUDA/cuDNN version: CUDA Version 10.0.130 cuDNN: 7.6.4.38
GPU model and memory: GeForce RTX 2080 SUPER. 8 GB dedicated, 32 GB shared
Exact command to reproduce: From within an Anaconda environment:
python model_main.py --alsologtostderr --model_dir=training/trial_1/ --pipeline_config_path=training/trial_1/faster_rcnn_nas_coco.config
Describe the problem
Hangs on a
Invoking ptxas not supported on Windows. Relying on driver to perform ptx compilation.
message. Sits there forever. Sometimes (usually after restarting the terminal and clearing out any produced files like .ckpt and .pbtxt) it gets past this point and soon after crashes with an out-of-memory problem. Mentioning that because I don't know if it's related or not.
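If the eventual out-of-memory crash is the blocker, one hedged option under TF 1.15 is to let the Estimator's session grow GPU memory on demand instead of pre-allocating it, roughly like the sketch below (where exactly to wire the session_config into model_main.py's RunConfig is left open; faster_rcnn_nas is also simply a very large model for an 8 GB card):

```python
import tensorflow as tf

# Sketch for TF 1.15: a RunConfig whose session lets GPU memory grow on
# demand rather than reserving it all at startup.
session_config = tf.ConfigProto()
session_config.gpu_options.allow_growth = True

run_config = tf.estimator.RunConfig(
    model_dir='training/trial_1/',
    session_config=session_config,
)
```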
Source code / logs