tensorflow / models

Models and examples built with TensorFlow

Invoking ptxas not supported on Windows #7640

Open edwardHujber opened 4 years ago

edwardHujber commented 4 years ago

System information

Describe the problem

Training hangs on this message:

W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.

It sits there forever. Sometimes (usually after restarting the terminal and clearing out any produced files such as .ckpt and .pbtxt) it gets past this point and soon afterwards crashes with an out-of-memory error. I mention this because I don't know whether it's related.
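The warning quoted above is easy to lose among the deprecation notices in the full log. As a rough triage aid, here is a minimal sketch that keeps only the TensorFlow C++ runtime lines above INFO level. It is not part of the original report. The helper name `runtime_warnings` and the regular expression are assumptions based on the line shapes visible in this log, not an official log format specification.

```python
import re

# Assumed shape of a TensorFlow C++ runtime log line, e.g.
#   "2019-10-09 23:43:04.866391: I tensorflow/.../dso_loader.cc:44] message"
# The timestamp prefix is optional because the quoted warning omits it.
RUNTIME_LINE = re.compile(
    r"^(?:\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: )?"
    r"(?P<level>[IWEF]) (?P<source>tensorflow/\S+:\d+)\] (?P<message>.*)$"
)


def runtime_warnings(lines):
    """Return (level, source, message) for runtime lines above INFO level.

    Hypothetical helper: Python-side lines ("WARNING:tensorflow:..." and
    "W1009 ...") do not match the pattern and are skipped.
    """
    found = []
    for line in lines:
        match = RUNTIME_LINE.match(line.strip())
        if match and match.group("level") != "I":
            found.append(
                (match.group("level"), match.group("source"), match.group("message"))
            )
    return found


# Three lines copied from this report: one INFO runtime line, one Python-side
# deprecation warning, and the runtime warning the training run hangs on.
sample = [
    "2019-10-09 23:43:04.866391: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll",
    "WARNING:tensorflow:From model_main.py:109: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.",
    "W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows",
]

for level, source, message in runtime_warnings(sample):
    print(level, source, message)
```

To use it on a saved log, pass the open file object (or `file.readlines()`) instead of `sample`. It only narrows down what to read and says nothing about why the run hangs.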

Source code / logs

(tensorflow) F:\Hujber\TensorFlow\workspace\wormLearn>python model_main.py --alsologtostderr --model_dir=training/trial_1/ --pipeline_config_path=training/trial_1/faster_rcnn_nas_coco.config
2019-10-09 23:43:04.866391: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\slim\nets\inception_resnet_v2.py:373: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\slim\nets\mobilenet\mobilenet.py:389: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

WARNING:tensorflow:From model_main.py:109: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\utils\config_util.py:94: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

W1009 23:43:07.285009 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\utils\config_util.py:94: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\model_lib.py:573: The name tf.logging.warning is deprecated. Please use tf.compat.v1.logging.warning instead.

W1009 23:43:07.285009 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\model_lib.py:573: The name tf.logging.warning is deprecated. Please use tf.compat.v1.logging.warning instead.

WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W1009 23:43:07.285009 15132 model_lib.py:574] Forced number of epochs for all eval validations to be 1.
WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\utils\config_util.py:480: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

W1009 23:43:07.285009 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\utils\config_util.py:480: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

INFO:tensorflow:Maybe overwriting train_steps: None
I1009 23:43:07.285009 15132 config_util.py:480] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: 1
I1009 23:43:07.285009 15132 config_util.py:480] Maybe overwriting sample_1_of_n_eval_examples: 1
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I1009 23:43:07.300634 15132 config_util.py:480] Maybe overwriting eval_num_epochs: 1
INFO:tensorflow:Maybe overwriting load_pretrained: True
I1009 23:43:07.300634 15132 config_util.py:480] Maybe overwriting load_pretrained: True
INFO:tensorflow:Ignoring config override key: load_pretrained
I1009 23:43:07.300634 15132 config_util.py:490] Ignoring config override key: load_pretrained
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
W1009 23:43:07.316247 15132 model_lib.py:590] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
INFO:tensorflow:create_estimator_and_inputs: use_tpu False, export_to_tpu False
I1009 23:43:07.316247 15132 model_lib.py:623] create_estimator_and_inputs: use_tpu False, export_to_tpu False
INFO:tensorflow:Using config: {'_model_dir': 'training/trial_1/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x00000260DABBD288>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I1009 23:43:07.331873 15132 estimator.py:212] Using config: {'_model_dir': 'training/trial_1/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x00000260DABBD288>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x00000260DABB5708>) includes params argument, but params are not passed to Estimator.
W1009 23:43:07.331873 15132 model_fn.py:630] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x00000260DABB5708>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Not using Distribute Coordinator.
I1009 23:43:07.331873 15132 estimator_training.py:186] Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
I1009 23:43:07.331873 15132 training.py:612] Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
I1009 23:43:07.347513 15132 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
WARNING:tensorflow:From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\training\training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W1009 23:43:07.363146 15132 deprecation.py:323] From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\training\training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\data_decoders\tf_example_decoder.py:167: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

W1009 23:43:07.363146 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\data_decoders\tf_example_decoder.py:167: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\data_decoders\tf_example_decoder.py:182: The name tf.VarLenFeature is deprecated. Please use tf.io.VarLenFeature instead.

W1009 23:43:07.363146 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\data_decoders\tf_example_decoder.py:182: The name tf.VarLenFeature is deprecated. Please use tf.io.VarLenFeature instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\builders\dataset_builder.py:61: The name tf.gfile.Glob is deprecated. Please use tf.io.gfile.glob instead.

W1009 23:43:07.378762 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\builders\dataset_builder.py:61: The name tf.gfile.Glob is deprecated. Please use tf.io.gfile.glob instead.

WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W1009 23:43:07.378762 15132 dataset_builder.py:66] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\builders\dataset_builder.py:80: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
W1009 23:43:07.394386 15132 deprecation.py:323] From F:\Hujber\TensorFlow\models\research\object_detection\builders\dataset_builder.py:80: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
WARNING:tensorflow:From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\contrib\data\python\ops\interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W1009 23:43:07.394386 15132 deprecation.py:323] From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\contrib\data\python\ops\interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
2019-10-09 23:43:07.875217: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-10-09 23:43:08.008609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2080 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.815
pciBusID: 0000:41:00.0
2019-10-09 23:43:08.015807: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2019-10-09 23:43:08.028156: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2019-10-09 23:43:08.034195: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2019-10-09 23:43:08.040674: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2019-10-09 23:43:08.047902: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2019-10-09 23:43:08.057279: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2019-10-09 23:43:08.069618: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-10-09 23:43:08.072360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\anchor_generators\grid_anchor_generator.py:59: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W1009 23:43:13.023561 15132 deprecation.py:323] From F:\Hujber\TensorFlow\models\research\object_detection\anchor_generators\grid_anchor_generator.py:59: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.is_nan is deprecated. Please use tf.math.is_nan instead.

W1009 23:43:16.071676 15132 module_wrapper.py:139] From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.is_nan is deprecated. Please use tf.math.is_nan instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\utils\ops.py:465: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W1009 23:43:16.149807 15132 deprecation.py:323] From F:\Hujber\TensorFlow\models\research\object_detection\utils\ops.py:465: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\utils\ops.py:468: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W1009 23:43:16.149807 15132 deprecation.py:323] From F:\Hujber\TensorFlow\models\research\object_detection\utils\ops.py:468: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

W1009 23:43:17.891674 15132 module_wrapper.py:139] From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

W1009 23:43:19.450865 15132 module_wrapper.py:139] From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

WARNING:tensorflow:From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

W1009 23:43:19.466491 15132 module_wrapper.py:139] From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

WARNING:tensorflow:From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.string_to_hash_bucket_fast is deprecated. Please use tf.strings.to_hash_bucket_fast instead.

W1009 23:43:21.049735 15132 module_wrapper.py:139] From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.string_to_hash_bucket_fast is deprecated. Please use tf.strings.to_hash_bucket_fast instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\builders\dataset_builder.py:148: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.batch(..., drop_remainder=True)`.
W1009 23:43:21.471656 15132 deprecation.py:323] From F:\Hujber\TensorFlow\models\research\object_detection\builders\dataset_builder.py:148: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.batch(..., drop_remainder=True)`.
INFO:tensorflow:Calling model_fn.
I1009 23:43:21.487268 15132 estimator.py:1148] Calling model_fn.
WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py:162: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

W1009 23:43:21.502909 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py:162: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

2019-10-09 23:43:21.512084: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-10-09 23:43:21.529066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2080 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.815
pciBusID: 0000:41:00.0
2019-10-09 23:43:21.540071: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2019-10-09 23:43:21.544621: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2019-10-09 23:43:21.547872: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2019-10-09 23:43:21.553238: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2019-10-09 23:43:21.557124: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2019-10-09 23:43:21.565002: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2019-10-09 23:43:21.568175: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-10-09 23:43:21.572306: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-10-09 23:43:22.169167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-09 23:43:22.172834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-10-09 23:43:22.176589: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-10-09 23:43:22.182536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 6269 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 SUPER, pci bus id: 0000:41:00.0, compute capability: 7.5)
INFO:tensorflow:A GPU is available on the machine, consider using NCHW data format for increased speed on GPU.
I1009 23:43:22.178170 15132 nasnet.py:408] A GPU is available on the machine, consider using NCHW data format for increased speed on GPU.
WARNING:tensorflow:From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\contrib\layers\python\layers\layers.py:1057: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W1009 23:43:22.178170 15132 deprecation.py:323] From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\contrib\layers\python\layers\layers.py:1057: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\slim\nets\nasnet\nasnet_utils.py:459: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

W1009 23:43:22.287549 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\slim\nets\nasnet\nasnet_utils.py:459: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\core\anchor_generator.py:149: The name tf.assert_equal is deprecated. Please use tf.compat.v1.assert_equal instead.

W1009 23:43:29.248443 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\core\anchor_generator.py:149: The name tf.assert_equal is deprecated. Please use tf.compat.v1.assert_equal instead.

INFO:tensorflow:Scale of 0 disables regularizer.
I1009 23:43:29.248443 15132 regularizers.py:98] Scale of 0 disables regularizer.
WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py:986: The name tf.get_variable_scope is deprecated. Please use tf.compat.v1.get_variable_scope instead.

W1009 23:43:29.248443 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py:986: The name tf.get_variable_scope is deprecated. Please use tf.compat.v1.get_variable_scope instead.

INFO:tensorflow:Scale of 0 disables regularizer.
I1009 23:43:29.264083 15132 regularizers.py:98] Scale of 0 disables regularizer.
INFO:tensorflow:depth of additional conv before box predictor: 0
I1009 23:43:29.264083 15132 convolutional_box_predictor.py:148] depth of additional conv before box predictor: 0
WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\box_coders\faster_rcnn_box_coder.py:82: The name tf.log is deprecated. Please use tf.math.log instead.

W1009 23:43:29.560967 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\box_coders\faster_rcnn_box_coder.py:82: The name tf.log is deprecated. Please use tf.math.log instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\core\minibatch_sampler.py:81: The name tf.random_shuffle is deprecated. Please use tf.random.shuffle instead.

W1009 23:43:29.592233 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\core\minibatch_sampler.py:81: The name tf.random_shuffle is deprecated. Please use tf.random.shuffle instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\utils\ops.py:1085: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
W1009 23:43:29.685991 15132 deprecation.py:506] From F:\Hujber\TensorFlow\models\research\object_detection\utils\ops.py:1085: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py:185: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.

W1009 23:43:29.701617 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py:185: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.

WARNING:tensorflow:From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\contrib\layers\python\layers\layers.py:1634: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
W1009 23:43:32.791057 15132 deprecation.py:323] From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\contrib\layers\python\layers\layers.py:1634: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
INFO:tensorflow:Scale of 0 disables regularizer.
I1009 23:43:32.806683 15132 regularizers.py:98] Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
I1009 23:43:32.822310 15132 regularizers.py:98] Scale of 0 disables regularizer.
WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py:2235: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

W1009 23:43:32.837936 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py:2235: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py:2236: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
W1009 23:43:32.837936 15132 deprecation.py:323] From F:\Hujber\TensorFlow\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py:2236: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\utils\variables_helper.py:126: The name tf.train.NewCheckpointReader is deprecated. Please use tf.compat.v1.train.NewCheckpointReader instead.

W1009 23:43:32.853562 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\utils\variables_helper.py:126: The name tf.train.NewCheckpointReader is deprecated. Please use tf.compat.v1.train.NewCheckpointReader instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\model_lib.py:317: The name tf.train.init_from_checkpoint is deprecated. Please use tf.compat.v1.train.init_from_checkpoint instead.

W1009 23:43:32.869188 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\model_lib.py:317: The name tf.train.init_from_checkpoint is deprecated. Please use tf.compat.v1.train.init_from_checkpoint instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\core\losses.py:174: The name tf.losses.huber_loss is deprecated. Please use tf.compat.v1.losses.huber_loss instead.

W1009 23:43:35.818088 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\core\losses.py:174: The name tf.losses.huber_loss is deprecated. Please use tf.compat.v1.losses.huber_loss instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\core\losses.py:180: The name tf.losses.Reduction is deprecated. Please use tf.compat.v1.losses.Reduction instead.

W1009 23:43:35.818088 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\core\losses.py:180: The name tf.losses.Reduction is deprecated. Please use tf.compat.v1.losses.Reduction instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\core\losses.py:345: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

W1009 23:43:35.849340 15132 deprecation.py:323] From F:\Hujber\TensorFlow\models\research\object_detection\core\losses.py:345: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py:2202: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

W1009 23:43:35.989976 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py:2202: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\builders\optimizer_builder.py:52: The name tf.train.MomentumOptimizer is deprecated. Please use tf.compat.v1.train.MomentumOptimizer instead.

W1009 23:43:36.021228 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\builders\optimizer_builder.py:52: The name tf.train.MomentumOptimizer is deprecated. Please use tf.compat.v1.train.MomentumOptimizer instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\model_lib.py:359: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.

W1009 23:43:36.021228 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\model_lib.py:359: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\model_lib.py:369: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

W1009 23:43:36.021228 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\model_lib.py:369: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\model_lib.py:472: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

W1009 23:43:48.484605 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\model_lib.py:472: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\model_lib.py:476: The name tf.add_to_collection is deprecated. Please use tf.compat.v1.add_to_collection instead.

W1009 23:43:49.816037 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\model_lib.py:476: The name tf.add_to_collection is deprecated. Please use tf.compat.v1.add_to_collection instead.

WARNING:tensorflow:From F:\Hujber\TensorFlow\models\research\object_detection\model_lib.py:477: The name tf.train.Scaffold is deprecated. Please use tf.compat.v1.train.Scaffold instead.

W1009 23:43:49.816037 15132 module_wrapper.py:139] From F:\Hujber\TensorFlow\models\research\object_detection\model_lib.py:477: The name tf.train.Scaffold is deprecated. Please use tf.compat.v1.train.Scaffold instead.

INFO:tensorflow:Done calling model_fn.
I1009 23:43:49.831650 15132 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I1009 23:43:49.831650 15132 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I1009 23:43:59.346498 15132 monitored_session.py:240] Graph was finalized.
2019-10-09 23:43:59.354769: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2080 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.815
pciBusID: 0000:41:00.0
2019-10-09 23:43:59.368323: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2019-10-09 23:43:59.372370: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2019-10-09 23:43:59.376063: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2019-10-09 23:43:59.381228: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2019-10-09 23:43:59.384374: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2019-10-09 23:43:59.389889: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2019-10-09 23:43:59.393346: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-10-09 23:43:59.399778: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-10-09 23:43:59.404838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-09 23:43:59.408820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-10-09 23:43:59.413834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-10-09 23:43:59.418801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6269 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 SUPER, pci bus id: 0000:41:00.0, compute capability: 7.5)
INFO:tensorflow:Restoring parameters from training/trial_1/model.ckpt-0
I1009 23:43:59.424629 15132 saver.py:1284] Restoring parameters from training/trial_1/model.ckpt-0
WARNING:tensorflow:From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\training\saver.py:1069: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
W1009 23:44:04.970287 15132 deprecation.py:323] From C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\training\saver.py:1069: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
I1009 23:44:07.828599 15132 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I1009 23:44:09.031818 15132 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into training/trial_1/model.ckpt.
I1009 23:44:35.653282 15132 basic_session_run_hooks.py:606] Saving checkpoints for 0 into training/trial_1/model.ckpt.
2019-10-09 23:45:11.780278: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2019-10-09 23:45:14.790199: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-10-09 23:45:16.309576: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
pawarrick commented 4 years ago

With Windows 10, TF 1.15/CUDA 10.0/cuDNN 7.6.4.38, I also get this ptx warning, followed eventually by a CUDA OOM error in a cross-validation loop (my own code, not model_main.py). This did not occur with TF 1.12.0/CUDA 9.0/cuDNN 7.3.1.20.

Acejoy commented 4 years ago

System information What is the top-level directory of the model you are using: \models\research\object_detection\

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): NO, trying to use object_detection_tutorial.ipynb

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10

TensorFlow installed from (source or binary): installed using pip(pip install tensorflow-gpu)

TensorFlow version (use command below): v2.0.0

Bazel version (if compiling from source): N/A

CUDA/cuDNN version: CUDA Version 10.0.130 cuDNN: 7.6.4

GPU model and memory: GeForce GTX 1050 4 GB dedicated, 3.9 GB shared

Exact command to reproduce: run the object_detection_tutorial.ipynb file

Describe the problem: It got stuck at the loop where the image results were meant to be shown, i.e.:

for image_path in TEST_IMAGE_PATHS:
    show_inference(detection_model, image_path)

It stayed there until the Jupyter notebook displayed a message saying the kernel had died. When I tried to run it in Anaconda Prompt, the following was displayed at the end, after which no images were shown and the process ended.

W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.

Please look into this matter.

TaiChiTiger commented 4 years ago

I got the same error; Keras doesn't use the GPU.

alainkaiser commented 4 years ago

I also got the same error. After the error appears in the console, the kernel is dead and must be restarted. Please provide some information for this issue!

The following line is causing the issue:

output_dict = model(input_tensor)

Icaro-Lima commented 4 years ago

Same problem here.

geometrikal commented 4 years ago

It's weird: I get this error and then model.predict is super slow, but fitting the model is just as fast as normal.

DzakirinMD commented 4 years ago

Is this issue due to CUDA 10? I'm having this issue as well.

alainkaiser commented 4 years ago

I could resolve this problem by using tensorflow version 1.9. Works as expected now!

geometrikal commented 4 years ago

@Keyrainn Tensorflow 1.14 with CUDA 10.0 works for me

Huii commented 4 years ago

Same problem here on Windows 10 with Keras 2.3.1 and TensorFlow 2.0. Could this somehow be related to this issue?

ghost commented 4 years ago

I'm also having the same issue with TensorFlow 2.0 and Windows 10 while trying to run object_detection_tutorial.ipynb, specifically failing on output_dict = model(input_tensor). I'd prefer not to roll back to v1.9 if possible.

felixdittrich92 commented 4 years ago

Same problem. Any solution? :) object_detection_tutorial.ipynb doesn't run.

davijo commented 4 years ago

I had the same ptx hang up occasionally in addition to freezing at basic_session_run_hooks.py step = 0. I'm running with TF 1.15 and CUDA 10. I managed to get things up running again by downgrading my NVIDIA drivers to 431.60.

akoutsoukis commented 4 years ago

I fixed it by downgrading tensorflow. Not the best solution but works

simisterio commented 4 years ago

I had the same ptx hang up occasionally in addition to freezing at basic_session_run_hooks.py step = 0. I'm running with TF 1.15 and CUDA 10. I managed to get things up running again by downgrading my NVIDIA drivers to 431.60.

Which cuDNN did you use?

hemantghuge commented 4 years ago

Please solve the issue quickly

hemantghuge commented 4 years ago

I fixed it by downgrading tensorflow. Not the best solution but works

Downgrading to which TensorFlow version?

akoutsoukis commented 4 years ago

I fixed it by downgrading tensorflow. Not the best solution but works

Downgrading till which tensorflow

TF 1.15

hemantghuge commented 4 years ago

Installed TF 1.5 but getting error @akoutsoukis

In [7]
model_name = 'ssd_mobilenet_v1_coco_2017_11_17'
detection_model = load_model(model_name)

WARNING:tensorflow:From :11: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.

TypeError Traceback (most recent call last)

in
      1 model_name = 'ssd_mobilenet_v1_coco_2017_11_17'
----> 2 detection_model = load_model(model_name)

in load_model(model_name)
      9 model_dir = pathlib.Path(model_dir)/"saved_model"
     10
---> 11 model = tf.saved_model.load(str(model_dir))
     12 model = model.signatures['serving_default']
     13

c:\users\hemant ghuge\anaconda3\envs\tensorflow1bg\lib\site-packages\tensorflow_core\python\util\deprecation.py in new_func(*args, **kwargs)
    322 'in a future version' if date is None else ('after %s' % date),
    323 instructions)
--> 324 return func(*args, **kwargs)
    325 return tf_decorator.make_decorator(
    326 func, new_func, 'deprecated',

TypeError: load() missing 2 required positional arguments: 'tags' and 'export_dir'
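
A note on the TypeError above: the deprecation warning shows that, on this install, `tf.saved_model.load` resolves to the older session-based loader, which expects `(sess, tags, export_dir)`, while the notebook calls it with the single-argument form. The sketch below is one possible way to pick the single-argument loader regardless of the installed major version. The helper name `load_saved_model` is hypothetical, and it assumes the installed 1.x release exposes `tf.compat.v2`; it has not been confirmed by anyone in this thread.

```python
def load_saved_model(tf, export_dir):
    """Load a SavedModel through the single-argument, object-based loader.

    `tf` is the already-imported tensorflow module. It is passed in
    explicitly (a choice made for this sketch) so the version check is
    visible and can be exercised with a stand-in object.
    """
    major = int(tf.__version__.split(".")[0])
    if major >= 2:
        # TF 2.x: the top-level name is already the object-based loader.
        return tf.saved_model.load(export_dir)
    # TF 1.x: the top-level name is the deprecated (sess, tags, export_dir)
    # loader that raised the TypeError above. Assumption: this 1.x release
    # ships the tf.compat.v2 namespace with the newer loader.
    return tf.compat.v2.saved_model.load(export_dir)
```

In the notebook's `load_model`, this would replace the failing line with `model = load_saved_model(tf, str(model_dir))`. Under TF 1.x the rest of the notebook may still need further changes (for example eager execution), so treat this as a starting point only.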

nayash commented 4 years ago

Getting this same issue. I would really hate to downgrade the driver or TensorFlow, especially since I just upgraded to TensorFlow 2.0 and modified my code accordingly. Any solution?

Acejoy commented 4 years ago

Hey, just wanted to ask one thing. Is this problem happening only for Windows users, or are Linux or Mac users facing the same problem too? Please reply.

akoutsoukis commented 4 years ago

@Acejoy For me it was in Linux.

joao-carvalheira commented 4 years ago

I'm having this same problem with Windows 10, with tensorflow-gpu installed via conda: TF 2.0.0/CUDA 10.0

dolhasz commented 4 years ago

Same issue here, hangs on

2020-01-31 18:29:05.919027: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.

Then crashes with no error.

brarrow commented 4 years ago

Same issue Windows 10, TF 1.13.1/1.14/1.15.2 CUDA 10

pawarrick commented 4 years ago

Ok it’s clear that many have experienced the problem and for many, many months. Can we know where the solution resides? In a fixed NVidia driver? In tensorflow? Thanks

dmoreyes commented 4 years ago

Hi team, I just had the exact same problem on the following configuration :

And solved it by reinstalling (... copying, to be more accurate) the correct version of the cuDNN files.

For various reasons, I first tried to install the very latest CUDA (10.1), cuDNN (for CUDA 10.1), and TensorFlow (2.1) versions, then fell back to the versions mentioned at the beginning of the post because of many problems, but I forgot to also downgrade cuDNN.

Now everything works fine.

Hope this helps. Dan

ablyzniuk commented 4 years ago

@dmoreyes Hi, so what are your current versions of TF, CUDA, and cuDNN? I have the same issue as you: GTX 1050 Ti, TF 2.0.0, CUDA 10.2, cuDNN 10.2.

dmoreyes commented 4 years ago

Hi, Here are the versions I'm using for my Windows 10 Pro x64 OS

Dan

Echocage commented 4 years ago

Hi, Here are the versions I'm using for my Windows 10 Pro x64 OS

Dan

Do you still get the error Invoking ptxas not supported on Windows?

jackyvr commented 4 years ago

Any recommendation or solution for the problem? I am experiencing the same issue. Here is my setup:

Windows 10 CUDA 10.1 TensorFlow 2.0.1 NVidia RTX 2080 Ti

Thanks!

geometrikal commented 4 years ago

I don't know if this is related, but at the same time this error started appearing (I didn't get the freeze issue though), training on a Titan X (Pascal) became about 10x slower for a simple two-layer network. TensorFlow 1.13.1 worked fine; every TF version after that was slow.

I just updated drivers (to 442.19) and while the ptx error is still there, training has resumed normal speed! This is Windows 10, CUDA 10.0, TensorFlow 1.15.2, Titan X (pascal).

artyplexus commented 4 years ago

Windows 10 Tensorflow 2.1.0 Cuda 10.1 cuDNN for CUDA 10.1 (v. 7.6.5.32) GeForce RTX 1060

[INFO] training network...
Epoch 1/75
2020-02-15 14:45:46.794388: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-02-15 14:45:47.071668: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-02-15 14:45:47.998708: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. This message will be only logged once. 

Then it crashed without any errors. I updated the driver to 442.19. The warning remains, but training started working.
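
Since several comments here tie the behavior to the NVIDIA driver version, a quick way to record which driver is installed may help when comparing setups. The following is a minimal sketch that assumes `nvidia-smi` is on the PATH. The `REPORTED_WORKING` set is only a summary of what commenters in this thread said worked for them (431.60 and 442.19), and other commenters report the problem persisting on 442.19, so it is not a list of known-good drivers.

```python
import shutil
import subprocess

# Driver versions that commenters in this thread reported as working for them.
REPORTED_WORKING = {(431, 60), (442, 19)}


def parse_driver_version(text):
    """Turn nvidia-smi output such as '442.19' into (442, 19); None if empty."""
    lines = text.strip().splitlines()
    if not lines:
        return None
    return tuple(int(part) for part in lines[0].strip().split("."))


def installed_driver_version():
    """Return the driver version tuple, or None if it cannot be determined."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return None
    return parse_driver_version(result.stdout)


if __name__ == "__main__":
    version = installed_driver_version()
    print("driver:", version)
    print("reported working in this thread:", version in REPORTED_WORKING)
```

Including the printed version in a report makes it easier to line up with the driver numbers mentioned by others.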

JuanRuiz135 commented 4 years ago

Windows 10, TensorFlow 2.0, CUDA 10.0, cuDNN 7.6.5 for CUDA 10.0, GeForce GTX 1050 Ti, driver 442.19 (the latest as of this date).

I'm still getting this error after having tried many configurations of TensorFlow and CUDA versions. I'm starting to think it might be an error in the data pipeline, as explained here: https://stackoverflow.com/questions/58455765/keras-sees-my-gpu-but-doesnt-use-it-when-training-a-neural-network but I'm not really sure how to use tf.records to solve this. Here's my code: https://github.com/JuanDRC/AlzheimerProj/blob/master/FreezeNone.py

Epoch 1/100
2020-02-17 11:15:18.577785: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-02-17 11:15:19.784597: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-02-17 11:15:21.886392: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-02-17 11:15:22.281623: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.53GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

artyplexus commented 4 years ago

According to https://www.tensorflow.org/install/source#tested_source_configurations cuDNN for CUDA 10.0 should be 7.4.
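
To check a setup against that table, it helps to know which CUDA libraries TensorFlow actually loaded. The sketch below pulls that out of the `dso_loader` lines that appear throughout this thread. The function names are made up for illustration, and the suffix decoding (`100` meaning 10.0, `101` meaning 10.1) is an assumption based only on the DLL names seen in these logs.

```python
import re

# Matches e.g. "Successfully opened dynamic library cudart64_100.dll"
_DLL_RE = re.compile(r"Successfully opened dynamic library (\w+?)64_(\d+)\.dll")


def libraries_in_log(log_text):
    """Map library name -> version suffix for each CUDA DLL the log reports."""
    return {name: suffix for name, suffix in _DLL_RE.findall(log_text)}


def cuda_version_from_cudart(suffix):
    """Decode the cudart suffix seen in these logs: '100' -> '10.0'."""
    return "{}.{}".format(suffix[:-1], suffix[-1])


# Two lines copied from the first log in this thread.
sample = (
    "2019-10-09 23:43:59.368323: I tensorflow/stream_executor/platform/default/"
    "dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll\n"
    "2019-10-09 23:43:59.393346: I tensorflow/stream_executor/platform/default/"
    "dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll\n"
)
libs = libraries_in_log(sample)
print(cuda_version_from_cudart(libs["cudart"]))  # 10.0
print(libs["cudnn"])  # 7
```

Note that the log only reveals the cuDNN major version (7), so it cannot distinguish 7.4 from 7.6; the minor version has to be checked in the installed cuDNN files themselves.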

aminemayouf commented 4 years ago

Windows 10 Tensorflow 2.1.0 Cuda 10.1 cuDNN 7.6.5 for Cuda 10.1 GeForce RTX 2070 Driver 442.19

Any idea how to fix this, please? I've also tried TensorFlow 2.0, CUDA 10, cuDNN 7.4 for CUDA 10, and TensorFlow 2.1.0, CUDA 10.2, cuDNN 7.6.5 for CUDA 10.2.

2020-02-23 23:32:55.488931: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Loading the Tensorflow model into memory
2020-02-23 23:33:02.694777: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-02-23 23:33:02.706990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.725GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-02-23 23:33:02.709086: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-02-23 23:33:02.713757: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-02-23 23:33:02.717086: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-02-23 23:33:02.718813: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-02-23 23:33:02.722356: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-02-23 23:33:02.724771: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-02-23 23:33:02.736184: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-02-23 23:33:02.737495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-23 23:33:02.738601: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-02-23 23:33:02.741882: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.725GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-02-23 23:33:02.743964: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-02-23 23:33:02.745035: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-02-23 23:33:02.746099: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-02-23 23:33:02.747154: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-02-23 23:33:02.748218: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-02-23 23:33:02.749306: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-02-23 23:33:02.750383: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-02-23 23:33:02.751586: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-23 23:33:03.104342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-23 23:33:03.105515: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-02-23 23:33:03.106198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-02-23 23:33:03.107188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6304 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
Loading label map
Starting capture
2020-02-23 23:33:15.129453: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-02-23 23:33:15.913596: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-02-23 23:33:15.929984: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
JoeHRIsaac commented 4 years ago

I am having the same error, but the program runs. Keras 2.3.1, TF 1.15 (GPU version from pip install), CUDA 10.0.

I was trying to use the prebuilt ResNet model. The output in variable j comes out as expected.

I would like to know whether the GPU is utilized by Keras, as some people above mention that the GPU is not utilized when this error appears.

j = resnet_model.predict(image_batch)
WARNING:tensorflow:From C:\Users\joehr\Anaconda3\envs\ml-agents\lib\site-packages\keras\backend\tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
2020-04-06 17:02:12.132870: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-06 17:02:13.473220: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-04-06 17:02:13.517059: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll

The beginning pile of logs looks fine

Using TensorFlow backend.
2020-04-06 16:55:19.036335: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
PIL image size (480, 640)
numpy array size (640, 480, 3)
image batch size (1, 640, 480, 3)
WARNING:tensorflow:From C:\Users\joehr\Anaconda3\envs\ml-agents\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
2020-04-06 16:55:21.590580: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-04-06 16:55:21.622826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
2020-04-06 16:55:21.623141: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-04-06 16:55:21.628250: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-04-06 16:55:21.631091: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-04-06 16:55:21.632380: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-04-06 16:55:21.636182: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-04-06 16:55:21.639007: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-04-06 16:55:21.655641: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-06 16:55:21.656514: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-04-06 16:55:21.656961: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-04-06 16:55:21.658475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
2020-04-06 16:55:21.658762: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-04-06 16:55:21.658948: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-04-06 16:55:21.659137: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-04-06 16:55:21.659369: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-04-06 16:55:21.659558: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-04-06 16:55:21.659751: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-04-06 16:55:21.659944: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-06 16:55:21.660801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-04-06 16:55:22.319308: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-06 16:55:22.319530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-04-06 16:55:22.319678: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-04-06 16:55:22.321037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8685 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From C:\Users\joehr\Anaconda3\envs\ml-agents\lib\site-packages\keras\backend\tensorflow_backend.py:4070: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.
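
On the question of whether the GPU is being used: one thing the log itself can confirm is that TensorFlow created a GPU device. The sketch below extracts that from the "Created TensorFlow device" line. The function name is hypothetical and the pattern is written against the line format seen in this thread, so it may not match other TensorFlow versions.

```python
import re

# Written against the "Created TensorFlow device" lines quoted in this thread.
_DEVICE_RE = re.compile(
    r"Created TensorFlow device \((?P<device>\S+) with (?P<mb>\d+) MB memory\)"
    r" -> physical GPU \(device: \d+, name: (?P<name>[^,]+),"
)


def created_gpu_devices(log_text):
    """Return (device path, GPU name, memory in MB) for each device created."""
    return [
        (m.group("device"), m.group("name"), int(m.group("mb")))
        for m in _DEVICE_RE.finditer(log_text)
    ]


# Line copied from the log above.
sample = (
    "2020-04-06 16:55:22.321037: I tensorflow/core/common_runtime/gpu/"
    "gpu_device.cc:1304] Created TensorFlow device "
    "(/job:localhost/replica:0/task:0/device:GPU:0 with 8685 MB memory) -> "
    "physical GPU (device: 0, name: GeForce RTX 2080 Ti, "
    "pci bus id: 0000:01:00.0, compute capability: 7.5)"
)
for device, name, mb in created_gpu_devices(sample):
    print(device, name, mb)
```

A created device only shows that TensorFlow registered the GPU, not that every op runs on it. For per-op placement in TF 1.x sessions, `tf.compat.v1.ConfigProto(log_device_placement=True)` is one option to try.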
RobeeF commented 4 years ago

Same problem here with exactly the same configuration as @aminemayouf

JoeHRIsaac commented 4 years ago

@greenbarrow Using TF 1.14 with Keras 2.3.1 and Python 3.6.7 works for me now

Same issue Windows 10, TF 1.13.1/1.14/1.15.2 CUDA 10

RobeeF commented 4 years ago

Thanks for your reply. However, downgrading TF version is not an option for me in this context...

@greenbarrow Using TF 1.14 with Keras 2.3.1 and Python 3.6.7 works for me now

Same issue Windows 10, TF 1.13.1/1.14/1.15.2 CUDA 10

JoeHRIsaac commented 4 years ago

Thanks for your reply. However, downgrading TF version is not an option for me in this context...

@greenbarrow Using TF 1.14 with Keras 2.3.1 and Python 3.6.7 works for me now Same issue Windows 10, TF 1.13.1/1.14/1.15.2 CUDA 10

Yea, I still have issues again. My project required me to upgrade to Tensorflow 2.0. When I did that, the error came up again. Config: TF2.0, Cuda 10.1, Cudnn 7.6.4.38

Rahma20 commented 4 years ago

I have the same issue Config: TF2.0, Cuda 10.1, Cudnn 7.6.4.38

IAFutur commented 4 years ago

Guys if you want a simple object detection process that can be easily installed and run on video feed :

Hope it helps 😃

jackyvr commented 4 years ago

Same problem with TF 1.15. Could anyone fix the problem? Downgrading TF to 1.14 solves the problem.

MathewP1 commented 4 years ago

I have a similar problem. I was using TensorFlow 2.1 with CUDA 10.1 and cuDNN 7.6, and it was working fine apart from a few cases where it ran painfully slowly. I was getting the "relying on driver to perform ptx compilation" message, and GPU usage was sitting at 0% while GPU memory was full. I tried downgrading to TensorFlow 2.0 and CUDA 10.0, since @dmoreyes suggested this configuration seems to work. I'm still getting the same message, and performance is still awful in the same places as before. I'm going to double-check that I have the correct versions of everything; if that doesn't help, I don't know what's left.

JoeHRIsaac commented 4 years ago

So I checked the GPU usage in Windows; apparently, the CUDA section runs at 97% during runtime for me. I'm showing the section for clarity (sorry in advance for bad markup).

image

ibrahimishag commented 4 years ago

I am also experiencing this same error under Windows 10 and TF 2.

2020-05-06 10:33:05.368044: W tensorflow/core/common_runtime/shape_refiner.cc:89] Function instantiation has undefined input shape at index: 1211 in the outer inference context.
2020-05-06 10:33:06.357323: W tensorflow/core/common_runtime/shape_refiner.cc:89] Function instantiation has undefined input shape at index: 1211 in the outer inference context.
2020-05-06 10:33:08.729475: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-05-06 10:33:16.719080: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-06 10:33:18.201877: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
19043/Unknown - 1045s 55ms/step - loss: 0.3200 - accuracy: 0.8637

maxenko commented 4 years ago

Also experiencing this issue. Windows 10, TF 2.2.0

GPU memory gets used, but it looks like all calculation is running on the CPU, with occasional spikes on the GPU core.

sysalong commented 4 years ago

Windows 10 Tensorflow 2.2.0 Cuda 10.2 cuDNN10.2 GeForce RTX 1050

2020-05-24 00:13:11.327144: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only Relying on driver to perform ptx compilation. Modify $PATH to customize ptxas location. This message will be only logged once.

2020-05-24 00:13:34.932036: F tensorflow/stream_executor/cuda/cudadnn.cc:534] Check failed: cudnnSetTensorNdDescriptor(handle.get(), elem_type, nd, dims.data(), strides.data()) == CUDNN_STATUS_SUCCESS (3 vs. 0)batch_descriptor: {count: 1 feature_map_count: 288 spatial: 0 7 value_min: 0.000000 value_max: 0.000000 layout: BatchYXDepth}

Requesting help.