tensorflow / models

Models and examples built with TensorFlow

Faster RCNN fails on tensorflow 2.4 #9833

Open morttrager opened 3 years ago

morttrager commented 3 years ago

Hi,

I am trying to train on my custom dataset with Faster R-CNN (any version) using TensorFlow 2.4.1 and Python 3.8, but I am facing the following issue.

Python script: model_main_tf2.py (I have also tried with model_lib_v2.py)

2021-03-25 07:05:36.801705: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-25 07:05:41.926213: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-25 07:05:41.927406: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-03-25 07:05:41.938885: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-03-25 07:05:41.938943: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (82d12746ecd0): /proc/driver/nvidia/version does not exist
2021-03-25 07:05:41.939638: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
WARNING:tensorflow:There are non-GPU devices in tf.distribute.Strategy, not using nccl allreduce.
W0325 07:05:41.940942 140308486084480 cross_device_ops.py:1321] There are non-GPU devices in tf.distribute.Strategy, not using nccl allreduce.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
I0325 07:05:41.941336 140308486084480 mirrored_strategy.py:350] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: None
I0325 07:05:41.947287 140308486084480 config_util.py:552] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0325 07:05:41.947620 140308486084480 config_util.py:552] Maybe overwriting use_bfloat16: False
Traceback (most recent call last):
  File "model_main_tf2.py", line 113, in
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "model_main_tf2.py", line 110, in main
    record_summaries=FLAGS.record_summaries)
  File "/content/tf2-object-detection-trainer/tensorflow/models/research/object_detection/model_lib_v2.py", line 516, in train_loop
    model_config=model_config, is_training=True)
  File "/content/tf2-object-detection-trainer/tensorflow/models/research/object_detection/builders/model_builder.py", line 1117, in build
    add_summaries)
  File "/content/tf2-object-detection-trainer/tensorflow/models/research/object_detection/builders/model_builder.py", line 377, in _build_ssd_model
    _check_feature_extractor_exists(ssd_config.feature_extractor.type)
  File "/content/tf2-object-detection-trainer/tensorflow/models/research/object_detection/builders/model_builder.py", line 251, in _check_feature_extractor_exists
    'Tensorflow'.format(feature_extractor_type))
ValueError: is not supported. See model_builder.py for features extractors compatible with different versions of Tensorflow.

Training works with EfficientDet and SSD; only Faster R-CNN fails.

Kindly help me with this.
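One possible reading of the traceback: the failure is raised inside `_build_ssd_model`, and the feature extractor name in the error message is empty. That would be consistent with the loaded pipeline config being read as an `ssd` model with no `feature_extractor.type` set, though the thread does not confirm this. The sketch below is a rough, stdlib-only way to check what a pipeline config's `model` block declares. The function name `summarize_model_block` and the sample config are hypothetical, the regexes are not a real protobuf text-format parser, and the sketch assumes double-quoted string values.

```python
import re


def summarize_model_block(config_text):
    """Return (meta_architecture, feature_extractor_type) from pipeline config text.

    Hypothetical helper: a regex-based approximation, not a protobuf parser.
    Either element is None when the corresponding field is not found.
    """
    # Name of the first block nested directly inside `model { ... }`,
    # e.g. `ssd` or `faster_rcnn`.
    arch = re.search(r'\bmodel\s*\{\s*(\w+)\s*\{', config_text)
    # Double-quoted `type` value inside the first `feature_extractor { ... }`
    # block, assuming no nested block closes before the `type` field.
    extractor = re.search(
        r'feature_extractor\s*\{[^}]*?\btype\s*:\s*"([^"]*)"', config_text
    )
    return (
        arch.group(1) if arch else None,
        extractor.group(1) if extractor else None,
    )


# Illustrative config fragment only, not taken from the issue.
SAMPLE_CONFIG = """
model {
  faster_rcnn {
    num_classes: 3
    feature_extractor {
      type: "faster_rcnn_resnet50_keras"
    }
  }
}
"""

if __name__ == "__main__":
    arch, extractor = summarize_model_block(SAMPLE_CONFIG)
    print("meta-architecture:", arch)
    print("feature extractor type:", extractor)
```

If the first value is `ssd` or the second is empty or missing for a config that is meant to be Faster R-CNN, the config file passed to the training script would be the first thing to inspect.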

Yu-Hang commented 3 years ago

@morttrager Hey man, were you loading the EfficientDet and SSD models from the model zoo? Do you mind sharing your code for training with those two models?

morttrager commented 3 years ago

> @morttrager Hey man, were you loading the efficientDet and SSD models from the model zoo? Do you mind sharing your code for training with those two models?

Hi @Yu-Hang, this tutorial will help you train TensorFlow 2 models. It is very easy to understand and implement.

morttrager commented 3 years ago

Hi @tombstone, @jch1 and @pkulzc,

Kindly help me with the issue.

Yu-Hang commented 3 years ago

@morttrager I just found this tutorial too! Thanks