GaussD closed this issue 3 years ago.
Can you check if the files in /home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/object_detection/model_lib_v2.py and https://github.com/tensorflow/models/blob/master/research/object_detection/model_lib_v2.py are the same or not?
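A minimal sketch for doing that comparison, assuming the local path from the question and fetching the master copy from raw.githubusercontent.com (the installed copy can legitimately differ from master if it was installed from an older commit):

```python
# Sketch: compare the installed model_lib_v2.py against the copy on GitHub master.
# Both the local path and the raw-file URL are assumptions based on this thread.
import hashlib
import urllib.request

LOCAL = ("/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/"
         "site-packages/object_detection/model_lib_v2.py")
REMOTE = ("https://raw.githubusercontent.com/tensorflow/models/master/"
          "research/object_detection/model_lib_v2.py")

with open(LOCAL, "rb") as f:
    local_digest = hashlib.sha256(f.read()).hexdigest()

with urllib.request.urlopen(REMOTE) as resp:
    remote_digest = hashlib.sha256(resp.read()).hexdigest()

print("identical" if local_digest == remote_digest else "files differ")
```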
It is likely that your eval is failing due to other errors. Look for this line in the log message
https://github.com/tensorflow/models/blob/master/research/object_detection/model_lib_v2.py#L912
'Encountered %s exception.
I have the same problem with the CenterNet Model. But I do not encounter this problem when evaluating the Faster RCNN model.
Traceback (most recent call last):
File "/content/models/research/object_detection/model_main_tf2.py", line 113, in <module>
Can you check if the files in /home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/object_detection/model_lib_v2.py and https://github.com/tensorflow/models/blob/master/research/object_detection/model_lib_v2.py are the same or not?
Yes they are the same.
It is likely that your eval is failing due to other errors. Look for this line in the log message https://github.com/tensorflow/models/blob/master/research/object_detection/model_lib_v2.py#L912
'Encountered %s exception.
I found that the evaluation process fails if model_main_tf2.py is executed while the training is running (maybe due to the exhaustion of GPU resources). I don't know why the eval has been separated from the training in tensorflow2... In retrospect I would say that I prefer how it used to be in tensorflow1.
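Not something from this thread, but a common workaround for running evaluation alongside training is to hide the GPU from the evaluation process so it falls back to CPU and does not compete for GPU memory. A sketch, with illustrative paths:

```python
# Sketch of a workaround (an assumption, not an official recommendation):
# launch the eval job with all GPUs hidden so it runs on CPU while the
# training job keeps the GPU to itself. The environment variable must be set
# before TensorFlow initializes, hence it is passed to the child process.
import os
import subprocess

env = dict(os.environ, CUDA_VISIBLE_DEVICES="-1")  # hide all GPUs from this process
subprocess.run(
    [
        "python", "object_detection/model_main_tf2.py",
        "--pipeline_config_path=pipeline.config",  # illustrative paths
        "--model_dir=training_dir",
        "--checkpoint_dir=training_dir",
    ],
    env=env,
    check=True,
)
```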
I'm also seeing an error during centernet evaluation on the coco dataset:
ValueError: Tensor's shape (3, 3, 64, 256) is not compatible with supplied shape (1, 1, 64, 90) exception.
appears in my log before the error message TypeError: 'NoneType' object is not iterable. The other models I tested, ssd mobilenet v2 and efficientdet, work fine.
What is the path you have specified for fine_tune_checkpoint and model_dir?
@vighneshbirodkar fine_tune_checkpoint path: ...modelzoo/centernet_hg104_1024x1024_coco17_tpu-32/checkpoint/ckpt-0. model_dir is the directory in which all the checkpoints are written.
Can you tell what the model_dir was when you launched the code? And also share your full config file.
This error can occur due to model_dir and fine_tune_checkpoint_path pointing to the same directory.
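A quick sanity check for that, sketched with the paths that appear later in this thread:

```python
# Sketch: verify that model_dir is not the directory that holds the
# fine-tune checkpoint before launching training/evaluation.
import os

fine_tune_checkpoint = "/home/modelzoo/centernet_hg104_1024x1024_coco17_tpu-32/checkpoint/ckpt-0"
model_dir = "/home/temp/trainings/centernet_1024x1024_v5"

ckpt_dir = os.path.dirname(fine_tune_checkpoint)
if os.path.realpath(ckpt_dir) == os.path.realpath(model_dir):
    raise ValueError("model_dir must not point at the fine-tune checkpoint directory")
```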
my pipeline_config:
model {
center_net {
num_classes: 5
feature_extractor {
type: "hourglass_104"
channel_means: 104.01362
channel_means: 114.034225
channel_means: 119.916595
channel_stds: 73.60277
channel_stds: 69.89082
channel_stds: 70.91508
bgr_ordering: true
}
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 1024
max_dimension: 1024
pad_to_max_dimension: true
}
}
object_detection_task {
task_loss_weight: 1.0
offset_loss_weight: 1.0
scale_loss_weight: 0.1
localization_loss {
l1_localization_loss {
}
}
}
object_center_params {
object_center_loss_weight: 1.0
classification_loss {
penalty_reduced_logistic_focal_loss {
alpha: 2.0
beta: 4.0
}
}
min_box_overlap_iou: 0.7
max_box_predictions: 100
}
}
}
train_config {
batch_size: 8
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
random_adjust_hue {
max_delta : 0.05
}
}
data_augmentation_options {
random_adjust_contrast {
min_delta : 0.8
max_delta : 3
}
}
data_augmentation_options {
random_adjust_saturation {
min_delta : 0.3
max_delta : 1.5
}
}
data_augmentation_options {
random_adjust_brightness {
max_delta : 0.2
}
}
data_augmentation_options {
random_vertical_flip {
}
}
optimizer {
adam_optimizer {
learning_rate {
cosine_decay_learning_rate {
learning_rate_base: 0.001
total_steps: 50000
warmup_learning_rate: 0.00025
warmup_steps: 5000
}
}
epsilon: 1e-07
}
use_moving_average: false
}
fine_tune_checkpoint: "/home/modelzoo/centernet_hg104_1024x1024_coco17_tpu-32/checkpoint/ckpt-0"
num_steps: 15000
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
fine_tune_checkpoint_type: "fine_tune"
fine_tune_checkpoint_version: V2
}
train_input_reader {
label_map_path: "/home/temp/trainings/centernet_1024x1024_v5/labelmap_v2.pbtxt"
tf_record_input_reader {
input_path: "home/dataset_tr/*.tfrec"
}
}
eval_config {
metrics_set: "oid_V2_detection_metrics"
use_moving_averages: false
batch_size: 1
}
eval_input_reader {
label_map_path: "/home/temp/trainings/centernet_1024x1024_v5/labelmap_v2.pbtxt"
shuffle: false
num_epochs: 1
tf_record_input_reader {
input_path: "home/dataset_ev/*.tfrec"
}
}
my model_dir path : /home/temp/trainings/centernet_1024x1024_v5
my eval code:
!python /content/models/research/object_detection/model_main_tf2.py \
--pipeline_config_path=$config_path \
--model_dir=$model_dir \
--checkpoint_dir=$model_dir
@vighneshbirodkar when I check, the two point to different directories. This code was running before the last commit.
The comment about model_dir and fine_tune_checkpoint was directed to @masahi who got an error due to mismatching shapes.
@MehmetBicici Can you share your full logs?
Evaluation of images in a custom tfrecord with shape 1024 x 1024 x 3. This tfrecord works for FasterRCNN evaluation.
INFO:tensorflow:Encountered in user code:
/opt/conda/lib/python3.7/site-packages/object_detection/model_lib_v2.py:884 compute_eval_dict *
losses_dict, prediction_dict = _compute_losses_and_predictions_dicts(
/opt/conda/lib/python3.7/site-packages/object_detection/model_lib_v2.py:118 _compute_losses_and_predictions_dicts *
prediction_dict = model.predict(
/opt/conda/lib/python3.7/site-packages/object_detection/meta_architectures/center_net_meta_arch.py:3297 predict *
predictions[head_name] = [
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:1012 __call__ **
outputs = call_fn(inputs, *args, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/sequential.py:389 call
outputs = layer(inputs, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:1008 _call_
self._maybe_build(inputs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:2710 _maybe_build
self.build(input_shapes) # pylint:disable=not-callable
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/layers/convolutional.py:205 build
dtype=self.dtype)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:639 add_weight
caching_device=caching_device)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py:810 _add_variable_with_custom_getter
**kwargs_for_getter)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py:142 make_variable
shape=variable_shape if variable_shape else None)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:260 _call_
return cls._variable_v1_call(*args, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:221 _variable_v1_call
shape=shape)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:67 getter
return captured_getter(captured_previous, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:3332 creator
return next_creator(**kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:67 getter
return captured_getter(captured_previous, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:3332 creator
return next_creator(**kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:67 getter
return captured_getter(captured_previous, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py:714 variable_capturing_scope
lifted_initializer_graph=lifted_initializer_graph, **kwds)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:264 _call_
return super(VariableMetaclass, cls).__call__(*args, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py:227 _init_
initial_value = initial_value()
/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py:82 _call_
self._checkpoint_position, shape, shard_info=shard_info)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py:117 _init_
self.wrapped_value.set_shape(shape)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1217 set_shape
(self.shape, shape))
ValueError: Tensor's shape (3, 3, 256, 256) is not compatible with supplied shape (1, 1, 256, 9)
exception.
INFO:tensorflow:Encountered in user code:
/opt/conda/lib/python3.7/site-packages/object_detection/model_lib_v2.py:884 compute_eval_dict *
losses_dict, prediction_dict = _compute_losses_and_predictions_dicts(
/opt/conda/lib/python3.7/site-packages/object_detection/model_lib_v2.py:118 _compute_losses_and_predictions_dicts *
prediction_dict = model.predict(
/opt/conda/lib/python3.7/site-packages/object_detection/meta_architectures/center_net_meta_arch.py:3297 predict *
predictions[head_name] = [
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:1012 __call__ **
outputs = call_fn(inputs, *args, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/sequential.py:389 call
outputs = layer(inputs, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:1008 _call_
self._maybe_build(inputs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:2710 _maybe_build
self.build(input_shapes) # pylint:disable=not-callable
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/layers/convolutional.py:205 build
dtype=self.dtype)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:639 add_weight
caching_device=caching_device)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py:810 _add_variable_with_custom_getter
**kwargs_for_getter)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py:142 make_variable
shape=variable_shape if variable_shape else None)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:260 _call_
return cls._variable_v1_call(*args, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:221 _variable_v1_call
shape=shape)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:67 getter
return captured_getter(captured_previous, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:3332 creator
return next_creator(**kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:67 getter
return captured_getter(captured_previous, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:3332 creator
return next_creator(**kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:67 getter
return captured_getter(captured_previous, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py:714 variable_capturing_scope
lifted_initializer_graph=lifted_initializer_graph, **kwds)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:264 _call_
return super(VariableMetaclass, cls).__call__(*args, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py:227 _init_
initial_value = initial_value()
/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py:82 _call_
self._checkpoint_position, shape, shard_info=shard_info)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py:117 _init_
self.wrapped_value.set_shape(shape)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1217 set_shape
(self.shape, shape))
ValueError: Tensor's shape (3, 3, 256, 256) is not compatible with supplied shape (1, 1, 256, 9)
exception.
INFO:tensorflow:A replica probably exhausted all examples. Skipping pending examples on other replicas.
INFO:tensorflow:A replica probably exhausted all examples. Skipping pending examples on other replicas.
'NoneType' object is not iterable
What is the path you have specified for fine_tune_checkpoint and model_dir?
@vighneshbirodkar I just downloaded a trained centernet model from the tf2 model zoo (model link), updated eval_input_reader to point to my local coco validation tfrecord, and tried evaluating. So fine_tune_checkpoint is empty and model_dir points to the downloaded model directory (centernet_mobilenetv2_fpn_od, containing checkpoint, saved_model, and pipeline.config).
The above steps worked for efficientdet and ssd mobilenet v2 fpn and I got expected accuracy numbers. I tried the same steps for centernet but got the error above.
@masahi @batuhan-uraltelekom Could you share the full log (and not just the error traceback) and the config file you are using?
@vighneshbirodkar
This is the command I use for evaluation, along with the log and the pipeline config. The exact error is different for each variant of centernet (ValueError: Tensor's shape (256,) is not compatible with supplied shape (90,), or ValueError: Tensor's shape (3, 3, 64, 256) is not compatible with supplied shape (1, 1, 64, 90), etc.).
MODEL=/home/masa/centernet_mobilenetv2_fpn_od
PIPELINE_CONFIG_PATH=${MODEL}/pipeline.config
MODEL_DIR=${MODEL}/checkpoint
CHECKPOINT_DIR=${MODEL_DIR}
python object_detection/model_main_tf2.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${MODEL_DIR} \
--checkpoint_dir=${CHECKPOINT_DIR} \
--alsologtostderr
pipeline.config: https://gist.github.com/masahi/e99fe9db0d96fe2cff6054e585850c4e
@masahi Please sync after this commit: https://github.com/tensorflow/models/commit/d7a784e6a85528292abf960be3a7cc643fa2b02c
And try again. We fixed a bug with the CenterNet model recently.
I am closing this for now, but if others have issues after syncing to the commit above, please re-open this. Make sure to attach the full log and the config file.
Hmm, even after I synced to the latest master, I still get the exact same error. I'll wait for others to try the suggested fix.
@masahi Is the error log exactly the same or are there any differences?
@vighneshbirodkar can you explain d7a784e? What exactly is being done there?
@vighneshbirodkar sorry, I forgot to pip install the package again after pulling the latest change; will report back ASAP.
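A quick way to confirm which copy of the package Python is actually importing after pulling new code, sketched:

```python
# Sketch: print where the object_detection package is imported from. If this
# points into site-packages while your fresh clone lives elsewhere, the pulled
# changes are not the code you are running and the package needs reinstalling
# (e.g. `python -m pip install .` from models/research, per the install docs).
import object_detection
print(object_detection.__file__)
```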
@vighneshbirodkar https://github.com/tensorflow/models/commit/d7a784e6a85528292abf960be3a7cc643fa2b02c does fix the issue. Thanks!
@vighneshbirodkar thank you so much.
As a follow-up: your models may still fail to evaluate because of an exception related to upsampling_interpolation. Since the commit "Updated the model builder and feature extractor such that the upsampling ...", the 'upsampling_interpolation' key-value is expected by _build_center_net_feature_extractor. Add it if it doesn't exist in your config file.
@nyeroglu Can you elaborate a bit? What was the error you were facing?
@vighneshbirodkar Sure, I caught an exception whose type was AttributeError and whose value was "upsampling_interpolation".
def _build_center_net_feature_extractor(feature_extractor_config, is_training):
  ....
  kwargs = {
      'channel_means':
          list(feature_extractor_config.channel_means),
      'channel_stds':
          list(feature_extractor_config.channel_stds),
      'bgr_ordering':
          feature_extractor_config.bgr_ordering,
      'depth_multiplier':
          feature_extractor_config.depth_multiplier,
      'use_separable_conv':
          use_separable_conv,
      'upsampling_interpolation':
          feature_extractor_config.upsampling_interpolation,
  }
Our center_net model's config files did not contain this key-value ('upsampling_interpolation': 'bilinear'). This requirement came with commit e7c5774340623fa9c2687e05fd9990654656a3df (5 May 2021). In summary (I'm not sure this is entirely accurate to say), it breaks backward compatibility: if you want to evaluate a model and calculate mAP scores and this key is not in the config, you will get an error. The exception message was not revealing, so I wanted to surface it here in case someone else runs into this problem.
Best regards, Yigit
The commit added a default value ("nearest") for the field. https://github.com/tensorflow/models/blob/master/research/object_detection/protos/center_net.proto#L454
The problem may be happening because you pulled the latest changes but did not compile the protocol buffers again with the protoc command.
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2.md#python-package-installation
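A quick check, sketched under the assumption that the nested message is exposed as CenterNet.FeatureExtractor in the compiled protos, to see whether the module you import already has the new field:

```python
# Sketch: with up-to-date compiled protos, the CenterNet feature extractor
# message has an `upsampling_interpolation` field defaulting to "nearest".
# An AttributeError here suggests the generated *_pb2.py files are stale and
# protoc needs to be re-run as described in the installation guide.
from object_detection.protos import center_net_pb2

fe = center_net_pb2.CenterNet.FeatureExtractor()
print(fe.upsampling_interpolation)  # expected: "nearest"
```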
@vighneshbirodkar I am getting the same error while doing the evaluation on COLAB.
Training works fine with the same config file:
!python model_main_tf2.py --model_dir=/content/drive/MyDrive/Object_detection/tfod/training_demo/models/my_ssd_mobilenet_v2/vedai2/ --pipeline_config_path=/content/drive/MyDrive/Object_detection/tfod/training_demo/models/my_ssd_mobilenet_v2/vedai2/pipeline_vedai2.config
Now when I evaluated using the command:
!python model_main_tf2.py --model_dir=/content/drive/MyDrive/Object_detection/tfod/training_demo/models/my_ssd_mobilenet_v2/vedai2/ --pipeline_config_path=/content/drive/MyDrive/Object_detection/tfod/training_demo/models/my_ssd_mobilenet_v2/vedai2/pipeline_vedai2.config --checkpoint_dir=/content/drive/MyDrive/Object_detection/tfod/training_demo/models/my_ssd_mobilenet_v2/vedai2/
I get the following error:
Instructions for updating: Use tf.cast instead.
INFO:tensorflow:Waiting for new checkpoint at /content/drive/MyDrive/Object_detection/tfod/training_demo/models/my_ssd_mobilenet_v2/vedai2/
I0723 12:18:44.950616 139920949270400 checkpoint_utils.py:140] Waiting for new checkpoint at /content/drive/MyDrive/Object_detection/tfod/training_demo/models/my_ssd_mobilenet_v2/vedai2/
INFO:tensorflow:Found new checkpoint at /content/drive/MyDrive/Object_detection/tfod/training_demo/models/my_ssd_mobilenet_v2/vedai2/ckpt-7
I0723 12:18:45.369537 139920949270400 checkpoint_utils.py:149] Found new checkpoint at /content/drive/MyDrive/Object_detection/tfod/training_demo/models/my_ssd_mobilenet_v2/vedai2/ckpt-7
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/backend.py:435: UserWarning: tf.keras.backend.set_learning_phase is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the training argument of the __call__ method of your layer or model.
warnings.warn('tf.keras.backend.set_learning_phase is deprecated and '
2021-07-23 12:18:48.162754: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-07-23 12:18:48.163266: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2299995000 Hz
Traceback (most recent call last):
File "model_main_tf2.py", line 113, in <module>
tf.compat.v1.app.run()
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "model_main_tf2.py", line 88, in main
wait_interval=300, timeout=FLAGS.eval_timeout)
File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py", line 1157, in eval_continuously
global_step=global_step,
File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py", line 1001, in eager_eval_loop
for evaluator in evaluators:
TypeError: 'NoneType' object is not iterable
My config file:
model {
  ssd {
    num_classes: 9
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    feature_extractor {
      type: "ssd_mobilenet_v2_keras"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 3.9999998989515007e-05
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.029999999329447746
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.9700000286102295
          center: true
          scale: true
          epsilon: 0.0010000000474974513
          train: true
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 3.9999998989515007e-05
            }
          }
          initializer {
            random_normal_initializer {
              mean: 0.0
              stddev: 0.009999999776482582
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.9700000286102295
            center: true
            scale: true
            epsilon: 0.0010000000474974513
            train: true
          }
        }
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.800000011920929
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        class_prediction_bias_init: -4.599999904632568
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.20000000298023224
        max_scale: 0.949999988079071
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.33329999446868896
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 9.99999993922529e-09
        iou_threshold: 0.6000000238418579
        max_detections_per_class: 100
        max_total_detections: 100
        use_static_shapes: false
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
          delta: 1.0
        }
      }
      classification_loss {
        weighted_sigmoid_focal {
          gamma: 2.0
          alpha: 0.75
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    encode_background_as_zeros: true
    normalize_loc_loss_by_codesize: true
    inplace_batchnorm_update: true
    freeze_batchnorm: false
  }
}
train_config {
  batch_size: 32
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  sync_replicas: true
  optimizer {
    momentum_optimizer {
      learning_rate {
        cosine_decay_learning_rate {
          learning_rate_base: 0.800000011920929
          total_steps: 50000
          warmup_learning_rate: 0.13333000242710114
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.8999999761581421
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "/content/drive/MyDrive/Object_detection/tfod/training_demo/pre-trained_models/ssd_mobilenet_v2_320x320_coco17_tpu-8/checkpoint/ckpt-0"
  num_steps: 30000
  startup_delay_steps: 0.0
  replicas_to_aggregate: 8
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "detection"
  fine_tune_checkpoint_version: V2
}
train_input_reader {
  label_map_path: "/content/drive/MyDrive/Object_detection/tfod/training_demo/annotations/label_map_vedai2.pbtxt"
  tf_record_input_reader {
    input_path: "/content/drive/MyDrive/Object_detection/tfod/training_demo/annotations/train_vedai2.record"
  }
}
eval_config {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "/content/drive/MyDrive/Object_detection/tfod/training_demo/annotations/label_map_vedai2.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/content/drive/MyDrive/Object_detection/tfod/training_demo/annotations/test_vedai2.record"
  }
}
@Chitti21 Can you re-run your code with the latest commit?
The error usually means that there is a different underlying issue. With the latest code, the underlying issue should be logged.
@vighneshbirodkar I re-ran the training by downloading the repo once again (#!git clone https://github.com/tensorflow/models.git). Training works fine. However, I get the same error in the evaluation.
Could there be any other reason?
@Chitti21 Can you post the log with the latest code? Note that you need to reinstall the package after cloning a new version.
@vighneshbirodkar
Command:
!python model_main_tf2.py --model_dir=/content/drive/MyDrive/Object_detection/tfod/training_demo/models/my_ssd_mobilenet_v2 --pipeline_config_path=/content/drive/MyDrive/Object_detection/tfod/training_demo/models/my_ssd_mobilenet_v2/pipeline.config --checkpoint_dir=/content/drive/MyDrive/Object_detection/tfod/training_demo/models/my_ssd_mobilenet_v2/
Error Message:
2021-07-25 18:42:39.923354: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0725 18:42:42.906419 140014892648320 model_lib_v2.py:1082] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: None
I0725 18:42:42.906693 140014892648320 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0725 18:42:42.906780 140014892648320 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0725 18:42:42.906860 140014892648320 config_util.py:552] Maybe overwriting eval_num_epochs: 1
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs
= 0. Overwriting num_epochs
to 1.
W0725 18:42:42.906972 140014892648320 model_lib_v2.py:1103] Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs
= 0. Overwriting num_epochs
to 1.
2021-07-25 18:42:42.911214: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-25 18:42:42.941359: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-25 18:42:42.942078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2021-07-25 18:42:42.942127: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-25 18:42:42.948049: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-25 18:42:42.948136: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-25 18:42:42.956197: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-25 18:42:42.956611: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-25 18:42:42.956734: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2021-07-25 18:42:42.957351: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-25 18:42:42.957569: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-25 18:42:42.957598: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-07-25 18:42:42.957909: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-25 18:42:42.958050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-25 18:42:42.958076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]
INFO:tensorflow:Reading unweighted datasets: ['/content/drive/MyDrive/Object_detection/tfod/training_demo/annotations/test.record']
I0725 18:42:43.387483 140014892648320 dataset_builder.py:163] Reading unweighted datasets: ['/content/drive/MyDrive/Object_detection/tfod/training_demo/annotations/test.record']
INFO:tensorflow:Reading record datasets for input file: ['/content/drive/MyDrive/Object_detection/tfod/training_demo/annotations/test.record']
I0725 18:42:43.387977 140014892648320 dataset_builder.py:80] Reading record datasets for input file: ['/content/drive/MyDrive/Object_detection/tfod/training_demo/annotations/test.record']
INFO:tensorflow:Number of filenames to read: 1
I0725 18:42:43.388109 140014892648320 dataset_builder.py:81] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0725 18:42:43.388194 140014892648320 dataset_builder.py:88] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)
instead. If sloppy execution is desired, use tf.data.Options.experimental_deterministic
.
W0725 18:42:43.389816 140014892648320 deprecation.py:336] From /usr/local/lib/python3.7/dist-packages/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)
instead. If sloppy execution is desired, use tf.data.Options.experimental_deterministic
.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.map() W0725 18:42:43.408224 140014892648320 deprecation.py:336] From /usr/local/lib/python3.7/dist-packages/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version. Instructions for updating: Use
tf.data.Dataset.map()
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:206: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a tf.sparse.SparseTensor
and use tf.sparse.to_dense
instead.
W0725 18:42:47.058952 140014892648320 deprecation.py:336] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:206: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a tf.sparse.SparseTensor
and use tf.sparse.to_dense
instead.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/autograph/impl/api.py:464: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast
instead.
W0725 18:42:48.139394 140014892648320 deprecation.py:336] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/autograph/impl/api.py:464: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast
instead.
INFO:tensorflow:Waiting for new checkpoint at /content/drive/MyDrive/Object_detection/tfod/training_demo/models/my_ssd_mobilenet_v2/
I0725 18:42:50.704592 140014892648320 checkpoint_utils.py:140] Waiting for new checkpoint at /content/drive/MyDrive/Object_detection/tfod/training_demo/models/my_ssd_mobilenet_v2/
INFO:tensorflow:Found new checkpoint at /content/drive/MyDrive/Object_detection/tfod/training_demo/models/my_ssd_mobilenet_v2/ckpt-5
I0725 18:42:50.944287 140014892648320 checkpoint_utils.py:149] Found new checkpoint at /content/drive/MyDrive/Object_detection/tfod/training_demo/models/my_ssd_mobilenet_v2/ckpt-5
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/backend.py:435: UserWarning: tf.keras.backend.set_learning_phase
is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the training
argument of the __call__
method of your layer or model.
warnings.warn('tf.keras.backend.set_learning_phase
is deprecated and '
2021-07-25 18:42:52.597620: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-07-25 18:42:52.598130: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2199995000 Hz
Traceback (most recent call last):
File "model_main_tf2.py", line 113, in
Hi @Chitti21, are you sure that you have installed the latest version of the API? I am looking for these two lines in the logs but I couldn't find them.
https://github.com/tensorflow/models/blob/master/research/object_detection/model_lib_v2.py#L934
@vighneshbirodkar Yes, I cloned the repo afresh on a new google drive account. No problem happens with the training. But metric evaluation causes the trouble.
I currently don't know what is causing this, I will re-open this for now so this is on our radar to fix.
@vighneshbirodkar Sorry for the trouble. Just now figured out that my test.record file was corrupted. With the new test.record, the evaluation works fine. This issue may pls be closed.
Any help on 10133 would be more useful. Thank you.
Hello, can I request for this issue to be reopened? I was training ResNet50 and it went fine but when I started to evaluate the model I got this error. Please help, huhuhu.
File "Tensorflow/models/research/object_detection/model_main_tf2.py", line 115, in
@aahlexxx You can re-open it. Could you attach the full log from start to finish?
uwu, nevermind. I already figured it out. For anyone who encounters this error in the future, the reason is the testing set. I didn't label my test set since I thought you only need to label the train set. lol
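For anyone debugging this, a sketch that checks an eval TFRecord for both problems mentioned above (a corrupted file and missing labels); the path and feature keys assume the standard TF Object Detection API record format:

```python
# Sketch: iterate over an eval TFRecord to confirm it is readable and that the
# examples actually carry ground-truth labels. Iteration raises if the file is
# corrupted/truncated; a box count of zero means the set is effectively unlabeled.
import tensorflow as tf

path = "Tensorflow/workspace/annotations/test.record"  # illustrative path

num_examples = 0
num_boxes = 0
for raw in tf.data.TFRecordDataset(path):
    example = tf.train.Example.FromString(raw.numpy())
    num_examples += 1
    num_boxes += len(example.features.feature["image/object/class/label"].int64_list.value)

print(f"{num_examples} examples, {num_boxes} ground-truth boxes")
```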
Hi, can anyone help? I'm getting issues trying to train my ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8 model. It trains fine, but when I try to run the evaluation I get this.
python Tensorflow\models\research\object_detection\model_main_tf2.py --model_dir=Tensorflow\workspace\models\my_ssd_mobnet --pipeline_config_path=Tensorflow\workspace\models\my_ssd_mobnet\pipeline.config --checkpoint_dir=Tensorflow\workspace\models\my_ssd_mobnet
2022-02-10 01:58:45.143166: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-02-10 01:58:45.143420: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-02-10 01:58:50.182484: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-02-10 01:58:50.185251: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2022-02-10 01:58:50.187883: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
2022-02-10 01:58:50.194836: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusolver64_11.dll'; dlerror: cusolver64_11.dll not found
2022-02-10 01:58:50.197395: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2022-02-10 01:58:50.200059: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2022-02-10 01:58:50.201928: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0210 01:58:50.207223 16892 model_lib_v2.py:1089] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: None
I0210 01:58:50.208192 16892 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0210 01:58:50.210210 16892 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0210 01:58:50.211194 16892 config_util.py:552] Maybe overwriting eval_num_epochs: 1
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs
= 0. Overwriting num_epochs
to 1.
W0210 01:58:50.213191 16892 model_lib_v2.py:1107] Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs
= 0. Overwriting num_epochs
to 1.
2022-02-10 01:58:50.219052: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO:tensorflow:Reading unweighted datasets: ['Tensorflow\workspace\annotations\test.record']
I0210 01:58:50.247222 16892 dataset_builder.py:163] Reading unweighted datasets: ['Tensorflow\workspace\annotations\test.record']
INFO:tensorflow:Reading record datasets for input file: ['Tensorflow\workspace\annotations\test.record']
I0210 01:58:50.249191 16892 dataset_builder.py:80] Reading record datasets for input file: ['Tensorflow\workspace\annotations\test.record']
INFO:tensorflow:Number of filenames to read: 1
I0210 01:58:50.252192 16892 dataset_builder.py:81] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0210 01:58:50.253192 16892 dataset_builder.py:87] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From D:\Environments\TensorflowObjectDetectionAPI\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\builders\dataset_builder.py:101: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)
instead. If sloppy execution is desired, use tf.data.Options.deterministic
.
W0210 01:58:50.257192 16892 deprecation.py:341] From D:\Environments\TensorflowObjectDetectionAPI\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\builders\dataset_builder.py:101: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)
instead. If sloppy execution is desired, use tf.data.Options.deterministic
.
WARNING:tensorflow:From D:\Environments\TensorflowObjectDetectionAPI\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\builders\dataset_builder.py:236: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.map() W0210 01:58:50.287194 16892 deprecation.py:341] From D:\Environments\TensorflowObjectDetectionAPI\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\builders\dataset_builder.py:236: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version. Instructions for updating: Use
tf.data.Dataset.map()
WARNING:tensorflow:From D:\Environments\TensorflowObjectDetectionAPI\lib\site-packages\tensorflow\python\util\dispatch.py:1096: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a tf.sparse.SparseTensor
and use tf.sparse.to_dense
instead.
W0210 01:58:54.731192 16892 deprecation.py:341] From D:\Environments\TensorflowObjectDetectionAPI\lib\site-packages\tensorflow\python\util\dispatch.py:1096: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a tf.sparse.SparseTensor
and use tf.sparse.to_dense
instead.
WARNING:tensorflow:From D:\Environments\TensorflowObjectDetectionAPI\lib\site-packages\tensorflow\python\autograph\impl\api.py:465: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast
instead.
W0210 01:58:55.900195 16892 deprecation.py:341] From D:\Environments\TensorflowObjectDetectionAPI\lib\site-packages\tensorflow\python\autograph\impl\api.py:465: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast
instead.
INFO:tensorflow:Waiting for new checkpoint at Tensorflow\workspace\models\my_ssd_mobnet
I0210 01:58:58.824405 16892 checkpoint_utils.py:140] Waiting for new checkpoint at Tensorflow\workspace\models\my_ssd_mobnet
INFO:tensorflow:Found new checkpoint at Tensorflow\workspace\models\my_ssd_mobnet\ckpt-6
I0210 01:58:58.872439 16892 checkpoint_utils.py:149] Found new checkpoint at Tensorflow\workspace\models\my_ssd_mobnet\ckpt-6
D:\Environments\TensorflowObjectDetectionAPI\lib\site-packages\keras\backend.py:414: UserWarning: tf.keras.backend.set_learning_phase
is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the training
argument of the __call__
method of your layer or model.
warnings.warn('tf.keras.backend.set_learning_phase
is deprecated and '
Traceback (most recent call last):
File "Tensorflow\models\research\object_detection\model_main_tf2.py", line 115, in
@T-rex007 Can you pull the latest code and send us the full log again ?
Edit: @vighneshbirodkar any update on this?
Using Google Colab, the repo is freshly cloned, same error.
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0220 23:53:07.220828 140280673650560 model_lib_v2.py:1090] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: None
I0220 23:53:07.221056 140280673650560 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0220 23:53:07.221137 140280673650560 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0220 23:53:07.221209 140280673650560 config_util.py:552] Maybe overwriting eval_num_epochs: 1
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs
= 0. Overwriting num_epochs
to 1.
W0220 23:53:07.221323 140280673650560 model_lib_v2.py:1111] Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs
= 0. Overwriting num_epochs
to 1.
2022-02-20 23:53:08.100522: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
INFO:tensorflow:Reading unweighted datasets: ['Tensorflow/workspace/annotations/test.record']
I0220 23:53:08.127663 140280673650560 dataset_builder.py:163] Reading unweighted datasets: ['Tensorflow/workspace/annotations/test.record']
INFO:tensorflow:Reading record datasets for input file: ['Tensorflow/workspace/annotations/test.record']
I0220 23:53:08.128067 140280673650560 dataset_builder.py:80] Reading record datasets for input file: ['Tensorflow/workspace/annotations/test.record']
INFO:tensorflow:Number of filenames to read: 1
I0220 23:53:08.128156 140280673650560 dataset_builder.py:81] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0220 23:53:08.128212 140280673650560 dataset_builder.py:88] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)
instead. If sloppy execution is desired, use tf.data.Options.deterministic
.
W0220 23:53:08.130596 140280673650560 deprecation.py:343] From /usr/local/lib/python3.7/dist-packages/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)
instead. If sloppy execution is desired, use tf.data.Options.deterministic
.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.map() W0220 23:53:08.152011 140280673650560 deprecation.py:343] From /usr/local/lib/python3.7/dist-packages/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version. Instructions for updating: Use
tf.data.Dataset.map()
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:1082: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a tf.sparse.SparseTensor
and use tf.sparse.to_dense
instead.
W0220 23:53:12.516227 140280673650560 deprecation.py:343] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:1082: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a tf.sparse.SparseTensor
and use tf.sparse.to_dense
instead.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:1082: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast
instead.
W0220 23:53:14.021603 140280673650560 deprecation.py:343] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:1082: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast
instead.
INFO:tensorflow:Waiting for new checkpoint at Tensorflow/workspace/models/my_frcnn_640
I0220 23:53:16.813345 140280673650560 checkpoint_utils.py:136] Waiting for new checkpoint at Tensorflow/workspace/models/my_frcnn_640
INFO:tensorflow:Found new checkpoint at Tensorflow/workspace/models/my_frcnn_640/ckpt-6
I0220 23:53:16.816279 140280673650560 checkpoint_utils.py:145] Found new checkpoint at Tensorflow/workspace/models/my_frcnn_640/ckpt-6
/usr/local/lib/python3.7/dist-packages/keras/backend.py:450: UserWarning: tf.keras.backend.set_learning_phase
is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the training
argument of the __call__
method of your layer or model.
warnings.warn('tf.keras.backend.set_learning_phase
is deprecated and '
INFO:tensorflow:depth of additional conv before box predictor: 0
I0220 23:53:29.752399 140280673650560 convolutional_keras_box_predictor.py:154] depth of additional conv before box predictor: 0
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/autograph/impl/api.py:459: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.
W0220 23:53:37.415742 140280673650560 deprecation.py:343] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/autograph/impl/api.py:459: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:1082: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default.
See tf.nn.softmax_cross_entropy_with_logits_v2
.
W0220 23:53:42.235701 140280673650560 deprecation.py:343] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:1082: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version. Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default.
See tf.nn.softmax_cross_entropy_with_logits_v2
.
2022-02-20 23:53:54.634382: E tensorflow/stream_executor/cuda/cuda_dnn.cc:361] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2022-02-20 23:53:54.635463: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at conv_ops.cc:1120 : UNIMPLEMENTED: DNN library is not found.
INFO:tensorflow:Encountered Graph execution error:
Detected at node 'model/conv1_conv/Conv2D' defined at (most recent call last):
File "Tensorflow/models/research/object_detection/model_main_tf2.py", line 115, in
Detected at node 'model/conv1_conv/Conv2D' defined at (most recent call last):
File "Tensorflow/models/research/object_detection/model_main_tf2.py", line 115, in
@vighneshbirodkar can you check this out please?
@isspid I think the real problem is this
2 root error(s) found.
(0) UNIMPLEMENTED: DNN library is not found.
[[{{node model/conv1_conv/Conv2D}}]]
[[Identity_30/_282]]
(1) UNIMPLEMENTED: DNN library is not found.
[[{{node model/conv1_conv/Conv2D}}]]
This indicated a problem in your drivers and TF installation.
@isspid I think the real problem is this
2 root error(s) found. (0) UNIMPLEMENTED: DNN library is not found. [[{{node model/conv1_conv/Conv2D}}]] [[Identity_30/_282]] (1) UNIMPLEMENTED: DNN library is not found. [[{{node model/conv1_conv/Conv2D}}]]
This indicated a problem in your drivers and TF installation.
I am using Google Colab for this purpose, so I am not sure I had any impact on the drivers; also, TensorFlow is already installed in Google Colab, if I am not mistaken.
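One way to see the mismatch directly, sketched: tf.sysconfig.get_build_info() reports which CUDA/cuDNN versions the installed TensorFlow wheel was built against, which can be compared with the runtime cuDNN version reported in the error above (8.0.5 loaded vs 8.1.0 expected):

```python
# Sketch: print the CUDA/cuDNN versions this TensorFlow build expects and
# whether a GPU is visible, to diagnose "DNN library is not found" errors.
import tensorflow as tf

info = tf.sysconfig.get_build_info()
print("TF version:      ", tf.__version__)
print("built with CUDA: ", info.get("cuda_version"))
print("built with cuDNN:", info.get("cudnn_version"))
print("visible GPUs:    ", tf.config.list_physical_devices("GPU"))
```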
Hi, I am also facing the same issue while using the EfficientDet model (using Google Colab).
I am using the below code to get the model evaluation:
!python /content/gdrive/MyDrive/content/models/research/object_detection/model_main_tf2.py \
--pipeline_config_path={pipeline_file} \
--model_dir={model_dir} \
--checkpoint_dir={model_dir}
These are the paths:
pipeline_file = '/content/gdrive/MyDrive/content/models/research/deploy/pipeline_file.config'
model_dir = '/content/gdrive/MyDrive/content/log_files_barca_bayern'
Log while running the evaluation code:
2022-11-13 16:36:37.304526: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-11-13 16:36:38.053407: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-11-13 16:36:38.053514: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-11-13 16:36:38.053532: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W1113 16:36:40.238175 139810192238464 model_lib_v2.py:1090] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: None
I1113 16:36:40.238409 139810192238464 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I1113 16:36:40.238500 139810192238464 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I1113 16:36:40.238580 139810192238464 config_util.py:552] Maybe overwriting eval_num_epochs: 1
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs
= 0. Overwriting num_epochs
to 1.
W1113 16:36:40.238686 139810192238464 model_lib_v2.py:1110] Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs
= 0. Overwriting num_epochs
to 1.
2022-11-13 16:36:41.088505: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
I1113 16:36:41.109091 139810192238464 ssd_efficientnet_bifpn_feature_extractor.py:146] EfficientDet EfficientNet backbone version: efficientnet-b0
I1113 16:36:41.109263 139810192238464 ssd_efficientnet_bifpn_feature_extractor.py:147] EfficientDet BiFPN num filters: 64
I1113 16:36:41.109331 139810192238464 ssd_efficientnet_bifpn_feature_extractor.py:149] EfficientDet BiFPN num iterations: 3
I1113 16:36:41.112856 139810192238464 efficientnet_model.py:143] round_filter input=32 output=32
I1113 16:36:41.145924 139810192238464 efficientnet_model.py:143] round_filter input=32 output=32
I1113 16:36:41.146051 139810192238464 efficientnet_model.py:143] round_filter input=16 output=16
I1113 16:36:41.218508 139810192238464 efficientnet_model.py:143] round_filter input=16 output=16
I1113 16:36:41.218694 139810192238464 efficientnet_model.py:143] round_filter input=24 output=24
I1113 16:36:41.404295 139810192238464 efficientnet_model.py:143] round_filter input=24 output=24
I1113 16:36:41.404441 139810192238464 efficientnet_model.py:143] round_filter input=40 output=40
I1113 16:36:41.577770 139810192238464 efficientnet_model.py:143] round_filter input=40 output=40
I1113 16:36:41.577946 139810192238464 efficientnet_model.py:143] round_filter input=80 output=80
I1113 16:36:41.833776 139810192238464 efficientnet_model.py:143] round_filter input=80 output=80
I1113 16:36:41.833942 139810192238464 efficientnet_model.py:143] round_filter input=112 output=112
I1113 16:36:42.104938 139810192238464 efficientnet_model.py:143] round_filter input=112 output=112
I1113 16:36:42.105093 139810192238464 efficientnet_model.py:143] round_filter input=192 output=192
I1113 16:36:42.436462 139810192238464 efficientnet_model.py:143] round_filter input=192 output=192
I1113 16:36:42.436625 139810192238464 efficientnet_model.py:143] round_filter input=320 output=320
I1113 16:36:42.518593 139810192238464 efficientnet_model.py:143] round_filter input=1280 output=1280
I1113 16:36:42.559586 139810192238464 efficientnet_model.py:453] Building model efficientnet with params ModelConfig(width_coefficient=1.0, depth_coefficient=1.0, resolution=224, dropout_rate=0.2, blocks=(BlockConfig(input_filters=32, output_filters=16, kernel_size=3, num_repeat=1, expand_ratio=1, strides=(1, 1), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=16, output_filters=24, kernel_size=3, num_repeat=2, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=24, output_filters=40, kernel_size=5, num_repeat=2, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=40, output_filters=80, kernel_size=3, num_repeat=3, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=80, output_filters=112, kernel_size=5, num_repeat=3, expand_ratio=6, strides=(1, 1), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=112, output_filters=192, kernel_size=5, num_repeat=4, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=192, output_filters=320, kernel_size=3, num_repeat=1, expand_ratio=6, strides=(1, 1), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise')), stem_base_filters=32, top_base_filters=1280, activation='simple_swish', batch_norm='default', bn_momentum=0.99, bn_epsilon=0.001, weight_decay=5e-06, drop_connect_rate=0.2, depth_divisor=8, min_depth=None, use_se=True, input_channels=3, num_classes=1000, model_name='efficientnet', rescale_input=False, data_format='channels_last', dtype='float32')
INFO:tensorflow:Reading unweighted datasets: ['/content/gdrive/MyDrive/content/test/teams.tfrecord']
I1113 16:36:42.612493 139810192238464 dataset_builder.py:162] Reading unweighted datasets: ['/content/gdrive/MyDrive/content/test/teams.tfrecord']
INFO:tensorflow:Reading record datasets for input file: ['/content/gdrive/MyDrive/content/test/teams.tfrecord']
I1113 16:36:42.612948 139810192238464 dataset_builder.py:79] Reading record datasets for input file: ['/content/gdrive/MyDrive/content/test/teams.tfrecord']
INFO:tensorflow:Number of filenames to read: 1
I1113 16:36:42.613093 139810192238464 dataset_builder.py:80] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W1113 16:36:42.613175 139810192238464 dataset_builder.py:87] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/object_detection/builders/dataset_builder.py:104: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE) instead. If sloppy execution is desired, use tf.data.Options.deterministic.
W1113 16:36:42.616040 139810192238464 deprecation.py:356] From /usr/local/lib/python3.7/dist-packages/object_detection/builders/dataset_builder.py:104: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE) instead. If sloppy execution is desired, use tf.data.Options.deterministic.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/object_detection/builders/dataset_builder.py:236: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.map()
W1113 16:36:42.630358 139810192238464 deprecation.py:356] From /usr/local/lib/python3.7/dist-packages/object_detection/builders/dataset_builder.py:236: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.map()
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a tf.sparse.SparseTensor and use tf.sparse.to_dense instead.
W1113 16:36:46.364809 139810192238464 deprecation.py:356] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a tf.sparse.SparseTensor and use tf.sparse.to_dense instead.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
W1113 16:36:47.720357 139810192238464 deprecation.py:356] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Waiting for new checkpoint at /content/gdrive/MyDrive/content/log_files_barca_bayern
I1113 16:36:50.155207 139810192238464 checkpoint_utils.py:142] Waiting for new checkpoint at /content/gdrive/MyDrive/content/log_files_barca_bayern
INFO:tensorflow:Found new checkpoint at /content/gdrive/MyDrive/content/log_files_barca_bayern/ckpt-11
I1113 16:36:51.766162 139810192238464 checkpoint_utils.py:151] Found new checkpoint at /content/gdrive/MyDrive/content/log_files_barca_bayern/ckpt-11
/usr/local/lib/python3.7/dist-packages/keras/backend.py:452: UserWarning: tf.keras.backend.set_learning_phase is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the training argument of the __call__ method of your layer or model.
  "tf.keras.backend.set_learning_phase is deprecated and "
Traceback (most recent call last):
File "/content/gdrive/MyDrive/content/models/research/object_detection/model_main_tf2.py", line 114, in
This issue happens only when I try to evaluate the model and get the performance metrics. There are no issues when I use the trained model to actually identify objects in test images; I get the final output with bounding boxes and acceptable performance.
Here is my config file:
model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 5
    add_background_class: false
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: [1.0, 2.0, 0.5]
        scales_per_octave: 3
      }
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 512
        max_dimension: 512
        pad_to_max_dimension: true
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        depth: 64
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          force_use_bias: true
          activation: SWISH
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            scale: true
            decay: 0.99
            epsilon: 0.001
          }
        }
        num_layers_before_predictor: 3
        kernel_size: 3
        use_depthwise: true
      }
    }
    feature_extractor {
      type: 'ssd_efficientnet-b0_bifpn_keras'
      bifpn {
        min_level: 3
        max_level: 7
        num_iterations: 3
        num_filters: 64
      }
      conv_hyperparams {
        force_use_bias: true
        activation: SWISH
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          scale: true,
          decay: 0.99,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.25
          gamma: 1.5
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.5
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}
train_config: {
  fine_tune_checkpoint: "/content/gdrive/MyDrive/content/models/research/deploy/efficientdet_d0_coco17_tpu-32/checkpoint/ckpt-0"
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint_type: "detection"
  batch_size: 16
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  use_bfloat16: true
  num_steps: 8000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_scale_crop_and_pad_to_square {
      output_size: 512
      scale_min: 0.1
      scale_max: 2.0
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 8e-2
          total_steps: 300000
          warmup_learning_rate: .001
          warmup_steps: 2500
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}
train_input_reader: {
  label_map_path: "/content/gdrive/MyDrive/content/train/teams_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/content/gdrive/MyDrive/content/train/teams.tfrecord"
  }
}
eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 16;
}
eval_input_reader: {
  label_map_path: "/content/gdrive/MyDrive/content/train/teams_label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/content/gdrive/MyDrive/content/test/teams.tfrecord"
  }
}
Guys, I had the exact same problem with the exact same error message. I checked the TFRecord file for my test set and, unexpectedly, it was 0 KB: basically an empty file with no images (it seems I mixed up versions of the test-set TFRecord when setting up the project).
Solution: at a minimum, make sure your TFRecord files are correct and not empty.
Hi, I'm getting this error when trying to evaluate my SSD model. My TFRecord is not empty; it is around 4.6 GB. The command I ran is this:
!python model_main_tf2.py \
  --pipeline_config_path=$SSDConfig_fname \
  --model_dir=$SSD_PATH+'/training' \
  --checkpoint_dir=$SSD_PATH'/training/' \
  --alsologtostderr
This is the error I got:
File "/home/spolicar/.local/lib/python3.8/site-packages/object_detection/model_lib_v2.py", line 1009, in eager_eval_loop
    for evaluator in evaluators:
TypeError: 'NoneType' object is not iterable
and here's my entire output:
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0922 11:04:16.559759 22936155457344 model_lib_v2.py:1089] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: None
I0922 11:04:16.559928 22936155457344 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0922 11:04:16.559984 22936155457344 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0922 11:04:16.560036 22936155457344 config_util.py:552] Maybe overwriting eval_num_epochs: 1
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1.
W0922 11:04:16.560107 22936155457344 model_lib_v2.py:1106] Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1.
INFO:tensorflow:Reading unweighted datasets: ['/data/spolicar/coco/coco_tf_records/coco_val.record']
I0922 11:04:17.266874 22936155457344 dataset_builder.py:162] Reading unweighted datasets: ['/data/spolicar/coco/coco_tf_records/coco_val.record']
INFO:tensorflow:Reading record datasets for input file: ['/data/spolicar/coco/coco_tf_records/coco_val.record']
I0922 11:04:17.267067 22936155457344 dataset_builder.py:79] Reading record datasets for input file: ['/data/spolicar/coco/coco_tf_records/coco_val.record']
INFO:tensorflow:Number of filenames to read: 1
I0922 11:04:17.267128 22936155457344 dataset_builder.py:80] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0922 11:04:17.267172 22936155457344 dataset_builder.py:86] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/spolicar/.local/lib/python3.8/site-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE) instead. If sloppy execution is desired, use tf.data.Options.deterministic.
W0922 11:04:17.270482 22936155457344 deprecation.py:364] From /home/spolicar/.local/lib/python3.8/site-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE) instead. If sloppy execution is desired, use tf.data.Options.deterministic.
WARNING:tensorflow:From /home/spolicar/.local/lib/python3.8/site-packages/object_detection/builders/dataset_builder.py:235: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.map()
W0922 11:04:17.289017 22936155457344 deprecation.py:364] From /home/spolicar/.local/lib/python3.8/site-packages/object_detection/builders/dataset_builder.py:235: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.map()
WARNING:tensorflow:From /home/spolicar/.local/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a tf.sparse.SparseTensor and use tf.sparse.to_dense instead.
W0922 11:04:20.651786 22936155457344 deprecation.py:364] From /home/spolicar/.local/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a tf.sparse.SparseTensor and use tf.sparse.to_dense instead.
WARNING:tensorflow:From /home/spolicar/.local/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
W0922 11:04:21.610407 22936155457344 deprecation.py:364] From /home/spolicar/.local/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Waiting for new checkpoint at /data/spolicar/coco/training/
I0922 11:04:23.801449 22936155457344 checkpoint_utils.py:168] Waiting for new checkpoint at /data/spolicar/coco/training/
INFO:tensorflow:Found new checkpoint at /data/spolicar/coco/training/ckpt-1001
I0922 11:04:23.802538 22936155457344 checkpoint_utils.py:177] Found new checkpoint at /data/spolicar/coco/training/ckpt-1001
/home/spolicar/.local/lib/python3.8/site-packages/keras/src/backend.py:452: UserWarning: tf.keras.backend.set_learning_phase is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the training argument of the __call__ method of your layer or model.
  warnings.warn(
I0922 11:04:28.662562 22936155457344 api.py:460] feature_map_spatial_dims: [(40, 40), (20, 20), (10, 10), (5, 5), (3, 3)]
I0922 11:04:43.842872 22936155457344 api.py:460] feature_map_spatial_dims: [(40, 40), (20, 20), (10, 10), (5, 5), (3, 3)]
2023-09-22 11:04:52.768513: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:437] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2023-09-22 11:04:52.768594: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:441] Memory usage: 2162688 bytes free, 23793106944 bytes total.
2023-09-22 11:04:52.768638: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:451] Possibly insufficient driver version: 525.105.17
2023-09-22 11:04:52.768673: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at conv_ops_impl.h:770 : UNIMPLEMENTED: DNN library is not found.
INFO:tensorflow:Encountered Graph execution error:
Detected at node 'ssd_mobile_net_v2_fpn_keras_feature_extractor/model/Conv1/Conv2D' defined at (most recent call last):
File "model_main_tf2.py", line 114, in <module>
Detected at node 'ssd_mobile_net_v2_fpn_keras_feature_extractor/model/Conv1/Conv2D' defined at (most recent call last):
File "model_main_tf2.py", line 114, in <module>
TypeError: 'NoneType' object is not iterable
Prerequisites
Please answer the following questions for yourself before submitting an issue.
1. The entire URL of the file you are using
https://github.com/tensorflow/models/blob/master/research/object_detection/model_main_tf2.py
2. Describe the bug
model_main_tf2.py cannot be used to evaluate the model, whereas training runs without errors. The error log is in section 5.
3. Steps to reproduce
I'm following the tutorial https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html and trying to evaluate the model with:
python ../object_detection/models/research/object_detection/model_main_tf2.py --model_dir=checkpoints/ --pipeline_config_path=mask_rcnn_inception_resnet_v2_1024x1024.config --checkpoint_dir=checkpoints/ --sample_1_of_n_eval=1 --also_log_to_stderr
4. Expected behavior
I expected model_main_tf2.py to output tf event files.
5. Additional context
pre-trained model: Mask R-CNN Inception ResNet V2 1024x1024 from https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md
File "../object_detection/models/research/object_detection/model_main_tf2.py", line 113, in <module> tf.compat.v1.app.run() File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 40, in run _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef) File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/gpu_cuda11.0/lib/python3.7/site-packages/absl/app.py", line 303, in run _run_main(main, args) File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/gpu_cuda11.0/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main sys.exit(main(argv)) File "../object_detection/models/research/object_detection/model_main_tf2.py", line 88, in main wait_interval=300, timeout=FLAGS.eval_timeout) File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/object_detection/model_lib_v2.py", line 1135, in eval_continuously global_step=global_step, File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/object_detection/model_lib_v2.py", line 979, in eager_eval_loop for evaluator in evaluators: TypeError: 'NoneType' object is not iterable
6. System information