tensorflow / models

Models and examples built with TensorFlow

ValueError: List argument 'values' to 'ConcatV2' Op with length 0 #10947

Closed JunHyungKang closed 1 year ago

JunHyungKang commented 1 year ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/official/vision/modeling/layers/detection_generator.py

2. Describe the bug

In the function '_generate_detections_v2_class_aware', an error occurs during the evaluation phase: ValueError: List argument 'values' to 'ConcatV2' Op with length 0 shorter than minimum length 2.

3. Steps to reproduce

Pass an input with length-0 tensors as the predicted boxes (see the sketch below).
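
For reference, a minimal sketch of the failing op in isolation, assuming an empty list of per-class NMS results reaches the concat:

```python
import tensorflow as tf

nmsed_boxes = []  # hypothetical: no class produced any boxes after NMS

# tf.concat passes a single tensor through unchanged, but an empty list
# hits ConcatV2's minimum-length check and raises:
# ValueError: List argument 'values' to 'ConcatV2' Op with length 0
# shorter than minimum length 2.
tf.concat(nmsed_boxes, axis=1)
```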

4. Expected behavior

The function should handle the case where the number of predicted boxes is 0 or 1 instead of raising.

5. Additional context

N/A

6. System information

laxmareddyp commented 1 year ago

Hi @JunHyungKang ,

Sorry for the delay in response. In order to expedite the troubleshooting process, please provide a code snippet / Colab notebook to reproduce the issue reported here. Thanks.

JunHyungKang commented 1 year ago

@laxmareddyp Here are my config and the detailed error.

config

{'runtime': {'all_reduce_alg': None,
             'batchnorm_spatial_persistent': False,
             'dataset_num_private_threads': None,
             'default_shard_dim': -1,
             'distribution_strategy': 'mirrored',
             'enable_xla': False,
             'gpu_thread_mode': None,
             'loss_scale': None,
             'mixed_precision_dtype': None,
             'num_cores_per_replica': 1,
             'num_gpus': 2,
             'num_packs': 1,
             'per_gpu_thread_count': 0,
             'run_eagerly': False,
             'task_index': -1,
             'tpu': None,
             'tpu_enable_xla_dynamic_padder': None,
             'worker_hosts': None},
 'task': {'allow_image_summary': False,
          'annotation_file': None,
          'differential_privacy_config': None,
          'export_config': {'cast_detection_classes_to_float': False,
                            'cast_num_detections_to_float': False,
                            'output_intermediate_features': False,
                            'output_normalized_coordinates': False},
          'freeze_backbone': False,
          'init_checkpoint': None,
          'init_checkpoint_modules': 'all',
          'losses': {'box_loss_weight': 50,
                     'focal_loss_alpha': 0.25,
                     'focal_loss_gamma': 1.5,
                     'huber_loss_delta': 0.1,
                     'l2_weight_decay': 0.0001,
                     'loss_weight': 1.0},
          'max_num_eval_detections': 100,
          'model': {'anchor': {'anchor_size': 4.0,
                               'aspect_ratios': [0.5, 1.0, 2.0],
                               'num_scales': 3},
                    'backbone': {'resnet': {'bn_trainable': True,
                                            'depth_multiplier': 1.0,
                                            'model_id': 50,
                                            'replace_stem_max_pool': False,
                                            'resnetd_shortcut': False,
                                            'scale_stem': True,
                                            'se_ratio': 0.0,
                                            'stem_type': 'v0',
                                            'stochastic_depth_drop_rate': 0.0},
                                 'type': 'resnet'},
                    'decoder': {'fpn': {'fusion_type': 'sum',
                                        'num_filters': 256,
                                        'use_keras_layer': False,
                                        'use_separable_conv': False},
                                'type': 'fpn'},
                    'detection_generator': {'apply_nms': True,
                                            'max_num_detections': 100,
                                            'nms_iou_threshold': 0.5,
                                            'nms_version': 'v2',
                                            'pre_nms_score_threshold': 0.05,
                                            'pre_nms_top_k': 5000,
                                            'return_decoded': None,
                                            'soft_nms_sigma': None,
                                            'tflite_post_processing': {'max_classes_per_detection': 5,
                                                                       'max_detections': 200,
                                                                       'nms_iou_threshold': 0.5,
                                                                       'nms_score_threshold': 0.1,
                                                                       'normalize_anchor_coordinates': False,
                                                                       'use_regular_nms': False},
                                            'use_class_agnostic_nms': False,
                                            'use_cpu_nms': False},
                    'head': {'attribute_heads': [],
                             'num_convs': 4,
                             'num_filters': 256,
                             'share_classification_heads': False,
                             'use_separable_conv': False},
                    'input_size': [1024, 1024, 3],
                    'max_level': 7,
                    'min_level': 3,
                    'norm_activation': {'activation': 'relu',
                                        'norm_epsilon': 0.001,
                                        'norm_momentum': 0.99,
                                        'use_sync_bn': True},
                    'num_classes': 1},
          'name': None,
          'per_category_metrics': False,
          'train_data': {'apply_tf_data_service_before_batching': False,
                         'block_length': 1,
                         'cache': False,
                         'cycle_length': None,
                         'decoder': {'simple_decoder': {'attribute_names': [],
                                                        'mask_binarize_threshold': None,
                                                        'regenerate_source_id': False},
                                     'type': 'simple_decoder'},
                         'deterministic': None,
                         'drop_remainder': True,
                         'dtype': 'float32',
                         'enable_shared_tf_data_service_between_parallel_trainers': False,
                         'enable_tf_data_service': False,
                         'file_type': 'tfrecord',
                         'global_batch_size': 2,
                         'input_path': '/mnt/sda1/exc_cctv/1st/tfrecords/train/*',
                         'is_training': True,
                         'parser': {'aug_policy': None,
                                    'aug_rand_hflip': False,
                                    'aug_scale_max': 1.2,
                                    'aug_scale_min': 0.8,
                                    'aug_type': None,
                                    'match_threshold': 0.5,
                                    'max_num_instances': 100,
                                    'num_channels': 3,
                                    'skip_crowd_during_training': True,
                                    'unmatched_threshold': 0.5},
                         'prefetch_buffer_size': None,
                         'seed': None,
                         'sharding': True,
                         'shuffle_buffer_size': 10000,
                         'tf_data_service_address': None,
                         'tf_data_service_job_name': None,
                         'tfds_as_supervised': False,
                         'tfds_data_dir': '',
                         'tfds_name': '',
                         'tfds_skip_decoding_feature': '',
                         'tfds_split': '',
                         'trainer_id': None,
                         'weights': None},
          'use_coco_metrics': True,
          'use_wod_metrics': False,
          'validation_data': {'apply_tf_data_service_before_batching': False,
                              'block_length': 1,
                              'cache': False,
                              'cycle_length': None,
                              'decoder': {'simple_decoder': {'attribute_names': [],
                                                             'mask_binarize_threshold': None,
                                                             'regenerate_source_id': False},
                                          'type': 'simple_decoder'},
                              'deterministic': None,
                              'drop_remainder': True,
                              'dtype': 'float32',
                              'enable_shared_tf_data_service_between_parallel_trainers': False,
                              'enable_tf_data_service': False,
                              'file_type': 'tfrecord',
                              'global_batch_size': 2,
                              'input_path': '/mnt/sda1/exc_cctv/1st/tfrecords/val/*',
                              'is_training': False,
                              'parser': {'aug_policy': None,
                                         'aug_rand_hflip': False,
                                         'aug_scale_max': 1.0,
                                         'aug_scale_min': 1.0,
                                         'aug_type': None,
                                         'match_threshold': 0.5,
                                         'max_num_instances': 100,
                                         'num_channels': 3,
                                         'skip_crowd_during_training': True,
                                         'unmatched_threshold': 0.5},
                              'prefetch_buffer_size': None,
                              'seed': None,
                              'sharding': True,
                              'shuffle_buffer_size': 10000,
                              'tf_data_service_address': None,
                              'tf_data_service_job_name': None,
                              'tfds_as_supervised': False,
                              'tfds_data_dir': '',
                              'tfds_name': '',
                              'tfds_skip_decoding_feature': '',
                              'tfds_split': '',
                              'trainer_id': None,
                              'weights': None}},
 'trainer': {'allow_tpu_summary': False,
             'best_checkpoint_eval_metric': '',
             'best_checkpoint_export_subdir': 'best',
             'best_checkpoint_metric_comp': 'higher',
             'checkpoint_interval': 9744,
             'continuous_eval_timeout': 3600,
             'eval_tf_function': True,
             'eval_tf_while_loop': False,
             'loss_upper_bound': 1000000.0,
             'max_to_keep': 5,
             'optimizer_config': {'ema': None,
                                  'learning_rate': {'stepwise': {'boundaries': [555408,
                                                                                652848],
                                                                 'name': 'PiecewiseConstantDecay',
                                                                 'offset': 0,
                                                                 'values': [0.0025,
                                                                            0.00025,
                                                                            2.5e-05]},
                                                    'type': 'stepwise'},
                                  'optimizer': {'sgd': {'clipnorm': None,
                                                        'clipvalue': None,
                                                        'decay': 0.0,
                                                        'global_clipnorm': None,
                                                        'momentum': 0.9,
                                                        'name': 'SGD',
                                                        'nesterov': False},
                                                'type': 'sgd'},
                                  'warmup': {'linear': {'name': 'linear',
                                                        'warmup_learning_rate': 0.0067,
                                                        'warmup_steps': 500},
                                             'type': 'linear'}},
             'preemption_on_demand_checkpoint': True,
             'recovery_begin_steps': 0,
             'recovery_max_trials': 0,
             'steps_per_loop': 9744,
             'summary_interval': 9744,
             'train_steps': 701568,
             'train_tf_function': True,
             'train_tf_while_loop': True,
             'validation_interval': 9744,
             'validation_steps': -1,
             'validation_summary_subdir': 'validation'}}

error

I0316 12:00:10.691335 139750881177728 controller.py:502] train | step:  253344 | steps/sec:    3.8 | output: 
    {'box_loss': 0.0017456077,
     'cls_loss': 6.524426e-06,
     'learning_rate': 0.0025,
     'model_loss': 0.08728693,
     'total_loss': 0.2591449,
     'training_loss': 0.2591449}
train | step:  253344 | steps/sec:    3.8 | output: 
    {'box_loss': 0.0017456077,
     'cls_loss': 6.524426e-06,
     'learning_rate': 0.0025,
     'model_loss': 0.08728693,
     'total_loss': 0.2591449,
     'training_loss': 0.2591449}
I0316 12:00:11.730173 139750881177728 controller.py:531] saved checkpoint to /mnt/sda1/exc_cctv/results_retinanet/ckpt-253344.
saved checkpoint to /mnt/sda1/exc_cctv/results_retinanet/ckpt-253344.
I0316 12:00:11.730983 139750881177728 controller.py:297]  eval | step:  253344 | running complete evaluation...
 eval | step:  253344 | running complete evaluation...
INFO:tensorflow:Error reported to Coordinator: Exception encountered when calling layer 'retina_net_model' (type RetinaNetModel).

in user code:

    File "/home/vision/Models/models/official/vision/modeling/retinanet_model.py", line 169, in call  *
        final_results = self.detection_generator(raw_boxes, raw_scores,
    File "/home/vision/Models/models/official/vision/modeling/layers/detection_generator.py", line 1512, in __call__  *
        (nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections) = (
    File "/home/vision/Models/models/official/vision/modeling/layers/detection_generator.py", line 588, in _generate_detections_v2  *
        return _generate_detections_v2_class_aware(
    File "/home/vision/Models/models/official/vision/modeling/layers/detection_generator.py", line 518, in _generate_detections_v2_class_aware  *
        nmsed_boxes = tf.concat(nmsed_boxes, axis=1)

    ValueError: List argument 'values' to 'ConcatV2' Op with length 0 shorter than minimum length 2.

Call arguments received by layer 'retina_net_model' (type RetinaNetModel):
  • images=tf.Tensor(shape=(1, 1024, 1024, 3), dtype=float32)
  • image_shape=tf.Tensor(shape=(1, 2), dtype=float32)
  • anchor_boxes={'3': 'tf.Tensor(shape=(1, 128, 128, 36), dtype=float32)', '4': 'tf.Tensor(shape=(1, 64, 64, 36), dtype=float32)', '5': 'tf.Tensor(shape=(1, 32, 32, 36), dtype=float32)', '6': 'tf.Tensor(shape=(1, 16, 16, 36), dtype=float32)', '7': 'tf.Tensor(shape=(1, 8, 8, 36), dtype=float32)'}
  • output_intermediate_features=False
  • training=False
Traceback (most recent call last):
  File "/home/vision/anaconda3/envs/tfm/lib/python3.8/site-packages/tensorflow/python/training/coordinator.py", line 293, in stop_on_exception
    yield
  File "/home/vision/anaconda3/envs/tfm/lib/python3.8/site-packages/tensorflow/python/distribute/mirrored_run.py", line 386, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/tmp/__autograph_generated_filebc32f0c3.py", line 17, in step_fn
    logs = ag__.converted_call(ag__.ld(self).task.validation_step, (ag__.ld(inputs),), dict(model=ag__.ld(self).model, metrics=ag__.ld(self).validation_metrics), fscope_1)
  File "/home/vision/anaconda3/envs/tfm/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 439, in converted_call
    result = converted_f(*effective_args, **kwargs)
  File "/tmp/__autograph_generated_filefxbwta_h.py", line 12, in tf__validation_step
    outputs = ag__.converted_call(ag__.ld(model), (ag__.ld(features),), dict(anchor_boxes=ag__.ld(labels)['anchor_boxes'], image_shape=ag__.ld(labels)['image_info'][:, 1, :], training=False), fscope)
  File "/home/vision/anaconda3/envs/tfm/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 331, in converted_call
    return _call_unconverted(f, args, kwargs, options, False)
  File "/home/vision/anaconda3/envs/tfm/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 458, in _call_unconverted
    return f(*args, **kwargs)
  File "/home/vision/anaconda3/envs/tfm/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_fileiwd58oi2.py", line 228, in tf__call
    ag__.if_stmt(ag__.ld(training), if_body_11, else_body_11, get_state_12, set_state_12, ('do_return', 'final_results', 'retval_', 'anchor_boxes'), 3)
  File "/tmp/__autograph_generated_fileiwd58oi2.py", line 156, in else_body_11
    final_results = ag__.converted_call(ag__.ld(self).detection_generator, (ag__.ld(raw_boxes), ag__.ld(raw_scores), ag__.ld(anchor_boxes), ag__.ld(image_shape), ag__.ld(raw_attributes)), None, fscope)
  File "/tmp/__autograph_generated_filepm9g43ot.py", line 213, in tf____call__
    ag__.if_stmt(ag__.and_((lambda : ag__.ld(self)._config_dict['apply_nms']), (lambda : (ag__.ld(self)._config_dict['nms_version'] == 'tflite'))), if_body_8, else_body_8, get_state_8, set_state_8, ('do_return', 'retval_'), 2)
  File "/tmp/__autograph_generated_filepm9g43ot.py", line 200, in else_body_8
    ag__.if_stmt(ag__.not_(ag__.ld(self)._config_dict['apply_nms']), if_body_7, else_body_7, get_state_7, set_state_7, ('do_return', 'retval_'), 2)
  File "/tmp/__autograph_generated_filepm9g43ot.py", line 185, in else_body_7
    ag__.if_stmt((ag__.ld(self)._config_dict['nms_version'] == 'batched'), if_body_6, else_body_6, get_state_6, set_state_6, ('nmsed_attributes', 'nmsed_boxes', 'nmsed_classes', 'nmsed_scores', 'valid_detections'), 5)
  File "/tmp/__autograph_generated_filepm9g43ot.py", line 179, in else_body_6
    ag__.if_stmt((ag__.ld(self)._config_dict['nms_version'] == 'v1'), if_body_5, else_body_5, get_state_5, set_state_5, ('nmsed_attributes', 'nmsed_boxes', 'nmsed_classes', 'nmsed_scores', 'valid_detections'), 5)
  File "/tmp/__autograph_generated_filepm9g43ot.py", line 173, in else_body_5
    ag__.if_stmt((ag__.ld(self)._config_dict['nms_version'] == 'v2'), if_body_4, else_body_4, get_state_4, set_state_4, ('nmsed_attributes', 'nmsed_boxes', 'nmsed_classes', 'nmsed_scores', 'valid_detections'), 5)
  File "/tmp/__autograph_generated_filepm9g43ot.py", line 141, in if_body_4
    (nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections) = ag__.converted_call(ag__.ld(_generate_detections_v2), (ag__.ld(boxes), ag__.ld(scores)), dict(pre_nms_top_k=ag__.ld(self)._config_dict['pre_nms_top_k'], pre_nms_score_threshold=ag__.ld(self)._config_dict['pre_nms_score_threshold'], nms_iou_threshold=ag__.ld(self)._config_dict['nms_iou_threshold'], max_num_detections=ag__.ld(self)._config_dict['max_num_detections'], use_class_agnostic_nms=ag__.ld(self)._config_dict['use_class_agnostic_nms']), fscope)
  File "/tmp/__autograph_generated_file0tsux9fz.py", line 36, in tf___generate_detections_v2
    ag__.if_stmt(ag__.ld(use_class_agnostic_nms), if_body, else_body, get_state, set_state, ('do_return', 'retval_'), 2)
  File "/tmp/__autograph_generated_file0tsux9fz.py", line 32, in else_body
    retval_ = ag__.converted_call(ag__.ld(_generate_detections_v2_class_aware), (), dict(boxes=ag__.ld(boxes), scores=ag__.ld(scores), pre_nms_top_k=ag__.ld(pre_nms_top_k), pre_nms_score_threshold=ag__.ld(pre_nms_score_threshold), nms_iou_threshold=ag__.ld(nms_iou_threshold), max_num_detections=ag__.ld(max_num_detections)), fscope)
  File "/tmp/__autograph_generated_file76u_22bw.py", line 60, in tf___generate_detections_v2_class_aware
    nmsed_boxes = ag__.converted_call(ag__.ld(tf).concat, (ag__.ld(nmsed_boxes),), dict(axis=1), fscope)
ValueError: Exception encountered when calling layer 'retina_net_model' (type RetinaNetModel).
JunHyungKang commented 1 year ago

@laxmareddyp

The same issue occurs in a Colab environment.

notebook: https://colab.research.google.com/drive/1hzqvTr_rv9w0nM6jGniCWe05Les9jbKp?usp=sharing

laxmareddyp commented 1 year ago

Hi @JunHyungKang,

Thanks for providing the Colab code. We suggest you first go through this tutorial on how to configure an object detection pipeline. I see that you have included a function in retinanet.py to register the experiment. Instead, you can load the experiment configuration with exp_config = exp_factory.get_exp_config('retinanet_resnetfpn_coco'), which loads the configuration required for retinanet_resnetfpn_coco.

You can then access the experiment configuration like an object, change all required variables, and use it to train your model with a custom dataset and custom configuration. Please build the model under distribution_strategy.scope() so that the distribution strategy is taken care of.
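
As a minimal sketch of that flow (the override values below, such as num_classes and the input paths, are placeholders rather than the reporter's exact settings):

```python
import tensorflow as tf
from official.core import exp_factory

# Load the registered experiment configuration for RetinaNet + ResNet-FPN.
exp_config = exp_factory.get_exp_config('retinanet_resnetfpn_coco')

# Override the fields needed for a custom dataset (placeholder values).
exp_config.task.model.num_classes = 2
exp_config.task.train_data.input_path = '/path/to/tfrecords/train/*'
exp_config.task.validation_data.input_path = '/path/to/tfrecords/val/*'
exp_config.task.train_data.global_batch_size = 2

# Create the strategy first, then build everything under its scope.
distribution_strategy = tf.distribute.MirroredStrategy()
with distribution_strategy.scope():
  # Build the task/model here, e.g. via official.core.task_factory.
  pass
```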

Please check this gist, which gives a glimpse of how you can modify the configuration and make full use of it.

A git clone may contain new commits that are not yet in a stable release, so we suggest you use pip install; that way, any ongoing errors/bugs will not be a problem for your training. I hope this helps you resolve the issue.

Thanks

JunHyungKang commented 1 year ago

Hi @laxmareddyp ,

Thank you for your response. However, the notebook you provided is no different from the train.py file that I worked on and ran. I am using the stable commit of the master branch, and everything you executed in the notebook runs in the same order in train.py.

I have attached the results from running your notebook in the same environment for reference. In my opinion, as mentioned earlier, the NMS code may need to be modified to handle cases where there are no predicted bounding boxes, but I haven't been able to look deeper into it. Please check the attached results.

https://colab.research.google.com/drive/1XJma_dqu4RgWk-dODd03sm5n6WlToxii?usp=sharing

laxmareddyp commented 1 year ago

Hi @JunHyungKang ,

Yes, it's no different from train.py; I just wanted to write it without creating a function. Now I understand the problem: after prediction, NMS is producing no boxes, while the concat op requires a list of at least two tensors. We will look into it internally and come back with a proper resolution. Thanks for reporting the bug.

Thanks.
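
For illustration, a minimal sketch of one possible guard, assuming the per-class loop can legitimately produce an empty list; this is a hypothetical helper, not the official fix:

```python
import tensorflow as tf

def safe_concat_boxes(nmsed_boxes, batch_size, dtype=tf.float32):
  """Hypothetical guard, not the official fix: concatenate per-class NMS
  results, falling back to an empty tensor when no class produced boxes."""
  if nmsed_boxes:  # Python list filled by the per-class NMS loop
    return tf.concat(nmsed_boxes, axis=1)
  # No detections at all: return an empty [batch, 0, 4] tensor so the
  # downstream top-k / padding logic still sees the expected rank.
  return tf.zeros([batch_size, 0, 4], dtype=dtype)

# Usage sketch: boxes = safe_concat_boxes(nmsed_boxes, batch_size=2)
```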

laxmareddyp commented 1 year ago

Hi @JunHyungKang,

Can you please tell me what number of classes has been declared in the tfrecords? Also, if possible, some dummy tfrecords similar to your data would really help us reproduce the error on our side. I tried to reproduce it with another dataset but was not able to.

Thanks

JunHyungKang commented 1 year ago

@laxmareddyp I used one class to generate the tfrecords. Please refer to this dummy.

laxmareddyp commented 1 year ago

Hi @JunHyungKang,

Please make sure to follow the requirements below while creating tfrecords (a minimal builder for such records is sketched after the list).

'image/encoded' bytes Required. The encoded image bytes.
'image/source_id' string Required. The unique identifier of the image; needs to be a number in string form.
'image/height' integer Optional. The height of the image. If not present, inferred from the image.
'image/width' integer Optional. The width of the image. If not present, inferred from the image.
'image/object/bbox/xmin' a list of floats Required. The normalized xmin coordinates of all the instances.
'image/object/bbox/xmax' a list of floats Required. The normalized xmax coordinates of all the instances.
'image/object/bbox/ymin' a list of floats Required. The normalized ymin coordinates of all the instances.
'image/object/bbox/ymax' a list of floats Required. The normalized ymax coordinates of all the instances.
'image/object/class/label' a list of integers Required. The class indices of all the instances. Note that 0 is reserved for background.
'image/object/mask' a list of bytes Optional. The masks of all the instances in PNG format.
'image/object/area' a list of floats Optional. The areas of all the instances. If not present, derived from the bounding boxes.
'image/object/is_crowd' a list of integers Optional. 0/1 integers denoting whether instances are a crowd. Crowd instances get special treatment during evaluation. 0 (not crowd) by default.
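
A minimal sketch of building one such record with tf.train.Example; the helper name and argument layout are illustrative, not the Model Garden's own converter:

```python
import tensorflow as tf

def make_example(encoded_jpeg, source_id, height, width,
                 xmins, xmaxs, ymins, ymaxs, labels):
  """Hypothetical helper: builds one detection Example with the required
  fields (coordinates normalized to [0, 1], labels starting from 1)."""
  feature = {
      'image/encoded': tf.train.Feature(
          bytes_list=tf.train.BytesList(value=[encoded_jpeg])),
      # Must be a number in string form, e.g. b'123'.
      'image/source_id': tf.train.Feature(
          bytes_list=tf.train.BytesList(value=[source_id.encode()])),
      'image/height': tf.train.Feature(
          int64_list=tf.train.Int64List(value=[height])),
      'image/width': tf.train.Feature(
          int64_list=tf.train.Int64List(value=[width])),
      'image/object/bbox/xmin': tf.train.Feature(
          float_list=tf.train.FloatList(value=xmins)),
      'image/object/bbox/xmax': tf.train.Feature(
          float_list=tf.train.FloatList(value=xmaxs)),
      'image/object/bbox/ymin': tf.train.Feature(
          float_list=tf.train.FloatList(value=ymins)),
      'image/object/bbox/ymax': tf.train.Feature(
          float_list=tf.train.FloatList(value=ymaxs)),
      'image/object/class/label': tf.train.Feature(
          int64_list=tf.train.Int64List(value=labels)),
  }
  return tf.train.Example(features=tf.train.Features(feature=feature))

# Serialize with example.SerializeToString() before writing to a TFRecord.
```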

The dummy records you provided raise an error during evaluation because the source_id field contains alphabetic characters. Please find the gist in which I am trying to debug this.
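
If renaming the source files is impractical, one hedged workaround is to hash the filename into a numeric string; the helper below is hypothetical, not part of the repo:

```python
import hashlib

def numeric_source_id(filename: str) -> str:
  """Hypothetical helper: 'image/source_id' must parse as a number, so
  derive a stable numeric string from an alphanumeric filename."""
  digest = hashlib.sha256(filename.encode('utf-8')).hexdigest()
  return str(int(digest, 16) % 10**10)

# e.g. numeric_source_id('frame_000123.jpg') -> a stable numeric string
```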

Also, set the number of classes to 2, because classes in the COCO JSON generally start from 0, and class 0 can be considered background, if I am not wrong. Please check the screenshot below from the BCCD dataset.

[Screenshot: BCCD dataset COCO JSON category list, 2023-03-27 10:30 AM]

Please go through this object detection tutorial if you have not already; it uses the BCCD dataset and trains the model. If the error still persists once you are ready with the proper tfrecords format, we are happy to debug further and help you resolve the issue.

Thanks

github-actions[bot] commented 1 year ago

This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] commented 1 year ago

This issue was closed due to lack of activity after being marked stale for the past 7 days.

google-ml-butler[bot] commented 1 year ago

Are you satisfied with the resolution of your issue?