tensorflow / models

Models and examples built with TensorFlow

exporter_main_v2.py error: TypeError: map_fn_v2() got an unexpected keyword argument 'fn_output_signature' #9700

Open danielspicar opened 3 years ago

danielspicar commented 3 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/research/object_detection

https://github.com/tensorflow/models/blob/master/research/object_detection/exporter_lib_v2.py

2. Describe the bug

I am trying to export an EfficientDet D0 model trained with the TF2 Object Detection API from a trained checkpoint to the SavedModel format.

I trained the model on a custom dataset, starting from the pretrained model zoo checkpoint.

The export fails with the error:

TypeError: in user code:

    /usr/local/lib/python3.6/dist-packages/object_detection/exporter_lib_v2.py:162 call_func  *
        images, true_shapes = self._preprocess_input(input_tensor, lambda x: x)
    /usr/local/lib/python3.6/dist-packages/object_detection/exporter_lib_v2.py:106 _preprocess_input  *
        images, true_shapes = tf.map_fn(
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:574 new_func  **
        return func(*args, **kwargs)

    TypeError: map_fn_v2() got an unexpected keyword argument 'fn_output_signature'

3. Steps to reproduce

  1. Train an efficientdet-d0 model: python ./object_detection/model_main_tf2.py --pipeline_config_path=/path/to/pipeline.config --model_dir=/path/to/checkpoints/ --alsologtostderr
  2. Export command: python ./object_detection/exporter_main_v2.py --input_type image_tensor --pipeline_config_path /path/to/pipeline.config --trained_checkpoint_dir /path/to/checkpoints/ --output_directory /path/to/export/dir

4. Expected behavior

The export should complete successfully and produce a SavedModel.

5. Additional context

I noticed that the location of the error was last modified by commit 0d6ce6025ffc2bed437160fc8b2e9934b3f82fad, which appears to have introduced the code that causes this error.

Full log:

Matplotlib created a temporary config/cache directory at /tmp/matplotlib-ic5s6yw7 because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
I0202 23:31:15.607032 139970535884608 ssd_efficientnet_bifpn_feature_extractor.py:144] EfficientDet EfficientNet backbone version: efficientnet-b0
I0202 23:31:15.607209 139970535884608 ssd_efficientnet_bifpn_feature_extractor.py:145] EfficientDet BiFPN num filters: 64
I0202 23:31:15.607270 139970535884608 ssd_efficientnet_bifpn_feature_extractor.py:147] EfficientDet BiFPN num iterations: 3
I0202 23:31:15.619016 139970535884608 efficientnet_model.py:146] round_filter input=32 output=32
I0202 23:31:15.693667 139970535884608 efficientnet_model.py:146] round_filter input=32 output=32
I0202 23:31:15.693824 139970535884608 efficientnet_model.py:146] round_filter input=16 output=16
I0202 23:31:15.798533 139970535884608 efficientnet_model.py:146] round_filter input=16 output=16
I0202 23:31:15.798703 139970535884608 efficientnet_model.py:146] round_filter input=24 output=24
I0202 23:31:16.090714 139970535884608 efficientnet_model.py:146] round_filter input=24 output=24
I0202 23:31:16.090888 139970535884608 efficientnet_model.py:146] round_filter input=40 output=40
I0202 23:31:16.384582 139970535884608 efficientnet_model.py:146] round_filter input=40 output=40
I0202 23:31:16.384755 139970535884608 efficientnet_model.py:146] round_filter input=80 output=80
I0202 23:31:16.830133 139970535884608 efficientnet_model.py:146] round_filter input=80 output=80
I0202 23:31:16.830313 139970535884608 efficientnet_model.py:146] round_filter input=112 output=112
I0202 23:31:17.286846 139970535884608 efficientnet_model.py:146] round_filter input=112 output=112
I0202 23:31:17.287024 139970535884608 efficientnet_model.py:146] round_filter input=192 output=192
I0202 23:31:17.897233 139970535884608 efficientnet_model.py:146] round_filter input=192 output=192
I0202 23:31:17.897415 139970535884608 efficientnet_model.py:146] round_filter input=320 output=320
I0202 23:31:18.166997 139970535884608 efficientnet_model.py:146] round_filter input=1280 output=1280
I0202 23:31:18.230226 139970535884608 efficientnet_model.py:459] Building model efficientnet with params ModelConfig(width_coefficient=1.0, depth_coefficient=1.0, resolution=224, dropout_rate=0.2, blocks=(BlockConfig(input_filters=32, output_filters=16, kernel_size=3, num_repeat=1, expand_ratio=1, strides=(1, 1), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=16, output_filters=24, kernel_size=3, num_repeat=2, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=24, output_filters=40, kernel_size=5, num_repeat=2, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=40, output_filters=80, kernel_size=3, num_repeat=3, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=80, output_filters=112, kernel_size=5, num_repeat=3, expand_ratio=6, strides=(1, 1), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=112, output_filters=192, kernel_size=5, num_repeat=4, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=192, output_filters=320, kernel_size=3, num_repeat=1, expand_ratio=6, strides=(1, 1), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise')), stem_base_filters=32, top_base_filters=1280, activation='simple_swish', batch_norm='default', bn_momentum=0.99, bn_epsilon=0.001, weight_decay=5e-06, drop_connect_rate=0.2, depth_divisor=8, min_depth=None, use_se=True, input_channels=3, num_classes=1000, model_name='efficientnet', rescale_input=False, data_format='channels_last', dtype='float32')
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/exporter_lib_v2.py:106: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with back_prop=False is deprecated and will be removed in a future version.
Instructions for updating:
back_prop=False is deprecated. Consider using tf.stop_gradient instead.
Instead of:
results = tf.map_fn(fn, elems, back_prop=False)
Use:
results = tf.nest.map_structure(tf.stop_gradient, tf.map_fn(fn, elems))
W0202 23:31:22.734747 139970535884608 deprecation.py:573] From /usr/local/lib/python3.6/dist-packages/object_detection/exporter_lib_v2.py:106: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with back_prop=False is deprecated and will be removed in a future version.
Instructions for updating:
back_prop=False is deprecated. Consider using tf.stop_gradient instead.
Instead of:
results = tf.map_fn(fn, elems, back_prop=False)
Use:
results = tf.nest.map_structure(tf.stop_gradient, tf.map_fn(fn, elems))
Traceback (most recent call last):
  File "object_detection/exporter_main_v2.py", line 159, in <module>
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "object_detection/exporter_main_v2.py", line 155, in main
    FLAGS.side_input_types, FLAGS.side_input_names)
  File "/usr/local/lib/python3.6/dist-packages/object_detection/exporter_lib_v2.py", line 279, in export_inference_graph    concrete_function = detection_module.__call__.get_concrete_function()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 959, in get_concrete_function
    concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 865, in _get_concrete_function_garbage_collected
    self._initialize(args, kwargs, add_initializers_to=initializers)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 506, in _initialize
    *args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2446, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2777, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2667, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py", line 981, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 441, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py", line 968, in wrapper
    raise e.ag_error_metadata.to_exception(e)
TypeError: in user code:

    /usr/local/lib/python3.6/dist-packages/object_detection/exporter_lib_v2.py:162 call_func  *
        images, true_shapes = self._preprocess_input(input_tensor, lambda x: x)
    /usr/local/lib/python3.6/dist-packages/object_detection/exporter_lib_v2.py:106 _preprocess_input  *
        images, true_shapes = tf.map_fn(
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:574 new_func  **
        return func(*args, **kwargs)

    TypeError: map_fn_v2() got an unexpected keyword argument 'fn_output_signature'

pipeline.config

model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 44
    add_background_class: false
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: [1.0, 2.0, 0.5]
        scales_per_octave: 3
      }
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 512
        max_dimension: 512
        pad_to_max_dimension: true
        }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        depth: 64
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          force_use_bias: true
          activation: SWISH
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            scale: true
            decay: 0.99
            epsilon: 0.001
          }
        }
        num_layers_before_predictor: 3
        kernel_size: 3
        use_depthwise: true
      }
    }
    feature_extractor {
      type: 'ssd_efficientnet-b0_bifpn_keras'
      bifpn {
        min_level: 3
        max_level: 7
        num_iterations: 3
        num_filters: 64
      }
      conv_hyperparams {
        force_use_bias: true
        activation: SWISH
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          scale: true,
          decay: 0.99,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.25
          gamma: 1.5
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.5
        max_detections_per_class: 8
        max_total_detections: 30
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  fine_tune_checkpoint: "/tf/ml-data/tf-efficientdet/efficientdet_d0_coco17_tpu-32/checkpoint/ckpt-0"
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint_type: "detection"
  batch_size: 16
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  use_bfloat16: false
  num_steps: 100000

  data_augmentation_options {
    random_rotation90 {
        probability: 0.25
    }
  }

  data_augmentation_options {
    random_rotation90 {
        probability: 0.25
    }
  }

  data_augmentation_options {
    random_rotation90 {
        probability: 0.25
    }
  }

  data_augmentation_options {
    random_scale_crop_and_pad_to_square {
      output_size: 512
      scale_min: 0.95
      scale_max: 1.05
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 8e-2
          total_steps: 100000
          warmup_learning_rate: .0001
          warmup_steps: 2500
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}

train_input_reader: {
  label_map_path: "/tf/ml-data/documents/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/tf/ml-data/documents/train_512.tfrecord"
  }
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1;
}

eval_input_reader: {
  label_map_path: "/tf/ml-data/documents/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/tf/ml-data/documents/eval_512.tfrecord"
  }
}

6. System information

muhiddinov commented 3 years ago

Hello everybody! I fixed this problem by upgrading tensorflow-gpu from version 2.2 to version 2.4.1.

muhiddinov commented 3 years ago

python -m pip uninstall tensorflow-gpu==2.2
python -m pip uninstall tensorflow==2.4.1

python -m pip install tensorflow-gpu==2.4.1
python -m pip install numpy==1.17.1

130050029 commented 3 years ago

Hi @danielspicar,

I faced this problem too, and @muhiddinov's solution is correct.

I will add a bit more to explain what is happening behind the scenes.

TF changed the map_fn_v2() signature between TF 2.2 and TF 2.3. Below are the function definitions from both versions.

TF2.2 version https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow/python/ops/map_fn.py

def map_fn_v2(fn, elems, dtype=None, parallel_iterations=None, back_prop=True, swap_memory=False, infer_shape=True, name=None):


TF2.3 version https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/python/ops/map_fn.py

def map_fn_v2(fn, elems, dtype=None, parallel_iterations=None, back_prop=True, swap_memory=False, infer_shape=True, name=None, fn_output_signature=None):

You can see the added fn_output_signature parameter in the TF 2.3 definition; the same is true for higher versions.
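
To make the difference concrete, here is a minimal, self-contained illustration (not the exporter's actual code; the toy function and tensors are made up for the example). The second call only works on TF 2.3+, which is exactly the TypeError reported above:

import tensorflow as tf

elems = tf.constant([[1.0, 2.0], [3.0, 4.0]])

def fn(x):
    # Toy per-element function returning two outputs with different dtypes,
    # loosely mirroring the (images, true_shapes) pair in the exporter.
    return tf.cast(x, tf.float32), tf.shape(x)

# TF 2.2 spelling: the output structure is declared via dtype.
out_22 = tf.map_fn(fn, elems, dtype=(tf.float32, tf.int32))

# TF 2.3+ spelling: dtype is deprecated in favour of fn_output_signature.
# On TF 2.2 this raises:
#   TypeError: map_fn_v2() got an unexpected keyword argument 'fn_output_signature'
out_23 = tf.map_fn(fn, elems, fn_output_signature=(tf.float32, tf.int32))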

Thus, upgrading TF to 2.3 or a higher version will resolve this issue. However, do check this link first: https://www.tensorflow.org/install/source#tested_build_configurations

Specifically, this part:

[screenshot: the TensorFlow tested GPU build configurations table (TF version vs. required CUDA/cuDNN)]

Upgrading to TF 2.4 also involves changing your CUDA version, which is frankly an additional hassle. Therefore, I would recommend upgrading only to a TF version that matches your installed CUDA, to avoid changing everything in your system.
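
If you are unsure which CUDA/cuDNN versions your installed TF build expects, you can query the build info directly. Note this helper only exists on TF 2.3 and later (it is not available on 2.2), so treat it as a check for the target environment rather than the broken one:

import tensorflow as tf

print(tf.version.VERSION)
build = tf.sysconfig.get_build_info()   # available on TF >= 2.3 only
print(build.get("cuda_version"), build.get("cudnn_version"))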

Hopefully, I have solved the problem.

Thanks all.

Warmly, Ankit Rathore.

danielspicar commented 3 years ago

Thank you. I worked around the issue by building a custom Docker container with TF 2.2 and the object_detection source code from before commit 0d6ce6025ffc2bed437160fc8b2e9934b3f82fad.

I will try TF 2.3 soon, as upgrading CUDA for TF 2.4 is not possible for me. I do not have such privileges on the GPU server I use (and I don't want to waste the admin's goodwill just for a test ;) ).

Anyway, it seems that the Dockerfile should be updated to use at least tf-gpu 2.3.1.

I will report back once I have checked whether it works with TF 2.3, but perhaps somebody else can verify this first.

mohammedayub44 commented 3 years ago

@danielspicar I ran into this issue in my environment as well. Upgrading TF to 2.3.1 worked! It still throws a warning, but the export completes successfully.

[screenshot: export log showing the deprecation warning, followed by a successful export]

aiXia121 commented 3 years ago

How about TensorFlow 2.2.2? I only have a Mac CPU, and I am running into the same problem. Could somebody share a solution? Thanks.

skbhat commented 3 years ago

Namaste,

I am using TF 2.2. Just replace the argument in the exporter code (the tf.map_fn call in exporter_lib_v2.py, per the traceback above): instead of fn_output_signature=(tf.float32, tf.int32), use dtype=(tf.float32, tf.int32).

It worked for me.
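
For reference, the failing call per the traceback is the tf.map_fn inside _preprocess_input in exporter_lib_v2.py (around line 106). A rough sketch of the change; the callback and elems names below are paraphrased placeholders, not copied from the source:

# Before (fails on TF 2.2, whose map_fn_v2 has no fn_output_signature parameter):
images, true_shapes = tf.map_fn(
    preprocess_fn,                    # placeholder for the per-image function
    elems=input_tensor,               # placeholder for the batched input tensor
    back_prop=False,
    fn_output_signature=(tf.float32, tf.int32))

# After (TF 2.2-compatible; dtype declares the same output structure):
images, true_shapes = tf.map_fn(
    preprocess_fn,
    elems=input_tensor,
    back_prop=False,
    dtype=(tf.float32, tf.int32))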