tensorflow / models

Models and examples built with TensorFlow

SSD MobileNet v2 batch size 1 only #10317

Open apivovarov opened 2 years ago

apivovarov commented 2 years ago

I'm trying to use SSD MobileNet v2 320x320

I used exporter_main_v2.py to get a saved model. The generated saved model has a hardcoded batch size of 1. Is this a limitation of SSD models? Is it possible to save the model with a batch size other than 1?

The command I'm running:

# From tensorflow/models/research
export OUTPUT_DIR=./output/ssd_mobilenet_v2_320x320_coco17_tpu-8
python object_detection/exporter_main_v2.py \
    --input_type=image_tensor \
    --pipeline_config_path=ssd_mobilenet_v2_320x320_coco17_tpu-8/pipeline.config \
    --trained_checkpoint_dir=ssd_mobilenet_v2_320x320_coco17_tpu-8/checkpoint \
    --output_directory=$OUTPUT_DIR
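
For reference, this is how I am checking the batch dimension of the exported model (rough sketch; the path just follows the OUTPUT_DIR layout above):

import tensorflow as tf

model = tf.saved_model.load(
    "./output/ssd_mobilenet_v2_320x320_coco17_tpu-8/saved_model")
sig = model.signatures["serving_default"]
# With --input_type=image_tensor this prints an input TensorSpec with
# shape (1, None, None, 3), i.e. the batch dimension is hardcoded to 1.
print(sig.structured_input_signature)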
apivovarov commented 2 years ago

I found that using --input_type=float_image_tensor instead of image_tensor changes the inference graph batch size from 1 to -1 (dynamic).

class DetectionFromFloatImageModule shape=[None, None, None, 3] https://github.com/tensorflow/models/blob/master/research/object_detection/exporter_lib_v2.py#L186

class DetectionFromImageModule shape=[1, None, None, 3] https://github.com/tensorflow/models/blob/master/research/object_detection/exporter_lib_v2.py#L153

As you can see, DetectionFromFloatImageModule (float32 input) uses a dynamic batch size, while DetectionFromImageModule (uint8 input) uses batch size 1.

Why does the uint8 input need batch size 1?

carmirandab commented 2 years ago

@apivovarov I also thought using the image_tensor input type was the right thing to do (because it is the default), but from what I can see, the main difference between image_tensor and float_image_tensor is that side inputs are discarded with float_image_tensor. Besides that, everything seems to be the same: the tf.uint8 tensor is cast to tf.float32 immediately after being fed to the model (see L102 and L162). So, for this use case, you can export with float_image_tensor as the input type to get a dynamic batch size model, and just cast your input from uint8 to float32 before feeding it to the model. I just got this working on SSD MobileNet v2 and EfficientDet-D0, with no observable extra GPU memory usage.
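
For example (rough sketch; paths and shapes are placeholders), with a model exported using --input_type=float_image_tensor:

import numpy as np
import tensorflow as tf

# Model exported with --input_type=float_image_tensor (dynamic batch dimension).
detect_fn = tf.saved_model.load("exported/saved_model")

# A batch of 4 uint8 images; cast to float32 before feeding the model.
images_uint8 = np.random.randint(0, 255, size=(4, 320, 320, 3), dtype=np.uint8)
detections = detect_fn(tf.cast(images_uint8, tf.float32))

# The outputs keep the batch dimension, e.g. detection_boxes should be
# (4, 100, 4) with the default 100 max detections.
print(detections["detection_boxes"].shape)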

You can also try implementing a DetectionFromUIntImageModule class that changes the TensorSpec, like so:

class DetectionFromUIntImageModule(DetectionInferenceModule):
  """Detection Inference Module for float image inputs."""

  @tf.function(
      input_signature=[
          tf.TensorSpec(shape=[None, None, None, 3], dtype=tf.uint8)])
  def __call__(self, input_tensor):
    images, true_shapes = self._preprocess_input(input_tensor, lambda x: x)
    return self._run_inference_on_images(images,
                                         true_shapes)

(adapted from DetectionFromFloatImageModule on L181). Then add the class to the DETECTION_MODULE_MAP at the end of exporter_lib_v2.py and see if that works.
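
Concretely, the map at the bottom of the file would end up looking roughly like this (existing entries may differ slightly between versions; 'uint8_image_tensor' is just the name I would give the new --input_type value):

DETECTION_MODULE_MAP = {
    'image_tensor': DetectionFromImageModule,
    'encoded_image_string_tensor': DetectionFromEncodedImageModule,
    'tf_example': DetectionFromTFExampleModule,
    'float_image_tensor': DetectionFromFloatImageModule,
    # New entry: the key is what you would pass to --input_type.
    'uint8_image_tensor': DetectionFromUIntImageModule,
}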

As for why a batch size of one is "needed" for the image_tensor input, I believe it comes down to development time or demand: there must not be enough use cases for this to have been done. One would also have to handle dynamic batches of side inputs, which would certainly mean duplicating input "steps" for models like Context-RCNN.

apivovarov commented 2 years ago

Thank you for the explanation. I think I can just use float_image_tensor. Do you know if it is possible to export an inference graph for a fixed input shape with a batch size greater than one, e.g. input shape (8, 300, 300, 3)? I need to get a pbtxt frozen graph where all dimensions of all ops are clearly defined. The graph will be used outside of TensorFlow.
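
Something like this is what I have in mind (untested sketch; the path and shape are placeholders), based on the usual convert_variables_to_constants_v2 recipe:

import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import (
    convert_variables_to_constants_v2)

model = tf.saved_model.load("exported/saved_model")

# Re-trace the model with a fully fixed input shape (batch of 8, 300x300 RGB).
concrete_fn = tf.function(lambda x: model(x)).get_concrete_function(
    tf.TensorSpec(shape=[8, 300, 300, 3], dtype=tf.float32))

# Fold variables into constants and write the graph out as pbtxt.
frozen_fn = convert_variables_to_constants_v2(concrete_fn)
tf.io.write_graph(frozen_fn.graph, ".", "frozen_graph.pbtxt", as_text=True)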

apivovarov commented 2 years ago

I tried using a fixed batch size of 2 in the DetectionFromFloatImageModule tf.TensorSpec. The export works, but the exported model only runs with batch size 1 input. If I try to run the model with a batch size 2 input, it fails:

>>> m = tf.saved_model.load("saved_model")
>>> x=tf.random.uniform((2,300,300,3), dtype=tf.dtypes.float32)
>>> m(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 664, in _call_attribute
    return instance.__call__(*args, **kwargs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 924, in _call
    results = self._stateful_fn(*args, **kwds)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3039, in __call__
    return graph_function._call_flat(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1963, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 591, in call
    outputs = execute.execute(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError:  Incompatible shapes: [2,100] vs. [2]
     [[{{node StatefulPartitionedCall/Postprocessor/CombinedNonMaxSuppression/Maximum}}]] [Op:__inference_restored_function_body_47653]

Function call stack:
restored_function_body
robertorovella91 commented 2 years ago

Actually, I noticed that the batch size is fixed at 1 for all pre-trained models. Is there any news regarding this? Thanks in advance.