Object Detection API: toco model conversion problem

FreestylePocker commented 6 years ago

System information

What is the top-level directory of the model you are using: research/object_detection
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
TensorFlow installed from (source or binary): source
TensorFlow version (use command below): 1.10.0
Bazel version (if compiling from source): 0.16.1
CUDA/cuDNN version: 9.2/7.2
GPU model and memory: GeForce GTX 970 4G

Exact command to reproduce:

bazel run --config=opt tensorflow/contrib/lite/toco:toco -- \
--input_file=$OUTPUT_DIR/tflite_graph.pb \
--output_file=$OUTPUT_DIR/detect.tflite \
--input_shapes=1,640,640,3 \
--input_arrays=normalized_input_image_tensor \
--output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
--inference_type=QUANTIZED_UINT8 \
--mean_values=128 \
--std_values=128 \
--change_concat_input_ranges=false \
--allow_custom_ops
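
For readers unfamiliar with the two normalization flags in the command above: for a quantized input, the converter documentation gives the mapping as real_value = (quantized_value - mean_value) / std_value. A minimal Python sketch of what --mean_values=128 --std_values=128 implies for uint8 pixels follows. The helper name dequantize_input is made up for illustration and is not part of any TensorFlow API.

```python
def dequantize_input(q, mean=128.0, std=128.0):
    """Map a uint8 pixel value to the real value the graph sees.

    Implements real_value = (quantized_value - mean_value) / std_value
    with the mean/std values from the toco command above.
    """
    return (q - mean) / std


lowest = dequantize_input(0)      # darkest pixel
middle = dequantize_input(128)    # mid-gray
highest = dequantize_input(255)   # brightest pixel
print(lowest, middle, highest)
```

With these settings uint8 pixels land roughly in the range [-1, 1), so an application feeding a float model would need to apply an equivalent normalization itself.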

Describe the problem

I am trying to create a fully quantized tflite model for inference.

While training this model from scratch on a custom dataset, I ran into a problem related to https://github.com/tensorflow/models/issues/5139. I worked around it by increasing the eval delay and restarting the process a few times, so the problem only slowed training down.

Finally the model was trained, and it works fine with the .pb file created by export_inference_graph.py.

To create the tflite file I followed these instructions: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md

object_detection/export_tflite_ssd_graph.py \
--pipeline_config_path=$CONFIG_FILE \
--trained_checkpoint_prefix=$CHECKPOINT_PATH \
--output_directory=$OUTPUT_DIR \
--add_postprocessing_op=true

This exported tflite_graph.pb without a problem.

But converting it to tflite with toco crashes:

bazel run --config=opt tensorflow/contrib/lite/toco:toco -- \
--input_file=$OUTPUT_DIR/tflite_graph.pb \
--output_file=$OUTPUT_DIR/detect.tflite \
--input_shapes=1,640,640,3 \
--input_arrays=normalized_input_image_tensor \
--output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
--inference_type=QUANTIZED_UINT8 \
--mean_values=128 \
--std_values=128 \
--change_concat_input_ranges=false \
--allow_custom_ops

results in tensorflow/contrib/lite/toco/graph_transformations/propagate_fixed_sizes.cc:116] Check failed: dim_x == dim_y (256 vs. 24)Dimensions must match

Source code / logs

toco log:

2018-09-13 05:07:46.154129: I tensorflow/contrib/lite/toco/import_tensorflow.cc:1055] Converting unsupported operation: TFLite_Detection_PostProcess                                                                                                                                      
2018-09-13 05:07:46.361651: I tensorflow/contrib/lite/toco/graph_transformations/graph_transformations.cc:39] Before Removing unused ops: 1992 operators, 2969 arrays (0 quantized)                                                                                                       
2018-09-13 05:07:46.429271: I tensorflow/contrib/lite/toco/graph_transformations/graph_transformations.cc:39] Before general graph transformations: 1992 operators, 2969 arrays (0 quantized)                                                                                             
2018-09-13 05:07:50.926104: F tensorflow/contrib/lite/toco/graph_transformations/propagate_fixed_sizes.cc:116] Check failed: dim_x == dim_y (256 vs. 24)Dimensions must match                                                                                                             
Emergency stop (memory stack is flushed to disk)

export_tflite_ssd_graph.py log:

2018-09-13 05:03:51.397799: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                               
2018-09-13 05:03:51.398192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties:                                                                                                                                                                      
name: GeForce GTX 970 major: 5 minor: 2 memoryClockRate(GHz): 1.367                                                                                                                                                                                                                       
pciBusID: 0000:04:00.0                                                                                                                                                                                                                                                                    
totalMemory: 3.95GiB freeMemory: 3.88GiB                                                                                                                                                                                                                                                  
2018-09-13 05:03:51.398208: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0                                                                                                                                                                        
2018-09-13 05:03:51.637059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:                                                                                                                                       
2018-09-13 05:03:51.637107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977]      0                                                                                                                                                                                                
2018-09-13 05:03:51.637115: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0:   N                                                                                                                                                                                                
2018-09-13 05:03:51.637295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3607 MB memory) -> physical GPU (device: 0, name: GeForce GTX 970, pci bus id: 0000:04:00.0, compute capability: 5.2)   
2018-09-13 05:03:55.605093: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0                                                                                                                                                                        
2018-09-13 05:03:55.605145: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:                                                                                                                                       
2018-09-13 05:03:55.605162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977]      0                                                                                                                                                                                                
2018-09-13 05:03:55.605168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0:   N                                                                                                                                                                                                
2018-09-13 05:03:55.605283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3607 MB memory) -> physical GPU (device: 0, name: GeForce GTX 970, pci bus id: 0000:04:00.0, compute capability: 5.2)   
2018-09-13 05:03:57.218987: I tensorflow/tools/graph_transforms/transform_graph.cc:317] Applying strip_unused_nodes

config:

# SSD with Resnet 50 v1 FPN feature extractor, shared box predictor and focal
model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 2
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: [1.0, 2.0, 0.5]
        scales_per_octave: 2
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 640
        width: 640
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        depth: 256
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            scale: true,
            center: true,
            train: true,
            decay: 0.97,
            epsilon: 0.001,
          }
        }
        num_layers_before_predictor: 4
        kernel_size: 3
      }
    }
    feature_extractor {
      type: 'ssd_resnet50_v1_fpn'
      fpn {
        min_level: 3
        max_level: 7
      }
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          scale: true,
          center: true,
          decay: 0.97,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.25
          gamma: 2.0
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 1
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 1
  num_steps: 400000
  data_augmentation_options {
    random_rgb_to_gray {
      probability: 0.75
    }
  }
  data_augmentation_options {
    random_adjust_brightness {
    }
  }
  data_augmentation_options {
    random_adjust_contrast {
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .005
          total_steps: 400000
          warmup_learning_rate: .0001
          warmup_steps: 1000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}

train_input_reader: {
  tf_record_input_reader {
    input_path: train.record
  }
  label_map_path: label_map.pbtxt
}

eval_config: {
  use_moving_averages: false
  num_examples: 213
  metrics_set: "coco_detection_metrics"
  eval_interval_secs: 300
  max_evals: 100
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: eval.record
  }
  label_map_path: label_map.pbtxt
  shuffle: false
  num_readers: 1
}

graph_rewriter {
  quantization {
    delay: 0
    activation_bits: 8
    weight_bits: 8
  }
}
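
One possible reading of the (256 vs. 24) check failure, offered as a guess rather than a diagnosis: 24 matches the number of box-encoding channels per feature-map location implied by this config, and 256 matches the box predictor depth. The arithmetic below assumes the multiscale anchor generator emits one anchor per (aspect ratio, scale) pair and that each box is encoded with 4 coordinates. Neither assumption is confirmed anywhere in this thread.

```python
# Values copied from the pipeline config above.
aspect_ratios = [1.0, 2.0, 0.5]
scales_per_octave = 2
min_level, max_level = 3, 7
image_size = 640            # fixed_shape_resizer height and width
predictor_depth = 256       # weight_shared_convolutional_box_predictor depth

# Assumption: 4 coordinates per encoded box.
box_code_size = 4

# Assumption: one anchor per (aspect ratio, scale) pair at each location.
anchors_per_location = len(aspect_ratios) * scales_per_octave
box_channels = anchors_per_location * box_code_size

# Feature map side at pyramid level l is image_size / 2**l.
feature_map_sides = [image_size // 2 ** level
                     for level in range(min_level, max_level + 1)]
total_anchors = sum(side * side for side in feature_map_sides) * anchors_per_location

print(anchors_per_location, box_channels, predictor_depth)
print(feature_map_sides, total_anchors)
```

If that reading is right, the converter is trying to combine a 256-channel tensor from the shared prediction tower with a 24-channel box-prediction tensor, but the log alone does not say which op is involved.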

karmel commented 6 years ago

@gargn -- can you take a look at the toco conversion error detailed above?

gargn commented 6 years ago

Adding @achowdhery who works on the Object Detection model.

achowdhery commented 6 years ago

FPN model support is still pending on our end. We have noted this feature request and will keep you updated on adding support for it over the next 4 weeks.

jackweiwang commented 6 years ago

Hi @achowdhery, has the problem been solved?

SiddhantKapil commented 6 years ago

Hi @achowdhery, does toco conversion support FPN now?

Vandmoon commented 6 years ago

Hi, @achowdhery! Does toco conversion support FPN now? Actually, I am more concerned about quantization of the upsampling operation. Every time I tried to quantize nearest_neighbor_upsampling, it produced an error saying that mul is lacking min/max data.

maxcrous commented 5 years ago

Any update on FPN model support @achowdhery?

When I train ssd_mobilenet_v1_fpn

I reach a low loss.
I am able to run export_tflite_ssd_graph successfully
I am able to run tflite_convert successfully

But when invoking the tflite model on mobile or in Python, I receive a Fatal signal 6 (SIGABRT).

The same happens when using the frozen_inference_graph supplied with the download from the model zoo.

Everything works when I use ssd_mobilenet_v1_coco instead of ssd_mobilenet_v1_fpn.

achowdhery commented 5 years ago

@maxcrous If you are able to visualize the TF Lite model after exporting, that would be extremely helpful for understanding and debugging the problem in the open-source version. Please share the visualization (a TF Lite file can be visualized in the Netron app), or you can use this tool: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tools/visualize.py
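
Alongside a Netron screenshot, a text dump of the tensors can be easier to compare between the FPN and non-FPN models. Below is a minimal sketch that flags tensors with more than four dimensions, the kind of shape that the `6 vs. 4` converter check reported later in this thread complains about. The function names are illustrative. It assumes the Python interpreter is reachable as tf.lite.Interpreter, which may not hold for older 1.x releases where it lived under tf.contrib.lite. The example tensor names are invented.

```python
def high_rank_tensors(tensor_details, max_rank=4):
    """Return (name, shape) for tensors whose rank exceeds max_rank.

    tensor_details is a list of dicts shaped like the output of
    Interpreter.get_tensor_details(): each has at least "name" and "shape".
    """
    return [
        (d["name"], [int(x) for x in d["shape"]])
        for d in tensor_details
        if len(d["shape"]) > max_rank
    ]


def load_tensor_details(model_path):
    """Load a .tflite file and return its tensor details (needs TensorFlow)."""
    # Imported lazily so high_rank_tensors() has no dependencies.
    import tensorflow as tf

    # Assumption: tf.lite.Interpreter exists in the installed release.
    interpreter = tf.lite.Interpreter(model_path=model_path)
    return interpreter.get_tensor_details()


# Hypothetical details for illustration only; real ones would come from
# load_tensor_details("detect.tflite").
example_details = [
    {"name": "normalized_input_image_tensor", "shape": [1, 640, 640, 3]},
    {"name": "hypothetical/upsampling/mul", "shape": [1, 20, 2, 20, 2, 256]},
]
flagged = high_rank_tensors(example_details)
print(flagged)
```

Running the same dump on the ssd_mobilenet_v1_coco and ssd_mobilenet_v1_fpn files and diffing the output would show which tensors exist only in the FPN variant.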

maxcrous commented 5 years ago

@achowdhery, thank you for the quick reply. Yes, I am able to visualize it with Netron. Visually, it is very similar to ssd_mobilenet_v1_coco.

Link to Netron image for ssd_mobilenet_v1_coco: https://ibb.co/Vxd2Xqf

Link to Netron image for ssd_mobilenet_v1_fpn: https://ibb.co/JQPDRwj

maxcrous commented 5 years ago

This is my system information:

OS Platform and Distribution: MacOS Mojave 10.14.1
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): 1.12.0
All processes run on CPU.

achowdhery commented 5 years ago

In the export script, can you please try turning off the addition of the postprocessing op in the FPN model, to see whether the SIGABRT comes from the main graph or from the postprocessing op?

maxcrous commented 5 years ago

When setting add_postprocessing_op to False, export_tflite_ssd_graph.py succeeds. This is on the model.ckpt and pipeline.config supplied with the ssd_mobilenet_v1_fpn from the model zoo.

The model is then successfully converted to a tflite model with the following command.

tflite_convert \
--graph_def_file=tflite_graph.pb \
--output_file=detect.tflite \
--input_shapes=1,640,640,3 \
--input_arrays=normalized_input_image_tensor \
--output_arrays='concat_1' \
--inference_type=FLOAT \
--mean_values=128 \
--std_dev_values=128 \
--allow_custom_ops

The resulting tflite model still produces a Fatal signal 6 (SIGABRT). The code I use to test the model can be found here: https://bit.ly/2HqRPMi

The Netron image for the tflite model can be found here: https://ibb.co/5LRwpV6

maxcrous commented 5 years ago

When using tensorflow 1.11.0 (same result as tensorflow 1.12.0):

export_tflite_ssd_graph is successful
tflite_convert is successful
SIGABRT on invoke

As stated in issue #4826, when using tensorflow 1.10.0:

export_tflite_ssd_graph is successful
tflite_convert throws the error:

RuntimeError: TOCO failed see console for info.
b'2019-01-22 16:34:31.145095: F tensorflow/contrib/lite/toco/import_tensorflow.cc:218] Check failed: input_shape.dim_size() <= 4 (6 vs. 4)\n'
None

achowdhery commented 5 years ago

Yes, it was probably not possible to convert this until v1.12. I am still trying to understand why there is a SIGABRT if it converts. Please attach or email the frozen graph and the tflite file. The converted tflite file should run.

maxcrous commented 5 years ago

Here is the model straight out of the model zoo after export_tflite_ssd_graph and tflite_convert, run with the arguments mentioned in previous posts. https://drive.google.com/file/d/1TLGi9KqpYdAp86sc01c1GNBSA1C2a1VP/view?usp=sharing

Thanks again for the helpfulness.

achowdhery commented 5 years ago

Please also add the tflite_convert command you used. I will try to repro the bug. And please add the stack trace for SIGABRT.
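
Because the abort kills the whole Python process, one way to capture what happened is to run the failing call in a child process and inspect how it ended. The sketch below uses only the standard library. The helper name run_isolated is illustrative, and os.abort() stands in for the real interpreter.invoke() call so that the example does not depend on TensorFlow or on the model file.

```python
import signal
import subprocess
import sys


def run_isolated(code):
    """Run a Python snippet in a child process and report how it ended.

    Returns (returncode, signal_name, stderr). signal_name is None unless the
    child was killed by a signal, which POSIX reports as a negative returncode.
    """
    # -X faulthandler asks the child to dump its Python traceback to stderr
    # when a fatal signal such as SIGABRT arrives.
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )
    sig = signal.Signals(-proc.returncode).name if proc.returncode < 0 else None
    return proc.returncode, sig, proc.stderr


# Stand-in for the crashing call: os.abort() raises SIGABRT in the child.
# For the real case, the snippet would build the interpreter for
# detect.tflite and call invoke().
returncode, sig, stderr = run_isolated("import os; os.abort()")
print(returncode, sig)
print(stderr)
```

This only captures the Python-level traceback. The native frames inside the TF Lite runtime would still need a debugger or the platform's crash log.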

maxcrous commented 5 years ago

For export_tflite_ssd_graph I use:

export mobilenet_fpn=ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03

python3 object_detection/export_tflite_ssd_graph.py    \
--pipeline_config_path /..../$mobilenet_fpn/pipeline.config    \
--trained_checkpoint_prefix /..../$mobilenet_fpn/model.ckpt    \
--output_directory /..../$mobilenet_fpn/output \
--add_postprocessing_op=False

Do note that the postprocessing operations have been disregarded.

Then I cd into the previous command's output directory, and for tflite_convert I use:

tflite_convert \
--graph_def_file=tflite_graph.pb \
--output_file=detect.tflite \
--input_shapes=1,640,640,3 \
--input_arrays=normalized_input_image_tensor \
--output_arrays='concat_1' \
--inference_type=FLOAT \
--mean_values=128 \
--std_dev_values=128 \
--allow_custom_ops

The stack trace for the SIGABRT can be found here: https://drive.google.com/open?id=1RdE5861tXiWBF3lxaGgwfWss3lJK-W_3

achowdhery commented 5 years ago

The bug seems to be with the Mul op. We will look into this in the next few days. We sincerely appreciate you reporting it.
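
For context on why a Mul op would appear in an FPN graph at all: the Object Detection API appears to implement the FPN top-down nearest-neighbor upsampling as a reshape to rank 6, a broadcast multiply by a tensor of ones, and a reshape back to rank 4. That is an assumption about the library internals and is not confirmed in this thread, but it would be consistent with both the `6 vs. 4` rank check reported above and the Mul suspicion. A shape-only sketch in plain Python follows, without TensorFlow. Both function names are illustrative.

```python
def broadcast_shape(a, b):
    """Shape of an elementwise op on two shapes of equal rank."""
    out = []
    for x, y in zip(a, b):
        if x != y and 1 not in (x, y):
            raise ValueError("incompatible dims %d and %d" % (x, y))
        out.append(max(x, y))
    return out


def upsampling_shapes(input_shape, scale):
    """Intermediate shapes of reshape -> Mul -> reshape upsampling."""
    batch, height, width, channels = input_shape
    expanded = [batch, height, 1, width, 1, channels]  # rank-6 reshape
    ones = [1, 1, scale, 1, scale, 1]                  # constant multiplied in
    product = broadcast_shape(expanded, ones)          # output of the Mul
    output = [batch, height * scale, width * scale, channels]
    return expanded, product, output


# Example: a 20x20 feature map with 256 channels, upsampled by 2.
expanded, product, output = upsampling_shapes([1, 20, 20, 256], scale=2)
print(expanded)
print(product)
print(output)
```

Under that assumption, the Mul operates on rank-6 tensors, so a converter or kernel that only handles up to four dimensions would fail exactly at this op.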

hxtkyne commented 5 years ago

Did you solve the problem, @maxcrous?

maxcrous commented 5 years ago

Hey @hxtkyne, I don't have any knowledge of the tflite conversion process, so we will have to wait for the good people at TensorFlow to fix this one. In the meantime I'm using ssd_mobilenet_v1_coco for mobile deployment, which works OK for my problem.

oopsodd commented 5 years ago

I used the TF Object Detection API to train ssd_resnet_50_fpn_coco on a 50-class dataset. Everything is okay with the frozen model. The checkpoint was converted successfully using this command:

bazel run -c opt tensorflow/lite/toco:toco -- \
  --input_file=$OUTPUT_DIR/tflite_graph.pb \
  --output_file=$OUTPUT_DIR/detect.tflite \
  --input_shapes=1,640,640,3 \
  --input_arrays=normalized_input_image_tensor \
  --output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3'  \
  --inference_type=FLOAT \
  --mean_values=128 \
  --std_values=128 \
  --change_concat_input_ranges=false \
  --allow_custom_ops
(ubuntu 16, latest tensorflow, models repo, tf-nightly)

But the tflite model detects the wrong class and bbox. All the output classes are the same (1 class). The tflite model takes 3 s per image for inference on a Galaxy S9 (same as the frozen model). Does TFLite support ssd_resnet_50_fpn_coco?

joeyM1997 commented 5 years ago

Ran into this yesterday. Anybody know the progress on this?

holyhao commented 5 years ago

@oopsodd I used the weight_shared_convolutional_box_predictor in ppnnet, and the tflite model also detects the wrong class and bbox. All the output classes are the same (1 class) too. I wonder whether the conversion tool supports weight_shared_convolutional_box_predictor properly?

sarmadidrees commented 5 years ago

@achowdhery any update on the FPN model support?

AliceDinh commented 5 years ago

Still waiting for good news on FPN model support. Does anyone have an update?

dkashkin commented 5 years ago

Please make it a priority to add FPN support! Everybody needs this.

yjfncu commented 5 years ago

@oopsodd I also used the TF Object Detection API to train an ssd-mobilenet_v2 model and used export_tflite_ssd_graph.py to convert the checkpoint to a .pb file. The .pb file works well, but when I use bazel run --config=opt tensorflow/lite to convert the .pb to .tflite, I get some errors. Do I need to compile TensorFlow from source with Bazel before I can use this command to convert the .pb file to tflite? And how do you compile TensorFlow with Bazel? Thank you.

thusinh1969 commented 5 years ago

I have exactly the same problem as @maxcrous with ssd fpn mobile: same setup, same command line. Running from Android Studio, it keeps crashing with the same SIGABRT error, while ssd_mobile_v2 and ssd_inception_v2 both work fine in both FLOAT and QUANTIZED_UINT8 mode.