tensorflow/models

Fine tuning generates incorrect variable values for quantized ssd mobilenet #7008

Closed (sdamani-intel closed this issue 4 years ago)

sdamani-intel commented 5 years ago

System information

- What is the top-level directory of the model you are using: object_detection
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
- TensorFlow installed from (source or binary): source
- TensorFlow version (use command below): 1.13.1
- Bazel version (if compiling from source): 0.24.1
- CUDA/cuDNN version: N/A
- GPU model and memory: N/A

Repro instructions

Issue 1: Fine-tuning from checkpoint causes fp32 inference to fail

  1. Download ssd_mobilenet_v1_quantized_coco from https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md and the 2017 coco data set from http://cocodataset.org/#download
  2. Train from checkpoint (downloaded above) with 10 additional steps using command: python object_detection/model_main.py --pipeline_config_path=object_detection/samples/configs/ssd_mobilenet_v1_quantized_300x300_coco14_sync.config --model_dir=data/ --alsologtostderr --quantize
  3. Freeze values for fp32 graph using command: python export_inference_graph.py --input_type image_tensor --pipeline_config_path=samples/configs/ssd_mobilenet_v1_quantized_300x300_coco14_sync.config --trained_checkpoint_prefix=../data/model.ckpt --output_directory=../data
  4. Run inference on the fp32 graph (a minimal loading sketch follows this list). Accuracy is now 0%.
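For reference, a minimal sketch (TF 1.x) of loading the exported frozen graph and running a single image through it; the tensor names are the standard ones export_inference_graph.py produces for image_tensor inputs, and the zero image is only a stand-in for real COCO images:

```python
import numpy as np
import tensorflow as tf

# Load the frozen fp32 graph exported in step 3.
graph_def = tf.GraphDef()
with tf.gfile.GFile('../data/frozen_inference_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')

with tf.Session(graph=graph) as sess:
    image = np.zeros((1, 300, 300, 3), dtype=np.uint8)  # stand-in input
    scores, classes = sess.run(
        ['detection_scores:0', 'detection_classes:0'],
        feed_dict={'image_tensor:0': image})
    print(scores[0][:5], classes[0][:5])
```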

Issue 2: Pre-trained checkpoint gives incorrect min/max values for FakeQuantization, resulting in poor int8 inference

  1. Download ssd_mobilenet_v1_quantized_coco from https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
  2. Freeze values for fp32 graph using command: python export_inference_graph.py --input_type image_tensor --pipeline_config_path=samples/configs/ssd_mobilenet_v1_quantized_300x300_coco14_sync.config --trained_checkpoint_prefix=../data/model.ckpt --output_directory=../data
  3. Quantize the graph to generate an int8 graph (one possible conversion path is sketched after this list)
  4. Inference on the fp32 graph gives the expected accuracy, but inference on the int8 graph gives very poor accuracy. For inference I'm using a prebuilt record file, because using the downloaded coco dataset to generate a new record file results in errors during inference (all images skipped).
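Step 3 above doesn't name a specific quantization tool; purely as an illustration, the stock TFLite path for this model in TF 1.13 (after first exporting with object_detection/export_tflite_ssd_graph.py, which writes tflite_graph.pb) would look roughly like the following. The array names and (mean, std) input stats are the ones commonly used for this 300x300 SSD, not values verified here:

```python
import tensorflow as tf

# Convert the FakeQuant-annotated frozen graph to a uint8 TFLite model.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    '../data/tflite_graph.pb',
    input_arrays=['normalized_input_image_tensor'],
    output_arrays=['TFLite_Detection_PostProcess',
                   'TFLite_Detection_PostProcess:1',
                   'TFLite_Detection_PostProcess:2',
                   'TFLite_Detection_PostProcess:3'],
    input_shapes={'normalized_input_image_tensor': [1, 300, 300, 3]})
converter.inference_type = tf.lite.constants.QUANTIZED_UINT8
# The FakeQuant nodes supply the per-tensor ranges; these stats map the uint8
# input back to the float range the model was trained on.
converter.quantized_input_stats = {'normalized_input_image_tensor': (128, 128)}
converter.allow_custom_ops = True  # TFLite_Detection_PostProcess is a custom op
tflite_model = converter.convert()
with open('../data/detect.tflite', 'wb') as f:
    f.write(tflite_model)
```

If the min/max values baked into the checkpoint are wrong (see the debugging notes below), any converter that trusts them will produce a badly calibrated int8 graph.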


Debugging notes

The issue is almost certainly that the FakeQuantization min/max values in this checkpoint are incorrect. In particular, after fine-tuning (as in issue 1), we get min/max values of (0, 6) for a given node, as expected, because its input comes from Relu6. On the other hand, the min/max values for the same node without training (i.e. using the downloaded checkpoint) are (-25, 32), which are certainly incorrect.
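Those ranges can be read straight out of a checkpoint without rebuilding the graph; a minimal sketch (TF 1.x), assuming the usual min/max variable naming produced by tf.contrib.quantize:

```python
import tensorflow as tf

# Dump every FakeQuant range variable stored in the checkpoint.
reader = tf.train.NewCheckpointReader('../data/model.ckpt')
for name in sorted(reader.get_variable_to_shape_map()):
    if name.endswith('/min') or name.endswith('/max'):
        print(name, reader.get_tensor(name))
```

For an activation fed by Relu6, the printed pair should be close to (0, 6); values like (-25, 32) on such a node are a red flag.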

So, there are two problems here:

  1. The pre-trained model checkpoint has incorrect min/max values
  2. Fine tuning the checkpoint results in incorrect weights

Please let me know if there is any additional information that is required.

tensorflowbutler commented 5 years ago

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.

- CUDA/cuDNN version
- GPU model and memory
- Exact command to reproduce

sdamani-intel commented 5 years ago

> Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
>
> - CUDA/cuDNN version
> - GPU model and memory
> - Exact command to reproduce

I set GPU model and CUDA version to N/A; I thought that was obvious, seeing as I was executing this on the CPU. I have also provided extremely detailed repro instructions, including the exact commands.

sdamani-intel commented 5 years ago

I believe the issue may be with the dataset: coco2017 may not be compatible with this model (the model zoo page mentions coco14 minival).

tensorflowbutler commented 4 years ago

Hi there, we are checking to see if you still need help on this, as this seems to be an old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing. If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.