tensorflow / tpu

Reference models and tools for Cloud TPUs.
https://cloud.google.com/tpu/

EfficientNet top-1 accuracy drops 15%+ after post-training quantization #549

Open imcaspar opened 4 years ago

imcaspar commented 4 years ago

using checkpoint: https://storage.googleapis.com/cloud-tpu-checkpoints/efficientnet/ckptsaug/efficientnet-b0.tar.gz

export_model.py setting: python export_model.py --ckpt_dir=efficientnet-b0 --data_dir={DATA_DIR} --model_name=efficientnet-b0 --output_tflite=efficientnet-b0.tflite

imcaspar commented 4 years ago

@mingxingtan Could you help to take a look at this issue?

mingxingtan commented 4 years ago

Hi Caspar, could you post your full command lines on how to reproduce this issue?

Also, it would be great if you could try the original checkpoint without autoaug: https://storage.googleapis.com/cloud-tpu-checkpoints/efficientnet/ckpts/efficientnet-b0.tar.gz

saberkun commented 4 years ago

Hi Mingxing and @imcaspar, it might be an infrastructure problem in TFLite. Some people tried efficientnet-b1 and got decent results.

imcaspar commented 4 years ago

@mingxingtan I am using the ILSVRC validation set to test top-1 accuracy. I also checked the original checkpoint without autoaug, but no luck. The results below come from the official TFLite accuracy tool https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/accuracy/ilsvrc and are all based on the 50k images of the ILSVRC 2012 validation set.

efficientnet-b0 (converted to .tflite without post-training quantization): about a 5% drop from the original checkpoint

Top 1, Top 2, Top 3, Top 4, Top 5
71.462, 82.100, 86.280, 88.634, 90.242

For comparison, MnasNet_1.0_224 https://storage.cloud.google.com/download.tensorflow.org/models/tflite/mnasnet_1.0_224_09_07_2018.tgz:

Top 1, Top 2, Top 3, Top 4, Top 5
74.656, 84.668, 88.540, 90.540, 91.834
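For readers unfamiliar with the columns above: Top k is the percentage of validation images whose true label appears among the model's k highest-scoring classes. The following is a minimal pure-Python sketch of that metric, not the code of the TFLite accuracy tool; the function name `top_k_accuracy` and the toy scores are hypothetical.

```python
def top_k_accuracy(scores, labels, k):
    """Return the percentage of samples whose true label is in the top-k scores.

    scores: list of per-class score lists, one list per sample (hypothetical data).
    labels: list of true class indices, one per sample.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy example: 3 samples, 3 classes.
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_labels = [1, 1, 0]

for k in (1, 2, 3):
    print(f"Top {k}: {top_k_accuracy(toy_scores, toy_labels, k):.3f}")
```

By construction Top k can only grow with k, which matches the increasing rows reported above.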

I didn't run the accuracy test on the whole 50k-image validation set for the quantized models; they are too slow in a desktop environment (compared to float models). Based on a 1k-image test, efficientnet-b0 with post-training quantization easily loses 15%+ top-1 accuracy.
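Since the quantized numbers come from a 1k-image subset rather than the full 50k, it is worth checking how much of a gap sampling noise alone could explain. A rough sketch, assuming the subset is an independent random sample and using the binomial standard error; the helper name `accuracy_standard_error` is hypothetical, and 71.462 is the float top-1 figure reported above.

```python
import math


def accuracy_standard_error(acc_percent, num_images):
    """Binomial standard error of an accuracy estimate, in percentage points.

    Assumes each image is an independent draw, which is an approximation.
    """
    p = acc_percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / num_images)


float_top1 = 71.462  # float efficientnet-b0 top-1 reported in this thread

se_1k = accuracy_standard_error(float_top1, 1_000)
se_50k = accuracy_standard_error(float_top1, 50_000)

print(f"standard error with 1k images:  {se_1k:.2f} points")
print(f"standard error with 50k images: {se_50k:.2f} points")
```

Under those assumptions the standard error at 1k images is on the order of 1.4 percentage points, so a 15-point drop is far larger than what subset size alone would produce.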

@saberkun FYI

saberkun commented 4 years ago

Is the input format different between efficientnet-b0 and MnasNet_1.0_224, since their input processing is different? The accuracy drop for the float model is not expected.

imcaspar commented 4 years ago

@saberkun Do you mean that the input processing difference between the two models may have caused the accuracy drop? It seems the only difference is the image resize method: https://github.com/tensorflow/tpu/blob/master/models/official/efficientnet/preprocessing.py#L127 https://github.com/tensorflow/tpu/blob/master/models/official/mnasnet/preprocessing.py#L126

saberkun commented 4 years ago

MnasNet_1.0_224 is not exported from https://github.com/tensorflow/tpu/blob/master/models/official/mnasnet

mingxingtan commented 4 years ago

This is very strange. It looks like some issues were introduced in the new TFLite converter.

The old MnasNet_1.0_224 was converted a long time ago (even before we open-sourced the mnasnet code pointed out by @saberkun).

Albert-Zhao-2020 commented 4 years ago

@saberkun @mingxingtan @imcaspar Hi, has this problem been solved? I have encountered the same problem with efficientnet-b1. The command is: python export_model.py --model_name=$MODEL --ckpt_dir=ckpt/$MODEL --data_dir=imagenet2012/tf_records/validation --output_tflite=${MODEL}_lite_quant.tflite, where $MODEL=efficientnet-b1, and the checkpoint is the baseline at https://storage.googleapis.com/cloud-tpu-checkpoints/efficientnet/ckpts/efficientnet-b0.tar.gz. The accuracy measured with the official TFLite tool https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/accuracy/ilsvrc dropped a lot (20%). What is the problem?

mingxingtan commented 4 years ago

Could you try efficientnet-lite1? My colleagues have helped verify that the efficientnet-litex models have a very small accuracy drop after quantization.

Albert-Zhao-2020 commented 4 years ago

@mingxingtan I have tried that: the accuracy of efficientnet-litex is OK, but efficientnet-bx drops a lot. Is there anything wrong with the TFLite converter in my test?

shashichilappagari commented 4 years ago

@Albert-Zhao-2020 The reason could be this: https://blog.tensorflow.org/2020/03/higher-accuracy-on-vision-models-with-efficientnet-lite.html
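As I understand the linked post, EfficientNet-Lite replaces swish activations with ReLU6 because quantized output ranges that are too wide hurt post-training quantization. The sketch below illustrates only that general effect with a plain 8-bit affine quantizer in pure Python. It is not the TFLite quantizer, and the function names and the example ranges ([0, 6] versus a hypothetical [-50, 50]) are assumptions chosen for illustration.

```python
def quantize_dequantize(values, lo, hi, num_bits=8):
    """Affine-quantize values over [lo, hi] to num_bits, then map back to floats."""
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels  # width of one quantization step
    restored = []
    for v in values:
        clipped = min(max(v, lo), hi)
        q = round((clipped - lo) / scale)  # integer code in [0, levels]
        restored.append(lo + q * scale)
    return restored


def max_abs_error(values, lo, hi):
    """Largest round-trip error when quantizing values over the range [lo, hi]."""
    restored = quantize_dequantize(values, lo, hi)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical activations that all lie in [0, 6].
activations = [i * 0.01 for i in range(601)]

# Same data, quantized with a tight range and with a much wider assumed range.
narrow_error = max_abs_error(activations, 0.0, 6.0)
wide_error = max_abs_error(activations, -50.0, 50.0)

print(f"max error with range [0, 6]:    {narrow_error:.4f}")
print(f"max error with range [-50, 50]: {wide_error:.4f}")
```

With a fixed number of bits, the step size grows in proportion to the range, so the same activations are represented much more coarsely when the calibrated range is wide.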

dreamibor commented 4 years ago

@imcaspar @Albert-Zhao-2020 Hi, what was the model_output_labels file that you used for the ILSVRC 2012 evaluation tool? I couldn't find any related sources for the labels.

dreamibor commented 4 years ago

Hi @Albert-Zhao-2020 Did you manage to solve the problem for EfficientNet-Bx?