tensorflow / tpu

Reference models and tools for Cloud TPUs.
https://cloud.google.com/tpu/
Apache License 2.0

EfficientNet int8 quantization doesn't work? #669

Open tigert1998 opened 4 years ago

tigert1998 commented 4 years ago

I exported EfficientNet B0/B1 with --quantize=True under TensorFlow 2.0:

python ./tpu/models/official/efficientnet/export_model.py `
    --model_name=efficientnet-b0 `
    --ckpt_dir=${some_path}/exported_models/efficientnet-b0 `
    --data_dir=${some_path}/dataset/imagenet/tf_records `
    --output_tflite=${some_path}/exported_models/efficientnet_b0_int_quant.tflite `
    --enable_ema=true `
    --image_size=224 `
    --quantize=true

python ./tpu/models/official/efficientnet/export_model.py `
    --model_name=efficientnet-b1 `
    --ckpt_dir=${some_path}/exported_models/efficientnet-b1 `
    --data_dir=${some_path}/dataset/imagenet/tf_records `
    --output_tflite=${some_path}/exported_models/efficientnet_b1_int_quant.tflite `
    --enable_ema=true `
    --image_size=240 `
    --quantize=true

Then I tested their accuracy with https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/accuracy/ilsvrc (TensorFlow r2.1, commit id: 60afa4e), manually changing the preprocessing stage to match the EfficientNet repo.
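For reference, the preprocessing I match is essentially the EfficientNet eval path: a padded center crop, bicubic resize, then per-channel mean/std normalization. A rough sketch (the crop padding of 32 and the mean/std constants are restated from the repo's preprocessing.py from memory, so treat them as assumptions):

import tensorflow as tf

# Constants as defined in the EfficientNet repo (restated here, not re-verified).
MEAN_RGB = [0.485 * 255, 0.456 * 255, 0.406 * 255]
STDDEV_RGB = [0.229 * 255, 0.224 * 255, 0.225 * 255]
CROP_PADDING = 32

def preprocess_for_eval(image_bytes, image_size):
  """Padded center crop, bicubic resize, then mean/std normalization."""
  shape = tf.image.extract_jpeg_shape(image_bytes)
  height, width = shape[0], shape[1]
  crop_size = tf.cast(
      (image_size / (image_size + CROP_PADDING)) *
      tf.cast(tf.minimum(height, width), tf.float32), tf.int32)
  offset_h = ((height - crop_size) + 1) // 2
  offset_w = ((width - crop_size) + 1) // 2
  image = tf.image.decode_and_crop_jpeg(
      image_bytes, [offset_h, offset_w, crop_size, crop_size], channels=3)
  image = tf.image.resize(image, [image_size, image_size], method='bicubic')
  image -= tf.constant(MEAN_RGB, shape=[1, 1, 3], dtype=image.dtype)
  image /= tf.constant(STDDEV_RGB, shape=[1, 1, 3], dtype=image.dtype)
  return image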

Both models evaluate to an accuracy of roughly 0.

Please give an explanation.

sarahmass commented 4 years ago

You need to have created the label files correctly; I had similar trouble and used these two files: for the model output labels, https://github.com/tensorflow/tpu/files/4112581/labels_mobilenet_quant_v1_224.txt, and for the ground truth labels, validation_ground_truth.txt.
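If it helps, a quick way to sanity-check the two files before running the eval is to compare their contents directly (a rough sketch; the file names below are placeholders for whatever you pass as the model output labels and ground truth labels):

# Placeholders: substitute your actual label file paths.
with open('model_output_labels.txt') as f:
  model_labels = [line.strip() for line in f if line.strip()]
with open('validation_ground_truth.txt') as f:
  ground_truth = [line.strip() for line in f if line.strip()]

# One entry per output class (1001 with a background class, 1000 without).
print('model output classes:', len(model_labels))
# One entry per validation image; every entry should appear in the model labels.
print('ground truth entries:', len(ground_truth))
model_label_set = set(model_labels)
print('unmatched ground truth entries:',
      sum(label not in model_label_set for label in ground_truth))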

How did you create your int8 quantized model? Did you use a calibration set? If so, which one? And if this works for you, what accuracy were you able to achieve?

Hope this helps, and thanks in advance for whatever help you can give me. :)

tigert1998 commented 4 years ago

@sarahmass Thank you so much for the information. I am on vacation now, so the new accuracy test will probably be delayed. But my question is: why do labels even matter in post-training quantization? I can confirm the labels are 100% correct in the evaluation stage, because the unquantized EfficientNets achieve their claimed accuracy under the same settings.

I checked the code in https://github.com/tensorflow/tpu/blob/master/models/official/efficientnet/export_model.py:

def representative_dataset_gen():
  """Gets a python generator of image numpy arrays for ImageNet."""
  params = dict(batch_size=FLAGS.batch_size)
  imagenet_eval = imagenet_input.ImageNetInput(
      is_training=False,
      data_dir=FLAGS.data_dir,
      transpose_input=False,
      cache=False,
      image_size=FLAGS.image_size,
      num_parallel_calls=1,
      use_bfloat16=False,
      include_background_label=True,
  )

  data = imagenet_eval.input_fn(params)

  def preprocess_map_fn(images, labels):
    # neglected labels
    del labels
    model_builder = get_model_builder(FLAGS.model_name)
    images -= tf.constant(
        model_builder.MEAN_RGB, shape=[1, 1, 3], dtype=images.dtype)
    images /= tf.constant(
        model_builder.STDDEV_RGB, shape=[1, 1, 3], dtype=images.dtype)
    return images

  data = data.map(preprocess_map_fn)
  iterator = data.make_one_shot_iterator()
  for _ in range(FLAGS.num_steps):
    # In eager context, we can get a python generator from a dataset iterator.
    images = iterator.get_next()
    yield [images]

# <<<<<<<
# separation line
# >>>>>>>

    if FLAGS.quantize:
      if not FLAGS.data_dir:
        raise ValueError(
            "Post training quantization requires data_dir flag to point to the "
            "calibration dataset. To export a float model, set "
            "--quantize=False.")

      converter.representative_dataset = tf.lite.RepresentativeDataset(
          representative_dataset_gen)
      converter.optimizations = [tf.lite.Optimize.DEFAULT]
      converter.inference_input_type = tf.lite.constants.QUANTIZED_UINT8
      converter.inference_output_type = tf.lite.constants.QUANTIZED_UINT8
      converter.target_spec.supported_ops = [
          tf.lite.OpsSet.TFLITE_BUILTINS_INT8
      ]

Doesn't it look like the labels are simply dropped?

As for your other questions: yes, I used a calibration set, and it's ImageNet. Every int8 post-training quantization in TFLite needs a calibration set.
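To make the point concrete, calibration only ever sees images. A generator like the sketch below (the directory path and step count are placeholders; the mean/std constants are restated from the repo) would serve the converter just as well as the ImageNet-based one above:

import glob
import tensorflow as tf

MEAN_RGB = [0.485 * 255, 0.456 * 255, 0.406 * 255]
STDDEV_RGB = [0.229 * 255, 0.224 * 255, 0.225 * 255]

def representative_dataset_gen(image_dir='/path/to/calibration_images',
                               image_size=224, num_steps=500):
  """Yields single-image float batches for calibration; no labels involved."""
  for path in sorted(glob.glob(image_dir + '/*.JPEG'))[:num_steps]:
    image = tf.image.decode_jpeg(tf.io.read_file(path), channels=3)
    image = tf.image.resize(image, [image_size, image_size])
    image -= tf.constant(MEAN_RGB, shape=[1, 1, 3], dtype=image.dtype)
    image /= tf.constant(STDDEV_RGB, shape=[1, 1, 3], dtype=image.dtype)
    yield [tf.expand_dims(image, 0).numpy()]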

sarahmass commented 4 years ago

Then I tested their accuracy with https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/accuracy/ilsvrc (TensorFlow r2.1, commit id: 60afa4e), manually changing the preprocessing stage to match the EfficientNet repo.

I was commenting on your accuracy test. If you run the eval script with incorrectly formatted ground truth labels and model output labels, you will end up with zero percent accuracy. This is what I experienced: I had generated an efficientnet-edgetpu-S model using a 500-image calibration set, ran the same evaluation script you linked above, and got zero percent accuracy until I realized that both of my label files were incorrectly formatted. It was mind-boggling and frustrating to track down. Now I am able to achieve the benchmark accuracy with my tflite efficientnet-edgetpu-S_float model.

I am not able to match the benchmark accuracy for either the downloaded efficientnet-edgetpu-S_quant.tflite (from the checkpoint download) or my own generated one. I was hoping you could catch up with me and let me know whether you were able to meet the benchmark accuracies with your EfficientNet model. :) But it looks like I will have to wait until you are back from vacation to debug your issue further.

sarahmass commented 4 years ago

@mingxingtan, I was wondering if you could take a look at this issue. I used the efficientnet-edgetpu-S_float.tflite from the checkpoint download and achieved 77.4%, which matches the benchmark in the edgetpu ReadMe. Unfortunately, when I run the efficientnet-edgetpu-S_quant.tflite from the same download I only achieve 56.4% accuracy, which is more than a 20-point drop from the benchmark. I also took the checkpoint and converted the efficientnet-edgetpu-S model into a quantized tflite model using the provided code (the same @tigert1998 used) and 500 calibration images from the ImageNet validation set, and was still only able to achieve 67.7%. I then wondered whether the evaluation script needed a quantization flag, but since none is documented I tried mobilenet_v1_1.0_224_quant.tflite and was able to achieve its benchmark.

I am now curious which model you used to achieve the benchmarks and how many images you used in the post-training quantization script to reach the benchmark accuracy of 77% for the quantized efficientnet-edgetpu-S model. I am adding my comments to @tigert1998's issue because we have a similar problem.

sarahmass commented 4 years ago

Update: I just quantized a tflite efficientnet-b0 model from the provided checkpoint with the export_model.py script using 500 calibration images, then used the script linked above to evaluate its accuracy, and I get 45.33% on the ImageNet validation set. So the diminished results are not just occurring for the quantized edgetpu models. I had to make one change when running the evaluation script between the two sets of models: the edgetpu checkpoints are trained with a background class and the base models are not. So, what am I missing? Why can't I reach the benchmarked accuracies?
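For completeness, the change I am referring to is how a 1001-class output (background at index 0) maps onto the 1000-class ground truth. One way to handle it when scoring outputs yourself is simply to shift the argmax (a sketch only, not necessarily how the eval tool handles it internally):

import numpy as np

def predicted_class(output_probs, has_background_class):
  """Maps the argmax of the model output to a 0-999 ImageNet class index."""
  top = int(np.argmax(output_probs))
  # EdgeTPU checkpoints emit 1001 classes with background at index 0.
  return top - 1 if has_background_class else top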

BernardinD commented 4 years ago

I'm in a similar situation. I'm using the same preprocessing and output procedure as the eval driver for EfficientNet, but I'm not able to reproduce comparable results after the tflite conversion.
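One thing worth double-checking when the converted model has uint8 input and output (as the export script above configures): the eval code has to quantize inputs and dequantize outputs using each tensor's scale and zero point, otherwise accuracy collapses. A rough sketch with tf.lite.Interpreter (the model path is a placeholder):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='efficientnet_b0_int_quant.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def run_uint8_model(float_image_batch):
  """Quantizes a preprocessed float batch, runs inference, dequantizes the output."""
  in_scale, in_zero_point = inp['quantization']
  quantized = np.clip(
      np.round(float_image_batch / in_scale + in_zero_point), 0, 255).astype(np.uint8)
  interpreter.set_tensor(inp['index'], quantized)
  interpreter.invoke()
  out_scale, out_zero_point = out['quantization']
  raw = interpreter.get_tensor(out['index']).astype(np.float32)
  return (raw - out_zero_point) * out_scale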

dreamibor commented 4 years ago

@sarahmass Hi, did you manage to solve the problem? Are you still using https://github.com/tensorflow/tpu/files/4112581/labels_mobilenet_quant_v1_224.txt as the model_output_labels for https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/evaluation/tasks/imagenet_image_classification? I used those mobilenet labels, and even the EfficientNet Lite0 FP32 model only achieved 38% top-1 accuracy, which is about half of the benchmarked number.

@mingxingtan Could you share the model_output_labels.txt file that you used for evaluating EfficientNet Lite and EdgeTPU?

dreamibor commented 4 years ago

Hi @BernardinD , did you manage to solve the problem?

sarahmass commented 4 years ago

@sarahmass Hi, did you manage to solve the problem? Are you still using https://github.com/tensorflow/tpu/files/4112581/labels_mobilenet_quant_v1_224.txt as the model_output_labels for https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/evaluation/tasks/imagenet_image_classification? I used those mobilenet labels, and even the EfficientNet Lite0 FP32 model only achieved 38% top-1 accuracy, which is about half of the benchmarked number.

@mingxingtan Could you share the model_output_labels.txt file that you used for evaluating EfficientNet Lite and EdgeTPU?

I was working as a contractor for MSFT when I was running those models. At the time we were just trying to benchmark the model to check accuracy. Unfortunately, I don't have any of the files from before; the notes above are all I have. I never got the quantized models to reach the published values, and we did not ever dig any further into the issue. I'm sorry I could not be more helpful.