google / automl

Google Brain AutoML
Apache License 2.0

Unable to run evaluation on TFLite - Shapes of all inputs must match #1171

Closed oliviawindsir closed 2 years ago

oliviawindsir commented 2 years ago

I wanted to quantize efficientdet-lite2 using this AutoML repo. In one of the issues, I saw people recommending running the notebook at efficientdet/tf2/tutorial.ipynb. I could not run it from inside that folder, so I copied the notebook up one level to efficientdet/. Most of the steps ran fine until the step that evaluates the quantized TFLite model in section 2.2.

The cell I was trying to run is shown below.

# Evaluate on validation set (takes about 10 mins for efficientdet-d0)
!python -m tf2.eval_tflite  \
    --model_name={MODEL}  --tflite_path={saved_model_dir}/int8.tflite \
    --val_file_pattern=tfrecord/val* \
    --val_json_file=annotations/instances_val2017.json --eval_samples=10

When running the evaluation part in the notebook above, I got the following error:

2022-09-07 13:11:04.784592: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-09-07 13:11:06.602153: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-07 13:11:06.611957: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2022-09-07 13:11:06.611984: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-09-07 13:11:06.615605: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/local/github_repo/automl/efficientdet/tf2/eval_tflite.py", line 203, in <module>
    app.run(main)
  File "/home/local/github_repo/automl/venv/lib/python3.8/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/local/github_repo/automl/venv/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/local/github_repo/automl/efficientdet/tf2/eval_tflite.py", line 170, in main
    detections = postprocess.generate_detections_from_nms_output(
  File "/home/local/github_repo/automl/efficientdet/tf2/postprocess.py", line 527, in generate_detections_from_nms_output
    return tf.stack(detections_bs, axis=-1, name='detections')
  File "/home/local/github_repo/automl/venv/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/local/github_repo/automl/venv/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 7164, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shapes of all inputs must match: values[0].shape = [1,25] != values[1].shape = [1,1] [Op:Pack] name: detections
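
The last line of the traceback says that `tf.stack` (the `Pack` op) received inputs of different shapes, `[1,25]` and `[1,1]`. As a minimal sketch of that constraint, the same rule can be reproduced with NumPy, which is used here only as a stand-in for TensorFlow. The arrays are dummy data with the two shapes from the error message, not values from the model.

```python
import numpy as np

# Dummy arrays with the two shapes reported in the error message.
a = np.zeros((1, 25), dtype=np.float32)
b = np.zeros((1, 1), dtype=np.float32)

# Stacking requires every input to have exactly the same shape.
try:
    np.stack([a, b], axis=-1)
    stack_failed = False
except ValueError as err:
    stack_failed = True
    print("stack failed:", err)

# With matching shapes the stack succeeds and adds a trailing axis.
stacked = np.stack([a, a], axis=-1)
print(stacked.shape)  # (1, 25, 2)
```

Stacking does not broadcast, so a single input with an unexpected shape is enough to trigger this error.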
oliviawindsir commented 2 years ago

Upon investigating further, I found a mismatch in the order of the values returned by the TFLite interpreter. The evaluation script reads the first three of the four values returned by lite_runner.run() and post-processes them as boxes, classes, and scores:

nms_boxes_bs, nms_classes_bs, nms_scores_bs, _ = lite_runner.run(images)

However, according to the comments in the run() function of the LiteRunner class, the outputs are actually returned in the following order:

# TFLite model with post-processing.
# Four Outputs:
#   num_boxes: a float32 tensor of size 1 containing the number of
#     detected boxes
#   detection_scores: a float32 tensor of shape [1, num_boxes]
#     with class scores
#   detection_classes: a float32 tensor of shape [1, num_boxes]
#     with class indices
#   detection_boxes: a float32 tensor of shape [1, num_boxes, 4] with box
#     locations

I printed the returned values to confirm this order, and the output matched it: num_boxes first, then scores, then classes, then boxes.

[25.]

[[1.90625   1.7070312 1.6367188 1.5195312 1.5       1.421875  1.421875
  1.34375   1.328125  1.328125  1.2617188 1.2617188 1.2460938 1.2304688
  1.2304688 1.2148438 1.2148438 1.203125  1.203125  1.203125  1.1914062
  1.1914062 1.1914062 1.1796875 1.1796875]]

[[ 0. 50. 50.  0. 78. 50. 66. 78. 46. 63. 50. 46. 49. 79. 46. 49. 49. 49.
  50. 49. 49. 49. 50. 49. 49.]]

[[[ 1.0870631e-01  6.0856193e-01  5.3415084e-01  7.8300840e-01]
  [ 5.3346699e-01  5.0022423e-02  6.0073441e-01  1.5560755e-01]
  [ 4.5116577e-01  9.1615178e-02  5.1579547e-01  2.0761868e-01]
  [ 4.0700871e-01 -7.6876581e-04  4.8332697e-01  9.9255890e-02]
  [ 2.7915275e-01 -3.0338019e-03  4.5737642e-01  3.0552655e-01]
  [ 4.7963738e-01 -1.8516928e-04  5.6779981e-01  1.3329524e-01]
  [ 3.6036891e-01 -4.5481920e-03  6.5900868e-01  5.2189851e-01]
  [ 2.7603590e-01  3.6531299e-01  4.9422097e-01  6.1735934e-01]
  [ 4.1552711e-01  2.1859461e-01  4.7106594e-01  2.6934478e-01]
  [ 6.7321286e-03 -5.0023943e-04  2.3459271e-01  9.4630346e-02]
  [ 4.0802893e-01  3.9650986e-01  4.4082054e-01  4.5643839e-01]
  [ 4.2508441e-01  1.9213863e-01  4.7403681e-01  2.3016869e-01]
  [ 2.7977206e-02  4.4822890e-01  1.2727207e-01  4.7068286e-01]
  [ 2.5478679e-01  5.0784653e-01  3.2476139e-01  5.7413596e-01]
  [ 4.0729308e-01  3.4307864e-01  4.6384257e-01  4.0065721e-01]
  [ 8.4463120e-02  5.3181601e-01  1.7519653e-01  5.6020224e-01]
  [ 5.0127631e-01  1.9216484e-01  5.4273134e-01  2.6792631e-01]
  [ 7.4091956e-02  7.5158542e-01  1.4985339e-01  7.7173668e-01]
  [ 9.6471682e-02  9.7586125e-02  2.0205681e-01  1.9064963e-01]
  [ 6.7943789e-02  7.4670768e-01  2.1726370e-01  7.9601979e-01]
  [ 6.2376749e-02  4.6040615e-01  1.3951683e-01  4.8055735e-01]
  [ 7.5439975e-02  8.2979095e-01  2.1995910e-01  8.6491621e-01]
  [ 5.0147271e-01  1.2149759e-02  5.9189928e-01  1.5265068e-01]
  [ 6.7584962e-02  4.8720428e-01  1.1498937e-01  5.0176984e-01]
  [ 6.9879174e-02  8.0105811e-01  1.6226369e-01  8.2120937e-01]]]
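
One possible workaround, given the order printed above, is to reorder the outputs before post-processing. The sketch below is an assumption-laden illustration, not the fix that was applied in the repository: `reorder_tflite_outputs` is a hypothetical helper, and the input is dummy NumPy data shaped like the printed values rather than real interpreter output.

```python
import numpy as np


def reorder_tflite_outputs(outputs):
    """Hypothetical helper: map (num_boxes, scores, classes, boxes)
    to the (boxes, classes, scores) order the evaluation code unpacks."""
    _num_boxes, scores, classes, boxes = (np.asarray(o) for o in outputs)

    # Sanity-check shapes so a wrong ordering fails early and clearly.
    if boxes.ndim != 3 or boxes.shape[-1] != 4:
        raise ValueError(f"expected boxes of shape [1, N, 4], got {boxes.shape}")
    if scores.shape != boxes.shape[:2] or classes.shape != boxes.shape[:2]:
        raise ValueError(
            f"scores {scores.shape} and classes {classes.shape} "
            f"must both match {boxes.shape[:2]}"
        )
    return boxes, classes, scores


# Dummy data with the same shapes as the values printed above.
fake_outputs = (
    np.array([25.0], dtype=np.float32),      # num_boxes
    np.ones((1, 25), dtype=np.float32),      # detection_scores
    np.zeros((1, 25), dtype=np.float32),     # detection_classes
    np.zeros((1, 25, 4), dtype=np.float32),  # detection_boxes
)

nms_boxes_bs, nms_classes_bs, nms_scores_bs = reorder_tflite_outputs(fake_outputs)
print(nms_boxes_bs.shape, nms_classes_bs.shape, nms_scores_bs.shape)
# (1, 25, 4) (1, 25) (1, 25)
```

The shape checks are there so that, if the interpreter's output order differs from the one assumed here, the failure points at the ordering rather than surfacing later inside the stacking step.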
fsx950223 commented 2 years ago

Fixed