iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Numeric error for quantized mobilebert after LLVM integration #9796

Open antiagainst opened 2 years ago

antiagainst commented 2 years ago

The following test fails after integration #9790:

Failed Tests (1):
  TENSORFLOW_TESTS :: iree_tfl_tests/llvmaot_mobilebert_tf2_quant.run

More relevant logs:

I0714 19:19:33.757150 140043112339264 test_util.py:141] Setting up for IREE
I0714 19:19:33.758712 140043112339264 binaries.py:216] Invoke IREE Pipeline:
  /tmpfs/src/github/iree/integrations/tensorflow/python_projects/iree_tflite/iree/tools/tflite/iree-import-tflite /tmp/lit/iree_tfl_tests/Output/llvmaot_mobilebert_tf2_quant.run.tmp/download/model.tflite --mlir-print-debuginfo --save-temp-tfl-input=/tmp/lit/iree_tfl_tests/Output/llvmaot_mobilebert_tf2_quant.run.tmp/download/tflite.mlir --save-temp-iree-input=/tmp/lit/iree_tfl_tests/Output/llvmaot_mobilebert_tf2_quant.run.tmp/download/tosa.mlir
  /home/kbuilder/iree/build/tf/compiler/bindings/python/iree/compiler/tools/../_mlir_libs/iree-compile - --iree-input-type=tosa --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=dylib-llvm-aot -o=/tmp/lit/iree_tfl_tests/Output/llvmaot_mobilebert_tf2_quant.run.tmp/download/module.bytecode --iree-llvm-embedded-linker-path=/home/kbuilder/iree/build/tf/compiler/bindings/python/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false
I0714 19:29:43.149756 140043112339264 test_util.py:97] Setting up tflite interpreter
I0714 19:29:43.161504 140043112339264 test_util.py:104] Setting up iree runtime
I0714 19:29:43.263613 140043112339264 test_util.py:154] Setting up test inputs
I0714 19:29:43.264165 140043112339264 mobilebert_tf2_quant_test.py:21]  [  1 384], int32
I0714 19:29:43.264312 140043112339264 mobilebert_tf2_quant_test.py:21]  [  1 384], int32
I0714 19:29:43.264446 140043112339264 mobilebert_tf2_quant_test.py:21]  [  1 384], int32
I0714 19:29:43.264645 140043112339264 test_util.py:157] Invoking TFLite
I0714 19:30:10.864070 140043112339264 test_util.py:118] Invocation time: 27.5992 seconds
I0714 19:30:10.864330 140043112339264 test_util.py:160] Invoke IREE
I0714 19:30:11.477345 140043112339264 test_util.py:133] Invocation time: 0.6127 seconds
I0714 19:30:11.477808 140043112339264 test_util.py:94] Max error (0): 8.559660
I0714 19:30:11.477946 140043112339264 test_util.py:94] Max error (1): 8.070536
[  FAILED  ] MobileBertTest.test_compile_tflite
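
The "Max error (N)" lines compare output N of the TFLite interpreter against the same output from IREE. Below is a minimal sketch of that kind of check. It assumes the metric is the maximum element-wise absolute difference and that the test fails when that difference exceeds a fixed tolerance. It uses 5.0 because that is the tolerance mentioned later in this thread. The helpers `max_abs_error` and `check_outputs` are hypothetical, not taken from `test_util.py`, and the data is toy data, not MobileBERT outputs.

```python
from typing import List, Sequence, Tuple


def max_abs_error(expected: Sequence[float], actual: Sequence[float]) -> float:
    """Largest element-wise |expected - actual| over two flattened outputs."""
    if len(expected) != len(actual):
        raise ValueError("outputs must have the same number of elements")
    return max((abs(e - a) for e, a in zip(expected, actual)), default=0.0)


def check_outputs(
    reference_outputs: Sequence[Sequence[float]],
    test_outputs: Sequence[Sequence[float]],
    atol: float,
) -> Tuple[List[float], bool]:
    """Return per-output max errors and whether every one is within atol."""
    if len(reference_outputs) != len(test_outputs):
        raise ValueError("both runs must produce the same number of outputs")
    errors = [max_abs_error(r, t) for r, t in zip(reference_outputs, test_outputs)]
    for idx, err in enumerate(errors):
        # Same shape of message as the log lines above.
        print(f"Max error ({idx}): {err:f}")
    return errors, all(err <= atol for err in errors)


# Toy data: a "TFLite" reference and an "IREE" result with two outputs each.
tflite = [[0.0, 1.0, 2.0], [3.0, 4.0]]
iree = [[0.5, 1.0, 1.0], [3.0, 12.0]]
errors, ok = check_outputs(tflite, iree, atol=5.0)
```

With this toy data, output 0 is within tolerance (max error 1.0) while output 1 differs by 8.0 and fails. This mirrors the shape of the failure in the log.
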

The test has been disabled so the integration can move forward.

hanhanW commented 2 years ago

I found that the max error was already high, around 6.36, before the integration. According to the comment in the test, a high max error is expected because of quantized Softmax issues.

https://github.com/iree-org/iree/blob/dbeec8eadabedb442659503bc0a76d87ce7c5069/integrations/tensorflow/test/python/iree_tfl_tests/mobilebert_tf2_quant_test.py#L38-L43

The error rose to 8.559662 after the integration. @rsuderman, have there been recent frontend changes that might alter the input IR and cause this numeric issue? Also, is this error acceptable?

cc @bjacob @mariecwhite for visibility, since you participated in the discussion of error values in https://github.com/iree-org/iree/pull/9337.

hanhanW commented 2 years ago

The artifacts can be downloaded from https://storage.googleapis.com/iree-shared-files/nod-perf/hanchung/issue_9796.zip

To repro:

$ iree-compile --iree-mlir-to-vm-bytecode-module --iree-hal-target-backends=dylib-llvm-aot -iree-input-type=tosa ~/mobilebert_quant_tosa.mlir -o /tmp/a.vmfb
$ iree-run-module --module_file=/tmp/a.vmfb --device=local-sync --entry_function=main --function_input=@/tmp/mobilebert_quant/download/input0.npy --function_input=@/tmp/mobilebert_quant/download/input1.npy --function_input=@/tmp/mobilebert_quant/download/input2.npy

bjacob commented 2 years ago

Here are some things that would help us decide, trading off certainty against engineering effort, ordered from quickest to most certain:

  1. Dump output activations (to put that 8.559662 in context).
  2. Bisect this down to one LLVM commit.
  3. Re-run end-to-end accuracy tests (the ones that Marie originally ran to confirm that the original 5.0 tolerance was viable).
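
Item 2 amounts to a binary search over the commits in the integrate. It needs only about log2(n) compile-and-run cycles for n commits. The following is a minimal sketch that assumes four things: the commits are ordered oldest to newest, the first is known good, the last is known bad, and the regression persists once introduced. `first_bad_commit` and the `is_bad` predicate are hypothetical. In practice `is_bad` would rebuild, rerun the repro above, and compare the max error to a chosen threshold. Git's own `git bisect run` automates the same search.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first commit where is_bad() returns True.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad too.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


# Toy history: pretend the regression landed in "c6".
history = [f"c{i}" for i in range(10)]
culprit = first_bad_commit(history, lambda c: int(c[1:]) >= 6)
```

Here `culprit` ends up as `"c6"` after only three calls to `is_bad`.
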

allieculp commented 2 years ago

@bjacob @hanhanW Can we add a priority to this? P1?

bjacob commented 2 years ago

Good idea, set to P1: we should either find the cause of this regression or make a conscious decision not to invest in it :-) Given @hanhanW's clean reproduction steps above, it should be feasible to at least get the bisection done.

hanhanW commented 2 years ago

The error went down after a few more integrates. New error values:


I0720 01:12:07.200919 140737350333696 test_util.py:94] Max error (0): 6.603168
I0720 01:12:07.201174 140737350333696 test_util.py:94] Max error (1): 8.559662
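
Dumping the output activations, as suggested earlier in the thread, would let these max errors be judged against the scale of the outputs. This is a minimal sketch of that comparison. It assumes the reference output is available as a flat list of floats. `error_in_context` is a hypothetical helper, and the values are toy data, not MobileBERT activations.

```python
from typing import Dict, Sequence


def error_in_context(expected: Sequence[float], actual: Sequence[float]) -> Dict[str, float]:
    """Relate the max absolute error to the spread of the reference output."""
    if not expected or len(expected) != len(actual):
        raise ValueError("need two non-empty outputs of the same size")
    max_err = max(abs(e - a) for e, a in zip(expected, actual))
    # Spread of the reference (e.g. TFLite) output values.
    out_range = max(expected) - min(expected)
    return {
        "max_abs_error": max_err,
        "output_range": out_range,
        # Fraction of the output range covered by the worst-case error.
        "error_vs_range": max_err / out_range if out_range else float("inf"),
    }


# Toy values: an error of 8.0 on outputs spanning 40.0.
report = error_in_context([-10.0, 0.0, 30.0], [-10.0, 8.0, 30.0])
print(report)
```

The same absolute error can matter much less on outputs spanning a wide range than on values clustered near zero. This is why a raw number like 8.56 is hard to judge on its own.
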

jpienaar commented 2 years ago

Benoit has a good point: is this something we are concerned about and should dig into?

allieculp commented 1 year ago

@antiagainst @hanhanW Is this still active?

hanhanW commented 1 year ago

I think it's still active, but maybe we can set it to P2.