google / gemmlowp

Low-precision matrix multiplication
Apache License 2.0

Adding output_range_offset support for requantize transform. #209

Closed everton1984 closed 2 years ago

everton1984 commented 2 years ago

Adding the output_range_offset calculation to the requantize transform. I noticed this after looking at this comment in the TF code:

  // After adding the output_range_offset the value is cast from float to uint.
  // The float to int/uint cast in NEON uses round toward 0. To keep the
  // rounding consistent with Eigen, which uses round toward closest, we can
  // add 0.5f and exploit the fact that we only operate on non negative values.
  // TODO(maciekc): fix the actual kernel in gemmlowp/meta
  params.kernel.output_range_offset =
      static_cast<float>(std::numeric_limits<uint8_t>::lowest()) + 0.5f;

and realizing that 0.5f was not being added. I also switched from fcvtzs to fcvtns, since the latter rounds to nearest. Once https://github.com/tensorflow/tensorflow/pull/53493 is merged, both TF tests requantize_op_test and quantize_down_and_shrink_range_op_test will pass on aarch64.

everton1984 commented 2 years ago

I should already be on the CLA list, so if possible, please rerun the check.

bjacob commented 2 years ago

To be clear, gemmlowp/meta has been unmaintained for years. I take PRs against it on the assumption that anyone who cares enough to write one probably knows better than I do.