tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0
186.61k stars 74.35k forks source link

`tf.round` on GPU rounds `-0.5` to `0.0` rather than `-0.0` #58517

Open creakseek opened 2 years ago

creakseek commented 2 years ago
Click to expand! ### Issue Type Bug ### Source binary ### Tensorflow Version 2.10.0, and nightly ### Custom Code Yes ### OS Platform and Distribution Linux Ubuntu 20.04 ### Mobile device _No response_ ### Python version 3.7.15 ### Bazel version _No response_ ### GCC/Compiler version _No response_ ### CUDA/cuDNN version 11.3, 8.2.4 ### GPU model and memory _No response_ ### Current Behaviour? `np.round` and `tf.round` on CPU round `-0.5` to `-0.0`, but `tf.round` on GPU rounds to `0.0`. ### Standalone code to reproduce the issue ```shell print("numpy", np.round([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])) with tf.device("/cpu:0"): print("CPU", tf.round([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])) with tf.device("/gpu:0"): print("GPU", tf.round([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])) ``` ### Relevant log output ```shell numpy [-2. -2. -0. 0. 2. 2.] CPU tf.Tensor([-2. -2. -0. 0. 2. 2.], shape=(6,), dtype=float32) GPU tf.Tensor([-2. -2. 0. 0. 2. 2.], shape=(6,), dtype=float32) ```
bhack commented 2 years ago

@reedwm I suppose this is MLIR generated on GPU. It is really hard to track these OPS in details in the full path when they are MLIR generated. Do you have any hint to speedup the triage?

Check my colab: https://colab.research.google.com/gist/bhack/ad0d1f3be6d3b930b118fa878c9332eb/untitled195.ipynb

I guess this cause I suppose that MLIR is using LLVM round-to-nearest-even mode but it is hard to full track the lowering.

Also as @creakseek It was going to open many of these GPU/CPU misalignment tickets is there a better way to systematically work on these?

reedwm commented 2 years ago

@reedwm I suppose this is MLIR generated on GPU. It really hard to track these OPS in details in the full path when they are MLIR generated. Do you have any hint to speedup the triage?

Unfortunately I am not very familiar with how the MLIR-generated kernels work. @frgossen, can you comment here?

Check my colab: https://colab.research.google.com/gist/bhack/ad0d1f3be6d3b930b118fa878c9332eb/untitled195.ipynb

I guess this cause I suppose that MLIR is using LLVM round-to-nearest-even mode but it is hard to full track the lowering.

Interesting, so it seems TF matches the __float2int_rn behavior. __float2int_rn incorrectly gives positive zero when given -0.5. @nluehr is this a CUDA bug or expected behavior?

Also as @creakseek It was going to open many of these GPU/CPU misalignment tickets is there a better way to systematically work on these?

I think opening separate issues is fine.

bhack commented 2 years ago

is this a CUDA bug or expected behavior?

The hard part with MLIR is to full track the lowering e2e But if it is really lowered to __float2int_rn I think that -0 is expected. See the c++ version

#include <iostream>
#include <cmath>

using namespace std;

int main()
{
    cout << nearbyint( -0.5f ) * 2.0f << endl;
}

I think opening separate issues is fine.

Ok but we really need to find an easier way to track these type of code paths to make an efficient triage.

reedwm commented 2 years ago

But if it is really lowered to __float2int_rn I think that 0 is expected. See the c++ version

You passed a positive number to nearbyint. If you pass a negative number, it prints -0 instead of 0.

Ok but we really need to find an easier way to track these type of code paths to make an efficient triage.

@poulsbo do you think opening a tracking GitHub issue would be appropriate here, to track differences between CPU and GPU?

bhack commented 2 years ago

You passed a positive number to nearbyint. If you pass a negative number, it prints -0 instead of 0.

Yes thanks I fixed the typo. I suppose that c++ gist could represent __float2int_rn correctly.

bhack commented 2 years ago

In the LLVM NVPTX target lowering source code I've found:

// This is the the rounding method used in CUDA libdevice in C like code:
// float roundf(float A)
// {
//   float RoundedA = (float) (int) ( A > 0 ? (A + 0.5f) : (A - 0.5f));
//   RoundedA = abs(A) > 0x1.0p23 ? A : RoundedA;
//   return abs(A) < 0.5 ? (float)(int)A : RoundedA;
// }

But it gives me -1.

#include <iostream>

using namespace std;
// This is the the rounding method used in CUDA libdevice in C like code:
float roundf(float A)
{
   float RoundedA = (float) (int) ( A > 0 ? (A + 0.5f) : (A - 0.5f));
   RoundedA = abs(A) > 0x1.0p23 ? A : RoundedA;
   return abs(A) < 0.5 ? (float)(int)A : RoundedA;
}

int main()
{
  cout << roundf(-0.5f) << endl;
}
bhack commented 2 years ago

OK this match CUDA roundf. I still not find exactly, in the codepath, on what we are lowering.

tilakrayal commented 1 week ago

Hi,

Thank you for opening this issue. Since this issue has been open for a long time, the code/debug information for this issue may not be relevant with the current state of the code base.

The Tensorflow team is constantly improving the framework by fixing bugs and adding new features. We suggest you try the latest TensorFlow version with the latest compatible hardware configuration which could potentially resolve the issue. If you are still facing the issue, please create a new GitHub issue with your latest findings, with all the debugging information which could help us investigate.

Please follow the release notes to stay up to date with the latest developments which are happening in the Tensorflow space.

github-actions[bot] commented 18 hours ago

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.