Open creakseek opened 2 years ago
@reedwm I suppose this is MLIR generated on GPU. It is really hard to track these OPS in details in the full path when they are MLIR generated. Do you have any hint to speedup the triage?
Check my colab: https://colab.research.google.com/gist/bhack/ad0d1f3be6d3b930b118fa878c9332eb/untitled195.ipynb
I guess this cause I suppose that MLIR is using LLVM round-to-nearest-even mode but it is hard to full track the lowering.
Also as @creakseek It was going to open many of these GPU/CPU misalignment tickets is there a better way to systematically work on these?
@reedwm I suppose this is MLIR generated on GPU. It really hard to track these OPS in details in the full path when they are MLIR generated. Do you have any hint to speedup the triage?
Unfortunately I am not very familiar with how the MLIR-generated kernels work. @frgossen, can you comment here?
Check my colab: https://colab.research.google.com/gist/bhack/ad0d1f3be6d3b930b118fa878c9332eb/untitled195.ipynb
I guess this cause I suppose that MLIR is using LLVM round-to-nearest-even mode but it is hard to full track the lowering.
Interesting, so it seems TF matches the __float2int_rn
behavior. __float2int_rn
incorrectly gives positive zero when given -0.5. @nluehr is this a CUDA bug or expected behavior?
Also as @creakseek It was going to open many of these GPU/CPU misalignment tickets is there a better way to systematically work on these?
I think opening separate issues is fine.
is this a CUDA bug or expected behavior?
The hard part with MLIR is to full track the lowering e2e
But if it is really lowered to __float2int_rn
I think that -0
is expected. See the c++ version
#include <iostream>
#include <cmath>
using namespace std;
int main()
{
cout << nearbyint( -0.5f ) * 2.0f << endl;
}
I think opening separate issues is fine.
Ok but we really need to find an easier way to track these type of code paths to make an efficient triage.
But if it is really lowered to __float2int_rn I think that 0 is expected. See the c++ version
You passed a positive number to nearbyint
. If you pass a negative number, it prints -0 instead of 0.
Ok but we really need to find an easier way to track these type of code paths to make an efficient triage.
@poulsbo do you think opening a tracking GitHub issue would be appropriate here, to track differences between CPU and GPU?
You passed a positive number to nearbyint. If you pass a negative number, it prints -0 instead of 0.
Yes thanks I fixed the typo. I suppose that c++ gist could represent __float2int_rn
correctly.
In the LLVM NVPTX target lowering source code I've found:
// This is the the rounding method used in CUDA libdevice in C like code:
// float roundf(float A)
// {
// float RoundedA = (float) (int) ( A > 0 ? (A + 0.5f) : (A - 0.5f));
// RoundedA = abs(A) > 0x1.0p23 ? A : RoundedA;
// return abs(A) < 0.5 ? (float)(int)A : RoundedA;
// }
But it gives me -1
.
#include <iostream>
using namespace std;
// This is the the rounding method used in CUDA libdevice in C like code:
float roundf(float A)
{
float RoundedA = (float) (int) ( A > 0 ? (A + 0.5f) : (A - 0.5f));
RoundedA = abs(A) > 0x1.0p23 ? A : RoundedA;
return abs(A) < 0.5 ? (float)(int)A : RoundedA;
}
int main()
{
cout << roundf(-0.5f) << endl;
}
OK this match CUDA roundf
. I still not find exactly, in the codepath, on what we are lowering.
Hi,
Thank you for opening this issue. Since this issue has been open for a long time, the code/debug information for this issue may not be relevant with the current state of the code base.
The Tensorflow team is constantly improving the framework by fixing bugs and adding new features. We suggest you try the latest TensorFlow version with the latest compatible hardware configuration which could potentially resolve the issue. If you are still facing the issue, please create a new GitHub issue with your latest findings, with all the debugging information which could help us investigate.
Please follow the release notes to stay up to date with the latest developments which are happening in the Tensorflow space.
This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.
Click to expand!
### Issue Type Bug ### Source binary ### Tensorflow Version 2.10.0, and nightly ### Custom Code Yes ### OS Platform and Distribution Linux Ubuntu 20.04 ### Mobile device _No response_ ### Python version 3.7.15 ### Bazel version _No response_ ### GCC/Compiler version _No response_ ### CUDA/cuDNN version 11.3, 8.2.4 ### GPU model and memory _No response_ ### Current Behaviour? `np.round` and `tf.round` on CPU round `-0.5` to `-0.0`, but `tf.round` on GPU rounds to `0.0`. ### Standalone code to reproduce the issue ```shell print("numpy", np.round([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])) with tf.device("/cpu:0"): print("CPU", tf.round([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])) with tf.device("/gpu:0"): print("GPU", tf.round([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])) ``` ### Relevant log output ```shell numpy [-2. -2. -0. 0. 2. 2.] CPU tf.Tensor([-2. -2. -0. 0. 2. 2.], shape=(6,), dtype=float32) GPU tf.Tensor([-2. -2. 0. 0. 2. 2.], shape=(6,), dtype=float32) ```