tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

log operator outputs wrong results with XLA compilation #57744

Open Co1lin opened 1 year ago

Co1lin commented 1 year ago
### Issue Type

Bug

### Source

binary

### Tensorflow Version

2.11

### Custom Code

No

### OS Platform and Distribution

_No response_

### Mobile device

_No response_

### Python version

_No response_

### Bazel version

_No response_

### GCC/Compiler version

_No response_

### CUDA/cuDNN version

_No response_

### GPU model and memory

_No response_

### Current Behaviour?

With XLA compilation, the `log` operator outputs a normal real number for a nan input. This may hide numeric issues in the network, or break numeric checks like `is_nan`, making it hard for developers to debug or handle corner cases. XLA should have the same behavior as eager mode.

### Standalone code to reproduce the issue

I can reproduce this issue on version `2.11.0-dev20220914`. I can also reproduce it in Colab: https://colab.research.google.com/drive/1KH1slOkHrd-sFgPpHTqvFC54noiaRJUO?usp=sharing

```python
import tensorflow as tf
print(tf.__version__)
from keras import layers

class MyModule(tf.Module):
    def __init__(self):
        super().__init__()

    @tf.function(jit_compile=True)
    def __call__(self, x):
        x = tf.pow(x, x)
        x = tf.math.log(x)
        # NOTE: tf.experimental.numpy.log2 will also output a wrong result with XLA
        return x

def simple_diff():
    m = MyModule()
    x = tf.constant(
        -1.5,
        shape=[1],
        dtype=tf.float32,
    )

    with tf.device('/CPU:0'):
        tf.config.run_functions_eagerly(True)
        out = m(x)
        print(out)  # RIGHT! tf.Tensor([nan], shape=(1,), dtype=float32)
        tf.config.run_functions_eagerly(False)

    with tf.device('/CPU:0'):
        out = m(x)
        print(out)  # NOTE: WRONG! tf.Tensor([-0.8774437], shape=(1,), dtype=float32)

simple_diff()
```

### Relevant log output

```
2.8.2
tf.Tensor([nan], shape=(1,), dtype=float32)         # <-- right
tf.Tensor([-0.6081976], shape=(1,), dtype=float32)  # <-- wrong
```
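To see how this defeats nan checks, here is a minimal sketch distilled from the repro above (the commented outputs assume the buggy rewrite fires, as in the report):

```python
import tensorflow as tf

@tf.function(jit_compile=True)
def f(x):
    return tf.math.log(tf.pow(x, x))

x = tf.constant([-1.5], dtype=tf.float32)
out = f(x)
print(out, tf.math.is_nan(out))
# Correct (eager) semantics: [nan]  [True]
# With the buggy XLA rewrite: a finite value and [False],
# so downstream nan handling is silently skipped.
```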
tiruk007 commented 1 year ago

@gadagashwini I was able to reproduce the issue on Colab using TF v2.10. Please find the gist here for reference.

Thank you!

cheshire commented 1 year ago

Created b/249447060 to track internally.

d0k commented 1 year ago

Algebraic Simplifier rewrites ln(pow(A,B)) => B*ln(abs(A)). That doesn't look right for negative B.

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/service/algebraic_simplifier.cc#L3373
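Indeed, emulating that rewrite in NumPy on the reporter's input (A = B = -1.5) reproduces the exact wrong value from the log output above (a sketch of the simplification, not XLA's actual code path):

```python
import numpy as np

a = b = np.float32(-1.5)
with np.errstate(invalid="ignore"):
    print(np.log(np.power(a, b)))  # nan: (-1.5) ** -1.5 is undefined over the reals
    print(b * np.log(np.abs(a)))   # -0.6081976, the "wrong" value from the log above
```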

ganler commented 1 year ago

> Algebraic Simplifier rewrites ln(pow(A,B)) => B*ln(abs(A)). That doesn't look right for negative B.
>
> https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/service/algebraic_simplifier.cc#L3373

Interesting. Also, I think it could be wrong for negative A with non-integral B. For example:

```python
import numpy as np

a = -2; b = 0.5

print(f"{np.log2(np.power(a, b)) = }")
print(f"{b * np.log2(abs(a)) = }")
"""
np.log2(np.power(a, b)) = nan
b * np.log2(abs(a)) = 0.5
"""
```

The former expression's domain (where it is finite) is roughly { (a, b) : a > 0 } ∪ { (a, b) : a < 0 and b is an even integer }; the latter's is simply { (a, b) : a ≠ 0 }. The rewrite therefore widens the domain and silently returns finite values where the original expression yields nan.
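A quick enumeration of the sign/parity cases (a sketch; the case values are illustrative, not from the issue) shows exactly where the two expressions disagree:

```python
import numpy as np

cases = [(-2.0, 0.5), (-2.0, 2.0), (-2.0, 3.0), (2.0, -1.5)]
with np.errstate(invalid="ignore"):
    for a, b in cases:
        lhs = np.log2(np.power(a, b))  # original expression
        rhs = b * np.log2(np.abs(a))   # simplified expression
        print(f"a={a}, b={b}: lhs={lhs}, rhs={rhs}")
# a=-2.0, b=0.5:  lhs=nan,  rhs=0.5   <- rewrite invents a finite value
# a=-2.0, b=2.0:  lhs=2.0,  rhs=2.0   <- the abs() exists for this case
# a=-2.0, b=3.0:  lhs=nan,  rhs=3.0   <- wrong again (odd exponent)
# a=2.0,  b=-1.5: lhs=-1.5, rhs=-1.5  <- fine for positive a
```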

ganler commented 1 year ago

For this specific case, the result would be consistent if we removed the abs, i.e., ln(pow(A,B)) => B*ln(A) (I think this is also one of the fast-math (-Ofast) rules in compilers like Clang):

```python
a = -2; b = 0.5
print(f"{np.log2(np.power(a, b)) = }")  # nan
print(f"{b * np.log2(a) = }")           # nan
```

But for cases where `a < 0 && is_even(b)`, it goes wrong:

```python
a = -2; b = 2
print(f"{np.log2(np.power(a, b)) = }")  # 2.0
print(f"{b * np.log2(a) = }")           # nan
```

Maybe the strictly correct rewrite is `ln(pow(A,B)) => B * ((A < 0 && is_even(B)) ? ln(-A) : ln(A))`, with branch preference put on the false branch. An approximate but still fast `is_even` for floats could be:

```cpp
bool is_even(float f) {  // could be ~4 instructions on x64
    constexpr int precision = 16;
    const int i32f = int(f * precision);
    return i32f % (2 * precision) == 0;
}
```
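As a quick sanity check, here is a Python emulation of that proposal (my own sketch, not XLA code; the `== 0` divisibility test behaves the same in Python and C despite their different `%` semantics for negatives):

```python
import math

def is_even(f: float) -> bool:
    # Port of the C snippet above: f counts as even if it is
    # within 1/16 of an even integer.
    precision = 16
    i32f = int(f * precision)  # int() truncates toward zero, like a C cast
    return i32f % (2 * precision) == 0

def rewritten(a: float, b: float) -> float:
    # Proposed rewrite: B * ((A < 0 && is_even(B)) ? ln(-A) : ln(A))
    base = -a if (a < 0 and is_even(b)) else a
    return b * (math.log(base) if base > 0 else math.nan)

def reference(a: float, b: float) -> float:
    try:
        return math.log(math.pow(a, b))
    except ValueError:  # pow/log undefined over the reals
        return math.nan

# Covers the cases discussed in this thread, including the original repro (-1.5, -1.5).
for a, b in [(-2.0, 2.0), (-2.0, 0.5), (-2.0, 3.0), (-1.5, -1.5), (2.0, -1.5)]:
    print(f"a={a}, b={b}: reference={reference(a, b)}, rewritten={rewritten(a, b)}")
```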

But I am not sure how hard it is to integrate such expressions into HLO...

Also see compiler explorer: https://godbolt.org/z/Ec5ah3PKP