Closed dyung closed 1 year ago
cc @LebedevRI
Thank you, i will take a look. I have a suspicion it's going to end up pointing at X86 shuffle combines though (CC @RKSimon)
@dyung running opt pipeline on that example even with clang-15 produces again different results: https://godbolt.org/z/Kd1WbfW5d Are you //sure// there is no usual FP brokenness going on in that example?
So in that new additional SROA run, we promote %id18878.i = alloca <4 x float>, align 16
.
Said promotion looks obviously correct (C) to me. I think this is the important bit:
rewriting [0,16) slice #0
original: %id18878.i.0.id18878.i.0.id18878.0..i = load <4 x float>, ptr %id18878.i, align 16, !tbaa !5
to: %8 = bitcast <16 x i8> %id18878.i.sroa.0.0.load to <4 x float>
aka
%8 = bitcast <16 x i8> <i8 -4, i8 -3, i8 -2, i8 -1, i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11> to <4 x float>
aka
IC: ConstFold to: <4 x float> <float 0xFFFFDFBF80000000, float 0x3860402000000000, float 0x38E0C0A080000000, float 0x3961412100000000> from: %8 = bitcast <16 x i8> <i8 -4, i8 -3, i8 -2, i8 -1, i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11> to <4 x float>
Relevant debug output:
@dyung running opt pipeline on that example even with clang-15 produces again different results: https://godbolt.org/z/Kd1WbfW5d Are you //sure// there is no usual FP brokenness going on in that example?
I’m not really familiar enough with FP stuff to say unfortunately. Although I did note that before your change, the O0 and O2 compilations did produce the same result, while after only the O0 compilation produces the expected result.
Usually that's a tell-tale of UB in source code.
Another observation:
$ clang++-15 -march=btver2 -O3 /tmp/test.cpp -o old.ll -S -emit-llvm
/tmp/test.cpp:29:8: warning: implicit conversion from 'int' to 'char' changes value from 211 to -45 [-Wconstant-conversion]
init(211, &test89_id18854, sizeof(test89_id18854));
~~~~ ^~~
/tmp/test.cpp:31:8: warning: implicit conversion from 'int' to 'char' changes value from 205 to -51 [-Wconstant-conversion]
init(205, &test89_id18860, sizeof(test89_id18860));
~~~~ ^~~
/tmp/test.cpp:35:10: warning: implicit conversion from 'int' to 'char' changes value from 220 to -36 [-Wconstant-conversion]
init(220, &test89_id18872, sizeof(test89_id18872));
~~~~ ^~~
/tmp/test.cpp:38:8: warning: implicit conversion from 'int' to 'char' changes value from 220 to -36 [-Wconstant-conversion]
init(220, &test89_id18873, sizeof(test89_id18873));
~~~~ ^~~
/tmp/test.cpp:40:8: warning: implicit conversion from 'int' to 'char' changes value from 252 to -4 [-Wconstant-conversion]
init(252, &id18878, sizeof(id18878));
~~~~ ^~~
5 warnings generated.
$ lli-16 old.ll
-268361104.000000
$ ./bin/opt -sroa old.ll -o - | lli-16 -
The `opt -passname` syntax for the new pass manager is deprecated, please use `opt -passes=<pipeline>` (or the `-p` alias for a more concise version).
See https://llvm.org/docs/NewPassManager.html#invoking-opt for more details on the pass pipeline syntax.
-268361104.000000
$ ./bin/opt -sroa -instcombine old.ll -o - | lli-16 -
The `opt -passname` syntax for the new pass manager is deprecated, please use `opt -passes=<pipeline>` (or the `-p` alias for a more concise version).
See https://llvm.org/docs/NewPassManager.html#invoking-opt for more details on the pass pipeline syntax.
-4939670898945374807849435136.000000
$ ./bin/opt -instcombine old.ll -o - | lli-16 -
The `opt -passname` syntax for the new pass manager is deprecated, please use `opt -passes=<pipeline>` (or the `-p` alias for a more concise version).
See https://llvm.org/docs/NewPassManager.html#invoking-opt for more details on the pass pipeline syntax.
-268361104.000000
So the SROA itself does not appear to cause the problem, but InstCombine manages to capitalize on the promotion, and expose it.
It's come to my attention that the reduction may have introduced some uses of undefined variables. Let me try to get the original test with the dependencies as reduced as I can.
It's come to my attention that the reduction may have introduced some uses of undefined variables. Let me try to get the original test with the dependencies as reduced as I can.
Thanks! Just so we eliminate the obvious, given the most original test case you have, what happens if you do:
$ good-clang++ -march=btver2 -O3 unreduced-test.cpp -S -emit-llvm -o - | good-opt -sroa -instcombine - -o -| good-lli -
?
If it does not print -268361104.000000
, then the problem is elsewhere.
It's come to my attention that the reduction may have introduced some uses of undefined variables. Let me try to get the original test with the dependencies as reduced as I can.
Thanks! Just so we eliminate the obvious, given the most original test case you have, what happens if you do:
$ good-clang++ -march=btver2 -O3 unreduced-test.cpp -S -emit-llvm -o - | good-opt -sroa -instcombine - -o -| good-lli -
? If it does not print
-268361104.000000
, then the problem is elsewhere.
The original-ish test case that I am working on reducing does indeed print that:
$ ~/src/upstream/ffe1661fabc9cf379a10a0bf15268c6549e4836f-linux/bin/clang++ -march=btver2 -O3 test.cpp -S -emit-llvm -o - | ~/src/upstream/ffe1661fabc9cf379a10a0bf15268c6549e4836f-linux/bin/opt -sroa -instcombine - -o - | ~/src/upstream/ffe1661fabc9cf379a10a0bf15268c6549e4836f-linux/bin/lli -
test.cpp:68:10: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings]
printf("%s:%s\n", msg, tmp);
^
printf("%f\n", id18835[0]);
^
2 warnings generated.
The `opt -passname` syntax for the new pass manager is deprecated, please use `opt -passes=<pipeline>` (or the `-p` alias for a more concise version).
See https://llvm.org/docs/NewPassManager.html#invoking-opt for more details on the pass pipeline syntax.
-268361104.000000
Try this example which more or less leaves the original test function intact (I only removed the printing of some of the other bytes of id18835 which did not seem to be affected):
extern "C" {
int sprintf(char *str, const char *format, ...);
int printf(char *, ...);
int isnan(float);
int isinf(float);
};
#include <x86intrin.h>
__attribute__((optnone))
float norm_nan(float x) {
if (isnan(x) || isinf(x))
return 0.0f;
return x;
}
template <typename T>
static T zero_upper(T in, unsigned bits)
{
constexpr unsigned elems = sizeof(T) / sizeof(char);
union { T x; char c[elems]; };
x = in;
unsigned elems_to_zero = bits / 8;
for (unsigned i = elems_to_zero; i != elems; ++i)
c[i] = 0;
return x;
}
static void init(unsigned char pred, volatile void *data, unsigned size) {
unsigned char *bytes = (unsigned char *)data;
for (unsigned i = 0; i != size; ++i) {
bytes[i] = pred + i;
}
}
#define INIT(PRED, VAR) init(PRED, &VAR, sizeof(VAR))
__attribute__((noinline))
static void print(const char *msg, volatile void *data, unsigned size) {
unsigned char *bytes = (unsigned char *)data;
char tmp[256];
for (unsigned i = 0; i != size; ++i) {
sprintf(tmp + i * 2, "%02x", bytes[size - 1 - i]);
}
printf("%s:%s\n", msg, tmp);
}
#define PRINT(VAR) print(#VAR, &VAR, sizeof(VAR))
typedef unsigned char uchar;
typedef long long ll;
typedef ll __attribute__((ext_vector_type(2))) ll2;
typedef float __attribute__((ext_vector_type(8))) float8;
#define cast_ll2_to___m128i(__A) ((__m128i)(__A))
#define SAFE__mm256_castps128_ps256(__A) (zero_upper(_mm256_castps128_ps256(__A), 128))
#define static_cast___m256_to_float8(__A) (static_cast<float8>(__A))
void test89()
{
ll2 id18839 = (ll2){(ll)-1964383749LL, (ll)994513392LL}; // vec_type
__m128i id18838 = cast_ll2_to___m128i(id18839);
__m256 id18837 = _mm256_cvtph_ps(id18838);
__m128 id18845;
INIT(69, id18845);
__m256 id18844 = SAFE__mm256_castps128_ps256(id18845);
__m256 id18843 = _mm256_rcp_ps(id18844);
uchar id18851;
INIT(113, id18851);
volatile __m256 id18848;
INIT(189, id18848);
for (uchar id18849_idx = 0; (id18849_idx < id18851); ++id18849_idx)
{
__m256 id18850;
INIT(223, id18850);
id18848 += id18850;
}
long long id18853;
INIT(227, id18853);
__m256i id18852 = _mm256_set1_epi64x(id18853);
__m256 id18847 = _mm256_permutevar_ps(id18848, id18852);
__m256 id18846 = _mm256_movehdup_ps(id18847);
__m256 id18842 = _mm256_max_ps(id18843, id18846);
volatile __m256 id18854;
INIT(211, id18854);
volatile __m256 id18841 = _mm256_and_ps(id18842, id18854);
__m256 id18858;
INIT(120, id18858);
for (uchar id18859_idx = 0; (id18859_idx < 4); ++id18859_idx)
{
static volatile __m256 id18860;
INIT(205, id18860);
id18858 = id18860;
}
volatile __m256 id18861;
INIT(18, id18861);
for (uchar id18862_idx = 0; (id18862_idx < 137); ++id18862_idx)
{
uchar id18867 = id18862_idx;
__m256 id18864;
INIT(214, id18864);
for (uchar id18865_idx = 0; (id18865_idx < id18867); ++id18865_idx)
{
volatile __m256 id18866;
INIT(239, id18866);
id18864 -= id18866;
}
__m256 id18868;
INIT(78, id18868);
__m256 id18863 = _mm256_addsub_ps(id18864, id18868);
id18861 += id18863;
}
__m256 id18857 = _mm256_hsub_ps(id18858, id18861);
__m256 id18870;
INIT(83, id18870);
for (uchar id18871_idx = 0; (id18871_idx < 92); ++id18871_idx)
{
__m256 id18872;
INIT(220, id18872);
id18870 *= id18872;
}
volatile __m256 id18873;
INIT(220, id18873);
__m128 id18875;
INIT(11, id18875);
for (uchar id18876_idx = 0; (id18876_idx < 214); ++id18876_idx)
{
__m128 id18878;
INIT(252, id18878);
__m128 id18879;
INIT(59, id18879);
for (uchar id18880_idx = 0; (id18880_idx < 131); ++id18880_idx)
{
__m128 id18881;
INIT(78, id18881);
id18879 -= id18881;
}
__m128 id18877 = _mm_add_ss(id18878, id18879);
id18875 *= id18877;
}
__m256 id18874 = SAFE__mm256_castps128_ps256(id18875);
__m256 id18869 = _mm256_blendv_ps(id18870, id18873, id18874);
__m256 id18856 = _mm256_unpacklo_ps(id18857, id18869);
__m256 id18882;
INIT(122, id18882);
for (uchar id18883_idx = 0; (id18883_idx < 252); ++id18883_idx)
{
__m256 id18884;
INIT(21, id18884);
id18882 += id18884;
}
__m256 id18855 = _mm256_hadd_ps(id18856, id18882);
__m256 id18840 = _mm256_andnot_ps(id18841, id18855);
__m256 id18836 = _mm256_or_ps(id18837, id18840);
float8 id18835 = static_cast___m256_to_float8(id18836);
id18835[0] = norm_nan(id18835[0]);
printf("%f\n", id18835[0]);
}
int main() {
test89();
}
This was the source that I used with the command you asked me to try. There shouldn't be any uses of undefined variables here hopefully!
FWIW, that didn't change the outcome, running -sroa -instcombine
on the good IR still resurfaces the problem:
$ clang++-15 -march=btver2 -O2 /tmp/test.cpp -o old.ll -S -emit-llvm
/tmp/test.cpp:45:10: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings]
printf("%s:%s\n", msg, tmp);
^
/tmp/test.cpp:158:10: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings]
printf("%f\n", id18835[0]);
^
2 warnings generated.
$ ./bin/opt -sroa -instcombine old.ll -o - | lli-16 -
The `opt -passname` syntax for the new pass manager is deprecated, please use `opt -passes=<pipeline>` (or the `-p` alias for a more concise version).
See https://llvm.org/docs/NewPassManager.html#invoking-opt for more details on the pass pipeline syntax.
-3701730859659994532950310912.000000
$ ./bin/opt -sroa old.ll -o - | lli-16 -
The `opt -passname` syntax for the new pass manager is deprecated, please use `opt -passes=<pipeline>` (or the `-p` alias for a more concise version).
See https://llvm.org/docs/NewPassManager.html#invoking-opt for more details on the pass pipeline syntax.
-268361104.000000
$ ./bin/opt -instcombine old.ll -o - | lli-16 -
The `opt -passname` syntax for the new pass manager is deprecated, please use `opt -passes=<pipeline>` (or the `-p` alias for a more concise version).
See https://llvm.org/docs/NewPassManager.html#invoking-opt for more details on the pass pipeline syntax.
-268361104.000000
The original-ish test case that I am working on reducing does indeed print that:
$ ~/src/upstream/ffe1661fabc9cf379a10a0bf15268c6549e4836f-linux/bin/clang++ -march=btver2 -O3 test.cpp -S -emit-llvm -o - | ~/src/upstream/ffe1661fabc9cf379a10a0bf15268c6549e4836f-linux/bin/opt -sroa -instcombine - -o - | ~/src/upstream/ffe1661fabc9cf379a10a0bf15268c6549e4836f-linux/bin/lli - test.cpp:68:10: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings] printf("%s:%s\n", msg, tmp); ^ printf("%f\n", id18835[0]); ^ 2 warnings generated. The `opt -passname` syntax for the new pass manager is deprecated, please use `opt -passes=<pipeline>` (or the `-p` alias for a more concise version). See https://llvm.org/docs/NewPassManager.html#invoking-opt for more details on the pass pipeline syntax. -268361104.000000
Actually, no, it doesn't:
$ clang++-15 -march=btver2 -O2 /tmp/test.cpp -o - -emit-llvm -S | ./bin/opt -O2 -o - - | ./bin/lli
/tmp/test.cpp:45:10: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings]
printf("%s:%s\n", msg, tmp);
^
/tmp/test.cpp:158:10: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings]
printf("%f\n", id18835[0]);
^
2 warnings generated.
-3701730859659994532950310912.000000
I strongly suspect UB in your original source code.
The original-ish test case that I am working on reducing does indeed print that:
$ ~/src/upstream/ffe1661fabc9cf379a10a0bf15268c6549e4836f-linux/bin/clang++ -march=btver2 -O3 test.cpp -S -emit-llvm -o - | ~/src/upstream/ffe1661fabc9cf379a10a0bf15268c6549e4836f-linux/bin/opt -sroa -instcombine - -o - | ~/src/upstream/ffe1661fabc9cf379a10a0bf15268c6549e4836f-linux/bin/lli - test.cpp:68:10: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings] printf("%s:%s\n", msg, tmp); ^ printf("%f\n", id18835[0]); ^ 2 warnings generated. The `opt -passname` syntax for the new pass manager is deprecated, please use `opt -passes=<pipeline>` (or the `-p` alias for a more concise version). See https://llvm.org/docs/NewPassManager.html#invoking-opt for more details on the pass pipeline syntax. -268361104.000000
Actually, no, it doesn't:
$ clang++-15 -march=btver2 -O2 /tmp/test.cpp -o - -emit-llvm -S | ./bin/opt -O2 -o - - | ./bin/lli /tmp/test.cpp:45:10: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings] printf("%s:%s\n", msg, tmp); ^ /tmp/test.cpp:158:10: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings] printf("%f\n", id18835[0]); ^ 2 warnings generated. -3701730859659994532950310912.000000
Interestingly, there seems to be a difference between O2 and O3:
$ ~/src/upstream/ffe1661fabc9cf379a10a0bf15268c6549e4836f-linux/bin/clang++ -march=btver2 -O2 test.cpp -o - -emit-llvm -S | ~/src/upstream/8adfa29706e5407b62a4726e2172894e0dfdc1e8-linux/bin/opt -O2 -o - - | ~/src/upstream/8adfa29706e5407b62a4726e2172894e0dfdc1e8-linux/bin/lli -
test.cpp:45:10: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings]
printf("%s:%s\n", msg, tmp);
^
test.cpp:158:10: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings]
printf("%f\n", id18835[0]);
^
2 warnings generated.
-3701730859659994532950310912.000000
$ ~/src/upstream/ffe1661fabc9cf379a10a0bf15268c6549e4836f-linux/bin/clang++ -march=btver2 -O3 test.cpp -o - -emit-llvm -S | ~/src/upstream/8adfa29706e5407b62a4726e2172894e0dfdc1e8-linux/bin/opt -O2 -o - - | ~/src/upstream/8adfa29706e5407b62a4726e2172894e0dfdc1e8-linux/bin/lli -
test.cpp:45:10: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings]
printf("%s:%s\n", msg, tmp);
^
test.cpp:158:10: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings]
printf("%f\n", id18835[0]);
^
2 warnings generated.
-268361104.000000
(Which, again, is usually a tell-tale of UB.)
I think you are probably right at this point. I'm not very familiar with fp stuff, any suggestions on how to find the UB stuff? Is there a sanitizer for this I could try? I'll also try to ask around internally to see if anyone has any ideas.
@dyung: Yes, there is undefined behavior sanitizer. Please also enable as much compiler warnings as possible and run static analysis tool like Clang Static Analyzer and Clang-tidy.
Unfortunately, i've already tried UBSan/ASan/MSan on these examples, and they didn't complain. If you do figure it out, please let me know, perhaps UBSan can be taught about it.
Here's a further reduced example. I think this is nan weirdness - possibly caused by the way id18878
is initialized. At O0 I observe:
id18875:0.000000 0.000000 0.000000 0.000000
id18879:0.184800 48.312740 12625.065430 3297809.750000
id18877:-nan 0.000000 0.000000 0.000000
id18875:-nan 0.000000 0.000000 0.000000
and at O2:
id18875:0.000000 0.000000 0.000000 0.000000
id18879:0.184800 48.312740 12625.065430 3297809.750000
id18877:nan 0.000000 0.000000 0.000000
id18875:nan 0.000000 0.000000 0.000000
extern "C" {
int sprintf(const char *str, const char *format, ...);
int printf(const char *, ...);
};
#include <x86intrin.h>
static void init(unsigned char pred, volatile void *data, unsigned size) {
unsigned char *bytes = (unsigned char *)data;
for (unsigned i = 0; i != size; ++i) {
bytes[i] = pred + i;
}
}
#define INIT(PRED, VAR) init(PRED, &VAR, sizeof(VAR))
void test89()
{
__m128 id18875;
INIT(11, id18875);
printf("id18875:%f %f %f %f\n", id18875[0], id18875[1], id18875[2], id18875[3]);
__m128 id18878;
INIT(252, id18878);
__m128 id18879;
INIT(59, id18879);
id18879 = (__m128){0.1848f, 48.31274f, 12625.06543f, 3297809.75f};
printf("id18879:%f %f %f %f\n", id18879[0], id18879[1], id18879[2], id18879[3]);
for (unsigned char id18880_idx = 0; (id18880_idx < 131); ++id18880_idx)
{
__m128 id18881 = (__m128){55917731840.f, 14590895194112.f, 3805913865519104.f, 992398991704457215.f};
id18879 -= id18881;
}
__m128 id18877 = _mm_add_ss(id18878, id18879);
id18875 *= id18877;
printf("id18877:%f %f %f %f\n", id18877[0], id18877[1], id18877[2], id18877[3]);
printf("id18875:%f %f %f %f\n", id18875[0], id18875[1], id18875[2], id18875[3]);
}
int main() {
test89();
}
I've looked into this, and the short summary is that I'm virtually certain this is a latent compiler bug in InstCombine exposed by the new run of SROA (rather than undefined behavior, or a more blatant bug in the test-case). As suggested by @EugeneZelenko, it looks related to shuffle vector (possibly some incorrect poisoning of operands), and as noted by @gregbedwell, it's related to the handling of NaNs (so something of a rare corner case, IMO).
Using a modified version of Greg's reduced test-case (listed below, containing some changes that include a more detailed printing of vector float values, via a function printVect
), I get the following behavior at '-O0'
id18875: 1.739e-30(0x0e0d0c0b) 4.577e-28(0x1211100f) 1.204e-25(0x16151413) 3.166e-23(0x1a191817)
id18879: 1.848e-01(0x3e3d3c36) 4.831e+01(0x4241403f) 1.263e+04(0x46454443) 3.298e+06(0x4a494847)
Input id18877: -nan(0xfffefdfc) 3.820e-37(0x03020100) 1.008e-34(0x07060504) 2.658e-32(0x0b0a0908)
Input id18879: -7.325e+12(0xd4d53138) -1.911e+15(0xd8d94d56) -4.986e+17(0xdcdd696f) -1.300e+20(0xe0e18589)
id18877[0] += id18879[0];
Updated id18877: -nan(0xfffefdfc) 3.820e-37(0x03020100) 1.008e-34(0x07060504) 2.658e-32(0x0b0a0908)
id18875 *= id18877;
Updated id18875: -nan(0xfffefdfc) 0.000e+00(0x00000000) 0.000e+00(0x00000000) 0.000e+00(0x00000000)
and at '-O2':
id18875: 1.739e-30(0x0e0d0c0b) 4.577e-28(0x1211100f) 1.204e-25(0x16151413) 3.166e-23(0x1a191817)
id18879: 1.848e-01(0x3e3d3c36) 4.831e+01(0x4241403f) 1.263e+04(0x46454443) 3.298e+06(0x4a494847)
Input id18877: -nan(0xfffefdfc) 3.820e-37(0x03020100) 1.008e-34(0x07060504) 2.658e-32(0x0b0a0908)
Input id18879: -7.325e+12(0xd4d53138) -1.911e+15(0xd8d94d56) -4.986e+17(0xdcdd696f) -1.300e+20(0xe0e18589)
id18877[0] += id18879[0];
Updated id18877: nan(0x7fc00000) 3.820e-37(0x03020100) 1.008e-34(0x07060504) 2.658e-32(0x0b0a0908)
id18875 *= id18877;
Updated id18875: nan(0x7fc00000) 0.000e+00(0x00000000) 0.000e+00(0x00000000) 0.000e+00(0x00000000)
The two "Updated" lines produce the wrong NaN value. (FTR, the compiler previous to 8adfa29706e5407b62a4726e2172894e0dfdc1e8 gets the '-O0' results shown above at all optimization levels.) The "Updated id18877" value that's printed above in the '-O2' version isn't actually computed at run-time -- it's just directly a literal in the IR (a literal that appears after a transformation related to a shufflevector in InstCombine).
typedef unsigned long size_t;
typedef float __m128 __attribute__((__vector_size__(16), __aligned__(16)));
extern "C" int printf(const char *, ...);
extern "C" void *memcpy(void *dest, const void *src, size_t n);
static __attribute__((noinline)) void rawFloatPrint(float f) {
unsigned int u;
memcpy(&u, &f, sizeof(f));
printf("%.3e(0x%08x)", f, u);
}
static __attribute__((noinline)) void printVect(const char *name, __m128 val) {
if ((name != nullptr) && (*name != '\0'))
printf("%15s: ", name);
for (int i = 0; i < 4; ++i) {
if (i) printf(" ");
rawFloatPrint(val[i]);
}
printf("\n");
}
static void init(unsigned char pred, volatile void *data, unsigned size) {
unsigned char *bytes = (unsigned char *)data;
for (unsigned i = 0; i != size; ++i) {
bytes[i] = pred + i;
}
}
#define INIT(PRED, VAR) init(PRED, &VAR, sizeof(VAR))
void test89() {
__m128 id18875;
INIT(11, id18875);
printVect("id18875", id18875);
__m128 id18878;
INIT(252, id18878); // 'id18878[0]' Becomes 0xfffefdfc (a NaN).
__m128 id18879;
INIT(59, id18879);
id18879 = (__m128){0.1848f, 48.31274f, 12625.06543f, 3297809.75f};
printVect("id18879", id18879);
__m128 id18881 = (__m128){55917731840.f, 14590895194112.f,
3805913865519104.f, 992398991704457215.f};
for (unsigned char id18880_idx = 0; id18880_idx < 131; ++id18880_idx)
id18879 -= id18881;
__m128 id18877 = id18878; // 'id18878' was set to 0xfffefdfc (NaN) above.
printVect("Input id18877", id18877); // id18877[0] is 0xfffefdfc (NaN)
printVect("Input id18879", id18879); // id18879[0] is -7.325e+12
printf("id18877[0] += id18879[0];\n");
id18877[0] += id18879[0]; // id18877[0] = 0xfffefdfc + (-7.325e+12)
printVect("Updated id18877", id18877); // id18877[0] _should_ be the same NaN
printf("id18875 *= id18877;\n");
id18875 *= id18877;
printVect("Updated id18875", id18875);
}
int main() { test89(); }
cc @spatel-gh
@nunoplopes FYI, this might be in your current area of interest.
A specific NaN value is replaced with a canonical NaN value. We're not concerned about anything else in this report, are we?
I think this is the minimized IR that shows the problem:
define <2 x double> @f(<2 x double> %x) {
%a = fadd <2 x double> %x, <double 0xFFFFDFBF80000000, double poison>
%r = shufflevector <2 x double> %a, <2 x double> <double 0xFFFFDFBF80000000, double 0xdead000000000000>, <2 x i32> <i32 0, i32 3>
ret <2 x double> %r
}
% opt -p instcombine addnan.ll -S -debug Args: opt -p instcombine addnan.ll -S -debug
INSTCOMBINE ITERATION #1 on f ADD: ret <2 x double> %r ADD: %r = shufflevector <2 x double> %a, <2 x double> <double 0xFFFFDFBF80000000, double 0xDEAD000000000000>, <2 x i32> <i32 0, i32 3> ADD: %a = fadd <2 x double> %x, <double 0xFFFFDFBF80000000, double poison> IC: Visiting: %a = fadd <2 x double> %x, <double 0xFFFFDFBF80000000, double poison> IC: Replacing %a = fadd <2 x double> %x, <double 0xFFFFDFBF80000000, double poison> with <2 x double> <double 0x7FF8000000000000, double 0x7FF8000000000000>
define <2 x double> @f(<2 x double> %x) { ret <2 x double> <double 0x7FF8000000000000, double 0xDEAD000000000000> }
Backtrack on the path that led to that behavior (in InstSimplify): https://reviews.llvm.org/D44521 https://lists.llvm.org/pipermail/llvm-dev/2018-March/121481.html
That seems to be before we introduced "poison" in IR. Now that we have it, we can probably sidestep the issue in this example at least.
But also note that there is no requirement that we preserve a NaN payload in IEEE-754, so technically, there is no bug here. Ie, we give our best effort to not change NaN bits, but users/Alive2 can't rely on that behavior.
The output difference should be gone again after: 9055661b958753d ...so I'll close this report. But as I wrote earlier, you may want to adjust test expectations for FP math with NaN values.
If I missed something, please re-open.
Thanks for fixing this so quickly @rotateright.
Closing the loop on a couple of comments/questions above:
Regarding:
A specific NaN value is replaced with a canonical NaN value. We're not concerned about anything else in this report, are we?
Yes, it wasn't obvious at the start, but after analysis, that's what this ultimately came down to.
Regarding:
But also note that there is no requirement that we preserve a NaN payload in IEEE-754, so technically, there is no bug here. Ie, we give our best effort to not change NaN bits, but users/Alive2 can't rely on that behavior.
True that it isn't a requirement to preserve the payload, and so it's fair to say this is not a bug. But the IEEE standard (IEEE Std 754™-2008) in section 6.2.3 NaN propagation, starts by saying:
An operation that propagates a NaN operand to its result and has a single NaN as an input should produce a NaN with the payload of the input NaN if representable in the destination format.
Since it says that it should produce a NaN with the same payload, rather than must produce that NaN, then I agree that it's technically not a bug. And more directly, code shouldn't rely on it. So that argues that technically the test-case was "buggy". But that said, producing the same payload is certainly better than changing it.
Thanks for summarizing, @wjristow !
I'd just add that LLVM default FP is not fully IEEE-754 compliant; we have "-ffp-exception-behavior" and other flags that try to model that environment. By default, we're trying to balance compliance and perf. And as you already know, we can tilt much more towards perf with fast-math-flags.
@dyung - for test programs that do FP math, it might be more valuable to init with FP numbers instead of semi-arbitrary byte values. That way, you can verify that a valid known result is produced all the way through codegen.
This was an easy fix, so no real controversy. But you might be interested in following progress on issue #59279 / https://reviews.llvm.org/D139785 - we're forced to make a trade-off on that one to reduce optimization power to maintain some kind of semantic soundness.
Thanks @rotateright for pointing me at that other (more interesting, from a performance perspective) issue! And yes, control over the FP behavior in LLVM has improved dramatically over the last few years (e.g., with "-ffp-exception-behavior").
@dyung - for test programs that do FP math, it might be more valuable to init with FP numbers instead of semi-arbitrary byte values. That way, you can verify that a valid known result is produced all the way through codegen.
This is my fault. It's a test generator I wrote a long time ago that has caught a pretty large number of (legit) bugs over the years. I think this is the first case of a potentially not-a-bug in a while. Coincidentally, we're just starting work on a new and much more advanced version so I will make sure to bake some better protections against just this sort of issue into the design. Thanks!
We have an internal test that recently started to fail which I bisected back to 8adfa29706e5407b62a4726e2172894e0dfdc1e8.
I was able to reduce the test a little bit to the following:
When the above code is compiled with optimizations targeting btver2 (
-O2 -march=btver2
), it generates a different value after 8adfa29706e5407b62a4726e2172894e0dfdc1e8:Here is a link to godbolt showing the output difference between trunk and LLVM 15: https://godbolt.org/z/E63hbbTfs