[TBAA] "long long" type causes incorrect optimization in GVN

huhu233 commented 3 months ago

Here is a simple case where I get completely different results with the long type and the long long type, as shown below.

long type

#include <cstdio>
#include <arm_sve.h>
const int SIZE = 16;
int main()
{
    int datas[SIZE];
    #pragma clang loop vectorize(disable)
    for (int i = 0; i < SIZE; i++) {
        datas[i] = -i - i - 1;
    }
    long res2[SIZE];
    svbool_t pa = svptrue_b32();
    svint32_t v1 = svld1(pa, &datas[0]);

    svint64_t v2 = svunpklo(v1);
    svst1(pa, (long *)&res2[0], v2);

    printf("--%dth-- %lld\n", 0, res2[0]);
    printf("--%dth-- %lld\n", 1, res2[1]);
    printf("\n");
    return 0;
}

Output, run with open-source QEMU:

--0th-- -1
--1th-- -3

long long type

#include <cstdio>
#include <arm_sve.h>
const int SIZE = 16;
int main()
{
    int datas[SIZE];
    #pragma clang loop vectorize(disable)
    for (int i = 0; i < SIZE; i++) {
        datas[i] = -i - i - 1;
    }
    long long res2[SIZE];
    svbool_t pa = svptrue_b32();
    svint32_t v1 = svld1(pa, &datas[0]);

    svint64_t v2 = svunpklo(v1);
    svst1(pa, (long *)&res2[0], v2);

    printf("--%dth-- %lld\n", 0, res2[0]);
    printf("--%dth-- %lld\n", 1, res2[1]);
    printf("\n");
    return 0;
}

Output, run with open-source QEMU:

--0th-- -1
--1th-- -6484229677715469824

I did some preliminary analysis and found that the main difference is in the TBAA results. The IR diverges after the GVN pass, and the different TBAA metadata leads to different results. long type https://godbolt.org/z/3sraTe9bE

long long type https://godbolt.org/z/sYsrna6ar

dtcxzyw commented 3 months ago

cc @nikic @fhahn

nikic commented 3 months ago

If there is a problem here, then it is a clang frontend problem. The svst1 is emitted with TBAA metadata for "long", while your other operations use "long long", which is a strict aliasing violation.

Unless svst1 is explicitly intended to not participate in strict aliasing (and should use omnipotent char TBAA), then the original code is UB. I can't say what the intended semantics for these intrinsics are, but given the long * argument, it seems likely to me that participation in strict aliasing is intentional.
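
To make the aliasing point concrete, here is a minimal portable sketch (no SVE involved, so it builds on any target). It relies only on the fact that `long` and `long long` are distinct types in C++ even when both are 64 bits wide, which is why TBAA may assume that accesses through them do not alias. The helper name `copy_long_to_long_long` is hypothetical and not part of the issue. Copying with `std::memcpy` is one well-defined way to move the bytes without accessing either object through the other type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// long and long long are distinct types in C++, whatever their sizes are.
static_assert(!std::is_same<long, long long>::value,
              "long and long long are different types");

// Hypothetical helper (not from the issue): copy n values from a long buffer
// into a long long buffer without accessing either object through an lvalue
// of the other type. Returns false when the two types differ in size
// (for example on LLP64 targets), in which case nothing is copied.
inline bool copy_long_to_long_long(const long *src, long long *dst, std::size_t n) {
    assert(src != nullptr && dst != nullptr);
    if (sizeof(long) != sizeof(long long)) {
        return false;
    }
    std::memcpy(dst, src, n * sizeof(long long));
    return true;
}
```

Calling the helper with a two-element `long` array and a two-element `long long` array copies the values on LP64 targets and returns `false` elsewhere.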

huhu233 commented 3 months ago

Hi @nikic, thanks for your comment! You are right, a semantic ambiguity does exist. Based on TBAA, the vector store to %2 and the load from %23 do not alias (although %23 points into the memory that the store writes), because accesses tagged with different "types" are assumed not to alias:

  store <vscale x 2 x i64> %20, ptr %2, align 16, !tbaa !9
  %21 = load i64, ptr %2, align 16, !tbaa !11
  %22 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str, i32 noundef 0, i64 noundef %21)
  %23 = getelementptr inbounds [16 x i64], ptr %2, i64 0, i64 1
  %24 = load i64, ptr %23, align 8, !tbaa !11

The cause of this problem is the mixture of intrinsics and high-level language code with pointer type conversions, but syntactically this usage is not wrong. One solution is to emit no TBAA, or to emit omnipotent char TBAA as you suggested, and let BasicAA do the further analysis. However, this may lose some optimization opportunities, because accurate TBAA helps some optimizations. I'll try to find a reasonable solution.
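
Independently of any compiler-side change, a source-level workaround can be sketched as follows: declare the result buffer as `int64_t`, so the vector store and the later scalar loads name the same type and no pointer cast is needed. This is only a sketch under assumptions. The helper name `widen_to_i64` is hypothetical, the predicates are built with `svwhilelt` rather than the `svptrue_b32` used in the reproducer, and the SVE branch has not been run here. The scalar branch exists so the code also builds on targets without SVE.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical helper (not from the issue): sign-extend n int32_t values into
// an int64_t buffer. dst has the pointee type that svst1 takes for an
// svint64_t value, so the store needs no pointer cast.
inline void widen_to_i64(const int32_t *src, int64_t *dst, int64_t n) {
    assert(n >= 0);
#if defined(__ARM_FEATURE_SVE)
    // Each iteration handles svcntd() elements (one vector of 64-bit lanes).
    for (int64_t i = 0; i < n; i += static_cast<int64_t>(svcntd())) {
        svbool_t pg32 = svwhilelt_b32(i, n);  // keeps the 32-bit load in bounds
        svbool_t pg64 = svwhilelt_b64(i, n);  // keeps the 64-bit store in bounds
        svint32_t v1 = svld1(pg32, src + i);
        svint64_t v2 = svunpklo(v1);          // widen the low half of v1
        svst1(pg64, dst + i, v2);
    }
#else
    // Scalar fallback so the sketch also builds without SVE.
    for (int64_t i = 0; i < n; ++i) {
        dst[i] = static_cast<int64_t>(src[i]);
    }
#endif
}
```

When printing the results, `PRId64` from `<cinttypes>` avoids having to pick between `%ld` and `%lld` for `int64_t`.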

llvmbot commented 2 months ago

@llvm/issue-subscribers-clang-frontend

Author: None (huhu233)

llvm / llvm-project

[TBAA] "long long" type causes incorrect optimization in GVN #97783