Strict-aliasing not noticing valid aliasing of two unions with active members

llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

http://llvm.org

Other

28.25k stars 11.67k forks source link

Strict-aliasing not noticing valid aliasing of two unions with active members #33980

Open ba445bc0-1011-4361-8560-c5d362a94dfe opened 7 years ago

ba445bc0-1011-4361-8560-c5d362a94dfe commented 7 years ago


Bugzilla Link	34632
Version	trunk
OS	Linux
CC	@comex,@davidstone,@DougGregor,@dushistov,@gburgessiv,@hfinkel,@nunoplopes,@regehr,@zygoloid,@viccpp

Extended Description

Consider the following C/C++ code with -O3 -fstrict-aliasing:

struct s1 {unsigned short x;};
struct s2 {unsigned short x;};
union s1s2 { struct s1 v1; struct s2 v2; };

static int read_s1x(struct s1 *p) { return p->x; }
static void write_s2x(struct s2 *p, int v) { p->x=v;}

int test(union s1s2 *p1, union s1s2 *p2, union s1s2 *p3)
{
  if (read_s1x(&p1->v1))
  {
    unsigned short temp;
    temp = p3->v1.x;
    p3->v2.x = temp;
    write_s2x(&p2->v2,1234);
    temp = p3->v2.x;
    p3->v1.x = temp;
  }
  return read_s1x(&p1->v1);
}
int test2(int x)
{
  union s1s2 q[2];
  q->v1.x = 4321;
  return test(q,q+x,q+x);
}
#include <stdio.h>
int main(void)
{
  printf("%d\n",test2(0));
}

Clang (and GCC) generate code that outputs 4321 instead of the expected 1234.

I don't really understand things in terms of the C standard, but in terms of the C++ standard, it seems as if Clang and GCC are incorrect, and this code is well-defined. (The output is 4321 in both C and C++ mode.)

According to [class.union]/5 in the C++17 draft N4659, the assignment expression "p3->v2.x = temp;" changes the active member of the union. It's done through a union member access expression. Thus the pointer &p2->v2 is valid here.

Even if I switch this to "p3->v2 = { x };", avoiding the nested case, the problem still happens.

Even if I explicitly change the active member of the union with placement new as "new(&p3.v2) s2;", the problem still happens.

Is it possible that Clang doesn't see the possibility that p2 and p3 point to the same object?

ba445bc0-1011-4361-8560-c5d362a94dfe commented 4 years ago

This also occurs with std::variant in Clang++. Something about libstdc++'s std::variant jogs GCC's memory, though; it works in GCC.

// Tested in whatever "x86-64 clang (trunk)" is on Godbolt.
//clang++ -std=gnu++17 -O3 -fstrict-aliasing meow.cpp -o meow

#include <variant>
#include <cstdio>

struct s1 {unsigned short x;};
struct s2 {unsigned short x;};
using s1s2 = std::variant<s1, s2>;

static int read_s1x(s1 *p) { return p->x; }
static void write_s2x(s2 *p, int v) { p->x=v; }

int test(s1s2 *p1, s1s2 *p2, s1s2 *p3)
{
  if (read_s1x(std::get_if<s1>(p1)))
  {
    unsigned short temp;
    temp = std::get_if<s1>(p3)->x;
    p3->emplace<s2>(s2{temp});
    write_s2x(std::get_if<s2>(p2),1234);
    temp = std::get_if<s2>(p3)->x;
    p3->emplace<s1>(s1{temp});
  }
  return read_s1x(std::get_if<s1>(p1));
}
int test2(int x)
{
  s1s2 q[2];
  q->emplace<s1>(s1{4321});
  return test(q,q+x,q+x);
}
int main(void)
{
  std::printf("%d\n",test2(0));
}

hfinkel commented 6 years ago

You mean I'm not supposed to have all of my types privately inherit from std::monostate twice to make them magically go faster?

In all seriousness, do we have a write up somewhere of what "obviously through a union access" means? I have written a variant type that I have used to implement the equivalent to std::variant as well as the small string optimization, and it could also be used (I believe) as a zero-overhead implementation for all of the other variant types in LLVM's code, such as "variant of very small integer and pointer". Naturally, this requires access to go through a function to normalize these different use cases.

Looking at my code, even the version of my SSO implementation that uses a union directly, rather than going through my variant, has "obviously through a union" access, but the thing they are accessing is a member function which just immediately returns a member variable (a private bitfield data member), where the member function just exists to ensure access to the member variable is always const. Is this not considered obvious? If I have to make my common initial subsequence data members public because I cannot access them through a trivial inline member function, I have to make all of my data members of this type public or I lose my standard-layout property and thus cannot use common initial subsequences.

Can you please write some code here to demonstrate what you're saying?

davidstone commented 6 years ago

You mean I'm not supposed to have all of my types privately inherit from std::monostate twice to make them magically go faster?

In all seriousness, do we have a write up somewhere of what "obviously through a union access" means? I have written a variant type that I have used to implement the equivalent to std::variant as well as the small string optimization, and it could also be used (I believe) as a zero-overhead implementation for all of the other variant types in LLVM's code, such as "variant of very small integer and pointer". Naturally, this requires access to go through a function to normalize these different use cases.

Looking at my code, even the version of my SSO implementation that uses a union directly, rather than going through my variant, has "obviously through a union" access, but the thing they are accessing is a member function which just immediately returns a member variable (a private bitfield data member), where the member function just exists to ensure access to the member variable is always const. Is this not considered obvious? If I have to make my common initial subsequence data members public because I cannot access them through a trivial inline member function, I have to make all of my data members of this type public or I lose my standard-layout property and thus cannot use common initial subsequences.

llvmbot commented 6 years ago

So, as a preliminary, i can break every test case gcc has fixed, because they are pretty simple workarounds that will still break. Richard knows this as well :)

I don't think it's worth discussing without technical details. This is pretty passive aggressive, and really does not make me want to engage in a discussion with you.

However, TL;DR, if your memory model is that types can be changed at any time for allocated memory, in ways that are invisible to other translation units, TBAA is impossible to apply to anything by default without breaking some cases.

If it can be done for constant propagation etc. why it cannot be done for TBAA?

I literally just explained it in the piece you erased from the quote. You have raised no specific concerns with my explanation.

I don't think such an interpretation (or standard) makes a lot of sense. If that's what C/C++ really want, they should remove effective type rules entirely, as they just create a mess.

Changes of effective type in C are described in C11, 6.5p6. C++ introduced whole machinery of placement new which allows to change dynamic type much more widely. So the progress seems to go in exactly the opposite direction.

Things change of course. In any case, i just don't have the energy anymore to endlessly argue about this. You are welcome to fix these things if you like. I'm not going to spend my energy trying anymore.

llvmbot commented 6 years ago

I'd like to point out that your test case isn't quite the same thing as the original: your test case takes pointers to inactive members of unions. The original code never does this--pointers are only ever taken to the active member.

Why this difference is important? Memory is allocated for these objects so there should be no problem with taking their addresses?

llvmbot commented 6 years ago

So, as a preliminary, i can break every test case gcc has fixed, because they are pretty simple workarounds that will still break. Richard knows this as well :)

I don't think it's worth discussing without technical details.

However, TL;DR, if your memory model is that types can be changed at any time for allocated memory, in ways that are invisible to other translation units, TBAA is impossible to apply to anything by default without breaking some cases.

If it can be done for constant propagation etc. why it cannot be done for TBAA?

I don't think such an interpretation (or standard) makes a lot of sense. If that's what C/C++ really want, they should remove effective type rules entirely, as they just create a mess.

Changes of effective type in C are described in C11, 6.5p6. C++ introduced whole machinery of placement new which allows to change dynamic type much more widely. So the progress seems to go in exactly the opposite direction.

ba445bc0-1011-4361-8560-c5d362a94dfe commented 6 years ago

Regarding the original issue, here are simplified testcases. With a union (C and C++):

#include <stdio.h>

union u {
  long x;
  long long y;
};

static long test(long *px, long long *py, union u *pu)
{
  pu->x = 0;            // make .x active member (for C++)
  *px = 1;              // access .x via pointer

  pu->y = pu->x;        // switch active member to .y (for C++)
  *py = 0;              // access .y via pointer

  pu->x = pu->y;        // switch active member back to .x
  return *px;           // access .x via pointer
}

int main(void)
{
  union u u;

  printf("%ld\n", test(&u.x, &u.y, &u));
}

I'd like to point out that your test case isn't quite the same thing as the original: your test case takes pointers to inactive members of unions. The original code never does this--pointers are only ever taken to the active member.

Clang (and GCC) may be reducing the original code to this, but semantically, the two test cases are different.

Melissa

llvmbot commented 6 years ago

So, as a preliminary, i can break every test case gcc has fixed, because they are pretty simple workarounds that will still break. Richard knows this as well :)

Instead of responding to your specific test cases, i'm going to give you two general rules of interpretation all the compilers i know of follow:

It must be possible to determine the effect of TBAA using only the information available in the current translation unit
Adding more translation units to the scope of what the compiler sees should not change the result.

Compilers consider anything that violate this to be a bug in the standard or a bug in interpretation.

Anything else creates either a terrible user experience (things work depending on how many files they are split across or how they are refactored), would not allow the use of TBAA in general, or requires additional, non-TBAA info to get anywhere by default (Your malloc example would require IPA points-to to determine whether a thing came from a malloc or not, for example).

Rule #1 is in fact, the reason for the "visible union" rule that compilers follow - so that the result does not change when you split it into multiple files at function boundaries.

Your malloc example, in particular, would run afoul of this if you split it into multiple files at the function boundary - it would not be possible for a compiler compiling the function, alone, to know that it should do something special there.

Richard's hack in gcc will make some of these work, but make others fail, actually. GCC tries to have a middle end memory model that allows changing the effective type of memory, but it only works per-function.

However, TL;DR, if your memory model is that types can be changed at any time for allocated memory, in ways that are invisible to other translation units, TBAA is impossible to apply to anything by default without breaking some cases.

I don't think such an interpretation (or standard) makes a lot of sense. If that's what C/C++ really want, they should remove effective type rules entirely, as they just create a mess.

llvmbot commented 6 years ago

Taking the address of union fields, passing them to a function, and storing through the pointers, simply cannot be made to work right in any optimizing compiler, without giving up on just about all alias and pointer analysis.
The compiler simply can't ever know, for sure, that the incoming pointers are part of a union.

I think unions here and in various other discussions are a red herring. Many problems are literally translated from unions to allocated memory. This is perfectly demonstrated by this bug and, e.g., by bug 21725. The core issue is that you can change effective/dynamic type of a region of memory and the main difference is that allocated memory doesn't restrict possible types while a union limits the choice by a list of types used in its declaration.

The situation with allocated memory is easier than with unions (no need to consider cases of accesses "obviously" through a union and through pointers separately), more fundamental. And it's not going away if you change the C++ standard as clang claims to support C (with optimizations, I assume). If you fix the case of allocated memory you get unions for free.

Whether it's feasible to deal with allocated memory in all generality while retaining high optimization level, I would say "yes" -- gcc seems to fare quite well. And bugs reported by me in this area were fixed promptly, e.g.:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69776 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70484

BTW the former one affects clang too -- bug 26603.

As you can see, you get the opposite result with gcc:

Sorry I've failed to copy-paste testcases properly. I've just posted the corrected ones.

llvmbot commented 6 years ago

Oops, zeroes and ones were mixed in testcases in comment 24. Sorry for the confusion.

Corrected testcase with a union (C and C++):

#include <stdio.h>

union u {
  long x;
  long long y;
};

static long test(long *px, long long *py, union u *pu)
{
  pu->x = 0;            // make .x active member (for C++)
  *px = 0;              // access .x via pointer

  pu->y = pu->x;        // switch active member to .y (for C++)
  *py = 1;              // access .y via pointer

  pu->x = pu->y;        // switch active member back to .x
  return *px;           // access .x via pointer
}

int main(void)
{
  union u u;

  printf("%ld\n", test(&u.x, &u.y, &u));
}

Results:

$ clang -std=c11 -Weverything test.c && ./a.out
1
$ clang -std=c11 -Weverything -O3 test.c && ./a.out
0

And with allocated memory (C; add placement new's for C++):

#include <stdlib.h>
#include <stdio.h>

static long test(long *px, long long *py, void *pu)
{
  *px = 0;
  *py = 1;

  *(long *)pu = *(long long *)pu;

  return *px;
}

int main(void)
{
  void *p = malloc(10);

  printf("%ld\n", test(p, p, p));
}

Results:

$ clang -std=c11 -Weverything test.c && ./a.out
1
$ clang -std=c11 -Weverything -O3 test.c && ./a.out
0

clang version: clang version 6.0.0 (https://llvm.org/git/clang.git ef34bcb258ffbf7a500bb715edced98fe6348676) (https://llvm.org/git/llvm.git d35a2569ffcc3dc6802824b4dc35eaa0202ae60e)

llvmbot commented 6 years ago

As for the "common initial sequence" issue, it seems type punning issues (bug 11966, bug 28241) are fixed in clang 5 and the situation with the "common initial sequence" issue becomes more manageable now. (I'm told that type punning is not really fixed, just temporarily "pessimized". But the intention is to really fix it, right?)

Perhaps it's worth to get back to the text of C89. It clearly says that the case of common initial sequence (which is fully defined) is an exception to type punning (which is implementation-defined). So it would be reasonable to deal with common initial sequences only in cases wher type punning is supported, i.e. if type punning is supported only when the access is "obviously" through a union then it's somewhat strange to try to support common initial sequences when the access is via pointers. (And the restriction that type punning is supported only when the access is "obviously" through a union is derived from the fact that it's described in clause dedicated to the . operator and the -> operator.) Then you don't have to support common initial sequences in any way other than making sure that layout of structures is the same. AIUI this is the position of gcc devs (that's what I get from reading https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892 .)

An alternative way to lower the damage from common initial sequence rule to optimization is to limit its support only to places where the union is "visible" as described in C99/C11.

As the question is open in C community too (e.g., http://open-std.org/jtc1/sc22/wg14/www/docs/dr_257.htm is not resolved) my naive view is it's better to coordinate it with them (standards harmonization etc.)

llvmbot commented 6 years ago

Regarding the original issue, here are simplified testcases. With a union (C and C++):

#include <stdio.h>

union u {
  long x;
  long long y;
};

static long test(long *px, long long *py, union u *pu)
{
  pu->x = 0;            // make .x active member (for C++)
  *px = 1;              // access .x via pointer

  pu->y = pu->x;        // switch active member to .y (for C++)
  *py = 0;              // access .y via pointer

  pu->x = pu->y;        // switch active member back to .x
  return *px;           // access .x via pointer
}

int main(void)
{
  union u u;

  printf("%ld\n", test(&u.x, &u.y, &u));
}

Taking the address of union fields, passing them to a function, and storing through the pointers, simply cannot be made to work right in any optimizing compiler, without giving up on just about all alias and pointer analysis. The compiler simply can't ever know, for sure, that the incoming pointers are part of a union.

(For example, put "test" in a different translation unit, and now you have no reason to believe the long and longlong pointers could ever be part of a union)

As you can see, you get the opposite result with gcc:

╰─ gcc-7 foo.c  -O2
╰─ ./a.out
1
╰─ gcc-7 foo.c 
╰─ ./a.out 
0
╰─ gcc-7 foo.c  -O2 -fno-strict-aliasing
╰─ ./a.out
0

So either you assume all arguments may alias regardless of anything else (which, when propagated, means you don't perform any real alias analysis), or, as mentioned, you require all read/write of a union have a visible union access.

All compilers i'm aware of choose the latter.

llvmbot commented 6 years ago

Regarding the original issue, here are simplified testcases. With a union (C and C++):

#include <stdio.h>

union u {
  long x;
  long long y;
};

static long test(long *px, long long *py, union u *pu)
{
  pu->x = 0;            // make .x active member (for C++)
  *px = 1;              // access .x via pointer

  pu->y = pu->x;        // switch active member to .y (for C++)
  *py = 0;              // access .y via pointer

  pu->x = pu->y;        // switch active member back to .x
  return *px;           // access .x via pointer
}

int main(void)
{
  union u u;

  printf("%ld\n", test(&u.x, &u.y, &u));
}

Results:

$ clang -std=c11 -Weverything test.c && ./a.out
1
$ clang -std=c11 -Weverything -O3 test.c && ./a.out
0

And with allocated memory (C; add placement new's for C++):

#include <stdlib.h>
#include <stdio.h>

static long test(long *px, long long *py, void *pu)
{
  *px = 1;
  *py = 0;

  *(long *)pu = *(long long *)pu;

  return *px;
}

int main(void)
{
  void *p = malloc(10);

  printf("%ld\n", test(p, p, p));
}

Results:

$ clang -std=c11 -Weverything test.c && ./a.out
1
$ clang -std=c11 -Weverything -O3 test.c && ./a.out
0

clang version: clang version 6.0.0 (https://llvm.org/git/clang.git f027325999d75572cbdb4dda2e475bd27bcc74da) (https://llvm.org/git/llvm.git f7bb38ca4e1c9eb290f7c4c9dbfcb9593f1b140f)

ba445bc0-1011-4361-8560-c5d362a94dfe commented 6 years ago

The problem here is that it is a read of an inactive union member within a common initial sequence through a pointer without using a union member access expression, just like what read_s1x() does in Richard's example.

Yes, we know :-( -- That's the use case.

In the end, however, we may have little choice. In practice, these APIs are already broken. There have certainly been cases where language standards have changed in ways that broke existing POSIX APIs (e.g., makecontext).

Unlike for makecontext, however, these APIs are way too common, IMHO, to just change or remove. One potential solution: We might add an attribute, or similar, to some part of the declaration that will allow these APIs to continue to function.

It's not just POSIX, either.

Linux: Enumerating adapters via PF_NETLINK.

Mac OS: load_command[_64] from the Mach-O file format.

Win32: Pretty much any structure where an initial length field is used to distinguish which version of the structure is passed. There are hundreds. Simple example: GetVersionExW

There are a huge number to go through.

hfinkel commented 6 years ago

Changing the standard to require an obvious union access in the common-initial-sequence case may be more likely than in the original test case in this bug report.

That has an issue, though: struct sockaddr.

union { struct sockaddr m_sa; struct sockaddr_in6 m_in6; } address;

address.m_in6.sin6_family = AF_INET6; address.m_in6.sin6_port = 1234; ... connect(sock, &address.m_sa, sizeof(address.m_in6));

connect hypothetically could have code like this:

if (addr->sa_family == AF_INET6) ...

The problem here is that it is a read of an inactive union member within a common initial sequence through a pointer without using a union member access expression, just like what read_s1x() does in Richard's example.

Yes, we know :-( -- That's the use case.

In the end, however, we may have little choice. In practice, these APIs are already broken. There have certainly been cases where language standards have changed in ways that broke existing POSIX APIs (e.g., makecontext).

Unlike for makecontext, however, these APIs are way too common, IMHO, to just change or remove. One potential solution: We might add an attribute, or similar, to some part of the declaration that will allow these APIs to continue to function.

ba445bc0-1011-4361-8560-c5d362a94dfe commented 6 years ago

Changing the standard to require an obvious union access in the common-initial-sequence case may be more likely than in the original test case in this bug report.

That has an issue, though: struct sockaddr.

union
{
    struct sockaddr m_sa;
    struct sockaddr_in6 m_in6;
} address;

address.m_in6.sin6_family = AF_INET6;
address.m_in6.sin6_port = 1234;
...
connect(sock, &address.m_sa, sizeof(address.m_in6));

connect hypothetically could have code like this:

if (addr->sa_family == AF_INET6) ...

The problem here is that it is a read of an inactive union member within a common initial sequence through a pointer without using a union member access expression, just like what read_s1x() does in Richard's example.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

Ah, it's that hole in the C object model. Here's a reduced testcase that doesn't involve unions:

typedef unsigned long s1; // uintptr_t
typedef int *s2;

static int read_s1x(s1 *p) { return *p; }
static void write_s2x(s2 *p, s2 v) { *p = v;}

int test(void *p1, void *p2, void *p3)
{
  if (read_s1x((s1*)p1))
  {
    s1 temp;
    temp = *(s1*)p3;
    *(s2*)p3 = (s2)temp;
    write_s2x((s2*)p2,(s2)1234);
    temp = (s1)*(s2*)p3;
    *(s1*)p3 = temp;
  }
  return read_s1x((s1*)p1);
}
int test2(int x)
{
  s1 *q = (s1*)malloc(sizeof(s1));
  *q = 4321;
  return test(q,q+x,q+x);
}

In C, LLVM miscompiles test2 to return 4321, for the same reason it miscompiles the case in comment#0: the load/bitcast/store pairs via p3 are deleted, leaving a store of an s2 which LLVM believes can't affect a load of an s1.

Fundamentally, the problem is that object lifetime events in C are implicit and can be erased by the optimizer, leaving no hint that the dynamic type of storage has changed (unless you want to model all stores as potential lifetime events, which means it is not correct to fully remove a load/store same value pair in general, unless you also "fix" the TBAA metadata elsewhere to compensate).

When I was reworking the C++ object model for C++17, I tried to ensure that there would be a syntactic marker for all lifetime events, precisely to avoid this problem and those like it. The above example is invalid in C++, because it lacks the requisite placement new's to change the dynamic type of the object. (That said, adding them makes no difference today, because they also get fully erased before the relevant IR optimization applies: https://godbolt.org/g/aG5QHL.)

The example in comment#0 is valid in C++, however, because the syntax "p3->v2.x = temp" is specified to potentially result in a change in the type of the dynamic storage containing *p3. An optimization that completely deletes that assignment is wrong under the C++ object model. We could emit markers from the frontend to indicate when the storage type of an object could have changed, at least in C++.

hfinkel commented 6 years ago

Yes, I think that this is exactly how we'd need to implement this. We'd need to do it for all standard-layout structures, however, because we have no way of knowing whether an appropriate union might exist somewhere. This has the obvious unfortunate side effect of not being able to use TBAA field analysis to differentiate initial fields very well.

I suppose we'd need to tell users to put commonly-accessed fields toward the end of their structures - the aliasing analysis will work better - and then we can all sigh.

It seems that the Standard should be changed to say that the common initial sequence rule can only be used if a union member access expression is used. Then Richard's example is undefined behavior, because a union member access is not involved in the actual read in read_s1x.

However, what will be done about the original case?

The problem here is that in this part of the code:

  if (read_s1x(&p1->v1))
  {
    unsigned short temp;
    temp = p3->v1.x;
    p3->v2.x = temp;
    write_s2x(&p2->v2,1234);
    temp = p3->v2.x;
    p3->v1.x = temp;
  }
  return read_s1x(&p1->v1);

First, we figure out that

    temp = p3->v1.x;
    p3->v2.x = temp;

and

    temp = p3->v2.x;
    p3->v1.x = temp;

are noops and can be removed. We can do this pretty early, because all of these access are done through p3 (i.e., we don't need to wait until after inlining or similar). Once we do that, however, we're left with no indication that the active member of the union might have changed. Currently, the only way we might catch that is that we declare all access through unions as being universally aliasing. This is an over approximation, but even that does not help because all of these access go away early in the optimization process. This leaves us only with the inlined access from write_s2x and read_s1x. Based on their TBAA metadata (which knows nothing about the union), these don't alias. At that point, EarlyCSE helpfully removed the second load in favor of the first. Thus the bug.

In order to fix this, we might introduce some kind of marker of where the active member of a union might change. Having it appear to write to the union would work, but if we don't want to unnecessarily block optimization, we need to think more about the semantics. The marker is essentially a kind of type-metadata-translation point (making TBAA metadata from before the barrier look different to accesses after the barrier). I'm not sure how to implement this efficiently outside of MemorySSA, however.

The original case seems like it should be supported, but both GCC and Clang mess it up.

Melissa

ba445bc0-1011-4361-8560-c5d362a94dfe commented 7 years ago

Yes, I think that this is exactly how we'd need to implement this. We'd need to do it for all standard-layout structures, however, because we have no way of knowing whether an appropriate union might exist somewhere. This has the obvious unfortunate side effect of not being able to use TBAA field analysis to differentiate initial fields very well.

I suppose we'd need to tell users to put commonly-accessed fields toward the end of their structures - the aliasing analysis will work better - and then we can all sigh.

It seems that the Standard should be changed to say that the common initial sequence rule can only be used if a union member access expression is used. Then Richard's example is undefined behavior, because a union member access is not involved in the actual read in read_s1x.

However, what will be done about the original case? The original case seems like it should be supported, but both GCC and Clang mess it up.

Melissa

hfinkel commented 7 years ago

While it's true you are talking about a different rule that causes a subset of these cases, doing anything but 1 also destroys literally all pointer analysis for structs and fields :)

Is that really true? If we consider a standard-layout struct like:

struct A { int x; char y; double z; };

to "really" be

struct _int { int x; }; struct _int_char : _int { char y; }; struct _int_char_double : _int_char { double z; }; struct A : _int_char_double { };

... for the purposes of TBAA, and consider accesses of, say, y, to use _int_char as the base type rather than A,

Yes, I think that this is exactly how we'd need to implement this. We'd need to do it for all standard-layout structures, however, because we have no way of knowing whether an appropriate union might exist somewhere. This has the obvious unfortunate side effect of not being able to use TBAA field analysis to differentiate initial fields very well.

I suppose we'd need to tell users to put commonly-accessed fields toward the end of their structures - the aliasing analysis will work better - and then we can all sigh.

then I think we would get conservatively-correct aliasing results without losing all field-sensitive AA opportunities. If you know that the A object is a subobject of something else, you get to use "y in A" TBAA metadata for more accurate alias analysis results.

(In effect: use structural typing of the relevant struct prefix for standard-layout accesses, use nominal typing everywhere else, and still get field-sensitive AA for all accesses.)

That said...

I'd also, as a user, be opposed to rules that aren't very very bright. Saying "all accesses you want to alias in a union must start with a union access" is very bright.

I agree with this wholeheartedly. The common initial sequence rule is very widely misunderstood, and fails to actually address the root problem: people want a way to directly express that they are interpreting a value of one type as another type.

(we could also introduce -funions-alias-structs or something if we really cared)

I think that would make sense, and would incidentally let us measure the cost of an approach like the one I describe above.

Indeed.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 7 years ago

While it's true you are talking about a different rule that causes a subset of these cases, doing anything but 1 also destroys literally all pointer analysis for structs and fields :)

Is that really true? If we consider a standard-layout struct like:

struct A {
  int x;
  char y;
  double z;
};

to "really" be

struct _int {
  int x;
};
struct _int_char : _int {
  char y;
};
struct _int_char_double : _int_char {
  double z;
};
struct A : _int_char_double {
};

... for the purposes of TBAA, and consider accesses of, say, y, to use _int_char as the base type rather than A, then I think we would get conservatively-correct aliasing results without losing all field-sensitive AA opportunities. If you know that the A object is a subobject of something else, you get to use "y in A" TBAA metadata for more accurate alias analysis results.

(In effect: use structural typing of the relevant struct prefix for standard-layout accesses, use nominal typing everywhere else, and still get field-sensitive AA for all accesses.)

That said...

I'd also, as a user, be opposed to rules that aren't very very bright. Saying "all accesses you want to alias in a union must start with a union access" is very bright.

I agree with this wholeheartedly. The common initial sequence rule is very widely misunderstood, and fails to actually address the root problem: people want a way to directly express that they are interpreting a value of one type as another type.

(we could also introduce -funions-alias-structs or something if we really cared)

I think that would make sense, and would incidentally let us measure the cost of an approach like the one I describe above.

ba445bc0-1011-4361-8560-c5d362a94dfe commented 7 years ago

We need to be a bit careful now because it seems we've started talking about two different things in the same place. One thing is your original example. The other thing is Richard's example and the common-initial-sequence rule. I can definitely see us coming to different conclusions in these two cases. Changing the standard to require an obvious union access in the common-initial-sequence case may be more likely than in the original test case in this bug report.

Perhaps Richard's issue should be put into a secondary issue, with the assumption that it would be corrected in the Standard instead of in the compiler?

It seems like a reasonable rule that if you ever access the wrong member of a union (other than in a way that causes activation of another member), you have to do it through a union member access expression.

In fact, I personally believe that the common initial sequence rule ought to be discarded in favor of a rule that establishes the status quo among the major compilers: wrong-member union access is legal through a union member access expression, but strongly implementation-specific unless the offset and underlying type matches. (Possibly also require that both the active member and the wrong member being accessed are standard-layout.)

Melissa

llvmbot commented 7 years ago

Like Hal says, i think the first testcase is obvious enough that it should work. But GCC doesn't get it to work either.

I haven't analyzed why in detail (unlike richard's test case, which gets scalarized immediately, this one doesn't).

hfinkel commented 7 years ago

As Danny says, GCC also does not support this kind of standard-compliant code. I highly suspect that we'll have a conversation about this at the next C++ committee meeting. It might be better to change the standard here than to change the compiler(s).

What rule would you propose, then? In the original example, you have three pointers to the same union changing the active member of the union through a union member-access expression. Pointers are only ever taken or used from the active member.

We need to be a bit careful now because it seems we've started talking about two different things in the same place. One thing is your original example. The other thing is Richard's example and the common-initial-sequence rule. I can definitely see us coming to different conclusions in these two cases. Changing the standard to require an obvious union access in the common-initial-sequence case may be more likely than in the original test case in this bug report.

Would you simply say that you're never allowed to take a pointer to a union member and write to it, even if it's the active member? Even passing a member to memcpy, since that is taking a pointer to it? I don't think that that is going to fly with a lot of C++ programmers who use unions and placement new.

memcpy is different (and is already universally aliasing). I don't think memcpy (and friends) are a concern in this context.

Also, it would not surprise me if some implementations of std::variant fall victim to this. I can't get clang to compile on my machine to try it, and gcc.godbolt.org doesn't seem to have a working version of std::variant. (GCC with libstdc++ std::variant works correctly.)

Melissa

ba445bc0-1011-4361-8560-c5d362a94dfe commented 7 years ago

As Danny says, GCC also does not support this kind of standard-compliant code. I highly suspect that we'll have a conversation about this at the next C++ committee meeting. It might be better to change the standard here than to change the compiler(s).

What rule would you propose, then? In the original example, you have three pointers to the same union changing the active member of the union through a union member-access expression. Pointers are only ever taken or used from the active member.

Would you simply say that you're never allowed to take a pointer to a union member and write to it, even if it's the active member? Even passing a member to memcpy, since that is taking a pointer to it? I don't think that that is going to fly with a lot of C++ programmers who use unions and placement new.

Also, it would not surprise me if some implementations of std::variant fall victim to this. I can't get clang to compile on my machine to try it, and gcc.godbolt.org doesn't seem to have a working version of std::variant. (GCC with libstdc++ std::variant works correctly.)

Melissa

hfinkel commented 7 years ago

Not support this use case, but as in other related cases, only support aliasing of things in unions with accessed through the union.

Always consider accesses to a struct to potentially alias with another struct access where both accesses are part of a common initial sequence.

Same as (2), but excepting cases where no such union exists.

Unfortunately, (3) seems unprovable (especially in the presence of shared libraries, plugins, and the like). To comply to the current wording, I suspect we'd need to do (2). This probably has some unfortunate performance implications.

While we already don't support some allowed behavior for unions (e.g., type punning where the union access is not explicit), I'm inclined to fix this problem. For one thing, several standard APIs effectively assume the initial-common-sequence aliasing is allowed (e.g., POSIX sockaddr).

I think it's important that we don't do this for all structures, but only for standard-layout ones (to preserve as much performance as we can).

(1) would mean that Clang++ doesn't comply with the C++ Standard, because there is no undefined behavior in the original example (changing active member) or in your example (reading from common initial sequence).

Whether this would comply with the C Standard, I don't know, because it's a lot harder to understand than the wording in the C++ Standard.

(1) is the current rule for Clang and GCC for the language extension of allowing reading from an inactive member not in a common initial sequence, and that's fine; using a union member access expression for that is reasonable. But to say that Standard-compliant code doesn't work is something else entirely.

As Danny says, GCC also does not support this kind of standard-compliant code. I highly suspect that we'll have a conversation about this at the next C++ committee meeting. It might be better to change the standard here than to change the compiler(s).

Melissa

llvmbot commented 7 years ago

Not support this use case, but as in other related cases, only support aliasing of things in unions with accessed through the union.

Always consider accesses to a struct to potentially alias with another struct access where both accesses are part of a common initial sequence.

Same as (2), but excepting cases where no such union exists.

Unfortunately, (3) seems unprovable (especially in the presence of shared libraries, plugins, and the like). To comply to the current wording, I suspect we'd need to do (2). This probably has some unfortunate performance implications.

While we already don't support some allowed behavior for unions (e.g., type punning where the union access is not explicit), I'm inclined to fix this problem. For one thing, several standard APIs effectively assume the initial-common-sequence aliasing is allowed (e.g., POSIX sockaddr).

I think it's important that we don't do this for all structures, but only for standard-layout ones (to preserve as much performance as we can).

(1) would mean that Clang++ doesn't comply with the C++ Standard, because there is no undefined behavior in the original example (changing active member) or in your example (reading from common initial sequence).

Correct. GCC does not either. Most compilers do not in fact. For example, XLC has qstrict_induction (ie the default is non-conformant), etc

Whether this would comply with the C Standard, I don't know, because it's a lot harder to understand than the wording in the C++ Standard.

(1) is the current rule for Clang and GCC for the language extension of allowing reading from an inactive member not in a common initial sequence, and that's fine; using a union member access expression for that is reasonable. But to say that Standard-compliant code doesn't work is something else entirely.

It's really not. No compiler is a binary state about these things, and most, particularly around aliasing, don't do full compliance, because users would rather have the performance than the compliance (and you literally can't have both). The cereberus report goes into detail here, actually, about what the de-facto standard looks like.

There are also, despite most efforts, parts of the standard that are often impossible to do (ie they conflict with other requirements) or incredibly impractical to do. As a famous example, only one compiler ever made "export" fully work in C++, despite it being part of the standard. For aliasing, "restrict" was very very rarely implemented properly.

This kind of thing is used to inform future standards. This is why, over time, the aliasing language has gotten better (for example, restrict!).

If you looked, 10 years ago, compilers pretty much couldn't comply with reasonable interpretations of the standard and say, still do any alias analysis. The result was not "they did no alias analysis" it was "they did what the vast majority of users wanted and tried to provide flags to be more compliant"

This is because compilers do not exist to implement standards. Standards are not an end goal.

Doing things are users are the end goal for both standards and compilers.

Put another way: If GCC and LLVM ignore this rule and do what they want (but provide flags if you really want to have it work), and 99.99% of the compiler users don't complain or care, this also informs the standard about whether it's a worthwhile rule.

In fact, you have pretty much no other good way of figuring out whether things in the standard matter or need changing except by making tradeoffs and seeing how it affects users.

This is obviously a tradeoff (particularly if you are forcing users to either work around your choice), but IMHO, users tend to be pretty good at informing you whether you are making the right tradeoff.

(GCC avoided placement new aliasing correctness for a while, but it turned out to be engineering feasible, and user-desired, so now it is correct)

ba445bc0-1011-4361-8560-c5d362a94dfe commented 7 years ago

Not support this use case, but as in other related cases, only support aliasing of things in unions with accessed through the union.

Always consider accesses to a struct to potentially alias with another struct access where both accesses are part of a common initial sequence.

Same as (2), but excepting cases where no such union exists.

Unfortunately, (3) seems unprovable (especially in the presence of shared libraries, plugins, and the like). To comply to the current wording, I suspect we'd need to do (2). This probably has some unfortunate performance implications.

While we already don't support some allowed behavior for unions (e.g., type punning where the union access is not explicit), I'm inclined to fix this problem. For one thing, several standard APIs effectively assume the initial-common-sequence aliasing is allowed (e.g., POSIX sockaddr).

I think it's important that we don't do this for all structures, but only for standard-layout ones (to preserve as much performance as we can).

(1) would mean that Clang++ doesn't comply with the C++ Standard, because there is no undefined behavior in the original example (changing active member) or in your example (reading from common initial sequence).

Whether this would comply with the C Standard, I don't know, because it's a lot harder to understand than the wording in the C++ Standard.

(1) is the current rule for Clang and GCC for the language extension of allowing reading from an inactive member not in a common initial sequence, and that's fine; using a union member access expression for that is reasonable. But to say that Standard-compliant code doesn't work is something else entirely.

Melissa

llvmbot commented 7 years ago

I would vote for 1: GCC gave up on this and only guarantees it will work if the union access is clearly visible in all cases. It has been like this since time immemorial, because anything else means you can't determine aliasing for structs at all (You can always union them in ways to align them). While it's true you are talking about a different rule that causes a subset of these cases, doing anything but 1 also destroys literally all pointer analysis for structs and fields :)

GCC doesn't even bother trying to care, FWIW.

GCC 7 immediately scalarized it into: u$v2$x_3 = 4321; u$v2$x_12 = 1234;

(the _ are ssa version separators, just used so it can keep some semblance of the original variable name. These are are separate variables)

While obviously we don't always have to do the same as GCC, i'm not sure we would come to a different conclusion here, and i'd have a lot of trouble seeing how it would make sense.

I'd also, as a user, be opposed to rules that aren't very very bright. Saying "all accesses you want to alias in a union must start with a union access" is very bright.

(IE the function calls here clearly run afoul of that). Saying "except for standard layout structs" is much less bright.

If you follow the bright line rule, gcc and clang (ToT) get the right answer:

struct s1 {unsigned short x;};
struct s2 {unsigned short x;};
union s1s2 { struct s1 v1; struct s2 v2; };

static int read_s1x(union s1s2 *p) { return p->v1.x; }
static void write_s2x(union s1s2 *p, int v) { p->v2.x=v;}

int test(union s1s2 *p1, union s1s2 *p2)
{
  if (p1->v1.x)
  {
    write_s2x(p2,1234);
    return read_s1x(p1);
  }
  return 0;
}
int test2(int x)
{
  union s1s2 u = {.v2.x = 4321};
  return test(&u, &u);
}

╰─  gcc-7 -O3 -fstrict-aliasing foo.c foo2.c
╰─ ./a.out                                             
1234

╰─  clang -O3 -fstrict-aliasing foo.c foo2.c
╰─ ./a.out                                             
1234

(we could also introduce -funions-alias-structs or something if we really cared)

hfinkel commented 7 years ago

Unless we just recently broke something, as of r303851, we fixed this (by not producing struct-path TBAA for accesses through unions at all).

In this case the access is not "obviously" through a union.

Ah, indeed.

I suppose we have some options here. We can:

Not support this use case, but as in other related cases, only support aliasing of things in unions with accessed through the union.
Always consider accesses to a struct to potentially alias with another struct access where both accesses are part of a common initial sequence.
Same as (2), but excepting cases where no such union exists.

Unfortunately, (3) seems unprovable (especially in the presence of shared libraries, plugins, and the like). To comply to the current wording, I suspect we'd need to do (2). This probably has some unfortunate performance implications.

While we already don't support some allowed behavior for unions (e.g., type punning where the union access is not explicit), I'm inclined to fix this problem. For one thing, several standard APIs effectively assume the initial-common-sequence aliasing is allowed (e.g., POSIX sockaddr).

I think it's important that we don't do this for all structures, but only for standard-layout ones (to preserve as much performance as we can).

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 7 years ago

Unless we just recently broke something, as of r303851, we fixed this (by not producing struct-path TBAA for accesses through unions at all).

In this case the access is not "obviously" through a union.

hfinkel commented 7 years ago

Our TBAA metadata is pretty clearly wrong here. The s1 loads have tbaa !6 and !12, and the s2 store has !tbaa !9, where:
!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C++ TBAA"}
!6 = !{#4, !4, i64 0}
!9 = !{#10, !11, i64 0}
!10 = !{!"_ZTS2s2", !11, i64 0}
!11 = !{!"short", !4, i64 0}
!12 = !{#13, !11, i64 0}
!13 = !{!"_ZTS2s1", !11, i64 0}
That is: the plain load of the union member is an "omnipotent char" load. The load within read_s1x is a "s1::x" load. The store within write_s2x is a "s2::x" store.

There appears to be absolutely no allowance made for the common initial sequence rule. It looks like we need a different TBAA representation for accesses of standard-layout structs.

Unless we just recently broke something, as of r303851, we fixed this (by not producing struct-path TBAA for accesses through unions at all).

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 7 years ago

Our TBAA metadata is pretty clearly wrong here. The s1 loads have tbaa !6 and !12, and the s2 store has !tbaa !9, where:

!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C++ TBAA"}
!6 = !{#4, !4, i64 0}
!9 = !{#10, !11, i64 0}
!10 = !{!"_ZTS2s2", !11, i64 0}
!11 = !{!"short", !4, i64 0}
!12 = !{#13, !11, i64 0}
!13 = !{!"_ZTS2s1", !11, i64 0}

That is: the plain load of the union member is an "omnipotent char" load. The load within read_s1x is a "s1::x" load. The store within write_s2x is a "s2::x" store.

There appears to be absolutely no allowance made for the common initial sequence rule. It looks like we need a different TBAA representation for accesses of standard-layout structs.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 7 years ago

Slightly simpler example:

struct s1 {unsigned short x;};
struct s2 {unsigned short x;};             
union s1s2 { struct s1 v1; struct s2 v2; };

static int read_s1x(struct s1 *p) { return p->x; }   
static void write_s2x(struct s2 *p, int v) { p->x=v;}

int test(union s1s2 *p1, union s1s2 *p2)
{              
  if (p1->v1.x)
  {                         
    write_s2x(&p2->v2,1234); 
    return read_s1x(&p1->v1);
  }        
  return 0;
}               
int test2(int x)
{                               
  union s1s2 u = {.v2.x = 4321};
  return test(&u, &u);
}

Note that this never even changes the active union member (it's always v2); instead it relies on the "common initial sequence" rule for the two loads through 'v1.x'.

ba445bc0-1011-4361-8560-c5d362a94dfe commented 7 years ago

This originated from a Stack Overflow post "supercat" made (I'm the "Myria" there).

https://stackoverflow.com/questions/46205744/is-this-use-of-unions-strictly-conforming/

ba445bc0-1011-4361-8560-c5d362a94dfe commented 7 years ago

GCC cross-reference: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82224