llvm libc (and musl) are incompatible with llvm memcpy assumptions

RalfJung commented 10 months ago

The libc llvm seems to declare its memcpy with the restrict qualifier, matching the signature in the C standard:

https://github.com/llvm/llvm-project/blob/cb112eb16cff222d8fbe7cfd3cb0834f538a691d/libc/src/string/memcpy.cpp#L15-L17

This means it will be compiled in a way such that when the two pointers are used to access the same memory, and at least one access is a write, there is UB. This is UB both in the C source semantics and in LLVM IR (via noalias).

However, LLVM itself assumes that the libc memcpy is always defined behavior when dst==src. (This comes up frequently on this issue tracker, e.g. at https://github.com/llvm/llvm-project/issues/60734, https://github.com/llvm/llvm-project/issues/55399. But as far as I know it is not mentioned in the LLVM documentation.)

Reading the llvm-libc sources, I did not find any guards that would short-circuit the function when dst==src. It seems very much like the usual copy loop will be executed, loading from src and storing to dst. This is UB due to the restrict keyword, making LLVM incompatible with its own libc even when said libc is compiled with LLVM -- unless I misunderstood something. (There are so many macros in the libc it's very hard to be sure what the actually compiled code will be.^^)

musl also uses restrict in its memcpy, so it is likewise incompatible with LLVM.

lntue commented 10 months ago

@gchatelet : Do you mind taking a look at this? Thanks!

SchrodingerZhu commented 10 months ago

POSIX and the C standards are explicit that employing memcpy() with overlapping areas produces undefined behavior.

Shouldn't this be UB by standards?

RalfJung commented 10 months ago

Yes it is, but when the compiler calls a particular memcpy implementation, that implementation could make more guarantees than what the standard prescribes. LLVM assumes this to be the case. However, llvm-libc does not seem to actually provide such guarantees due to the restrict (and it's not documented as a guarantee either AFAIK). See https://github.com/llvm/llvm-project/issues/12135 for some of the history here.

SchrodingerZhu commented 10 months ago

Ah, I see. llvm.memcpy behaves differently with __llvm_libc::memcpy. This may introduce complications.

llvmbot commented 10 months ago

@llvm/issue-subscribers-bug

Author: Ralf Jung (RalfJung)

The libc llvm seems to declare its `memcpy` with the `restrict` qualifier, matching the signature in the C standard: https://github.com/llvm/llvm-project/blob/cb112eb16cff222d8fbe7cfd3cb0834f538a691d/libc/src/string/memcpy.cpp#L15-L17 This means it will be compiled in a way such that when the two pointers are used to access the same memory, and at least one access is a write, there is UB. This is UB both in the C source semantics and in LLVM IR (via `noalias`). However, LLVM itself assumes that the libc `memcpy` is always defined behavior when `dst==src`. (This comes up frequently on this issue tracker, e.g. at https://github.com/llvm/llvm-project/issues/60734, https://github.com/llvm/llvm-project/issues/55399. But as far as I know it is not mentioned in the LLVM documentation.) Reading the llvm-libc sources, I did not find any guards that would short-circuit the function when `dst==src`. It seems very much like the usual copy loop will be executed, loading from `src` and storing to `dst`. This is UB due to the `restrict` keyword, making LLVM incompatible with its own libc even when said libc is compiled with LLVM -- unless I misunderstood something. (There are so many macros in the libc it's very hard to be sure what the actually compiled code will be.^^) musl also [uses `restrict`](https://git.musl-libc.org/cgit/musl/tree/src/string/memcpy.c#n5) in its `memcpy`, so it is likewise incompatible with LLVM.

gchatelet commented 10 months ago

@RalfJung thx for bringing this issue to my attention.

However, LLVM itself assumes that the libc memcpy is always defined behavior when dst==src. (This comes up frequently on this issue tracker, e.g. at https://github.com/llvm/llvm-project/issues/60734, https://github.com/llvm/llvm-project/issues/55399. But as far as I know it is not mentioned in the LLVM documentation.)

I believe this is stated here for the IR but I haven't seen anything regarding requirements for the libc. Let me run a bunch of tests and get back to you with a proper answer.

RalfJung commented 10 months ago

Yes, the semantics of the builtin are documented. But that's guarantees LLVM makes to its users: if you call the builtin that way, all is good.

The fact that this becomes a libcall to some libc symbol means that LLVM also makes similar assumptions, where it is then someone else's responsibility to really ensure that memcpy (the library function, not the builtin) behaves that way. For clang there is an old WIP patch to document this at https://reviews.llvm.org/D86993, but I am not aware of anything for LLVM itself.

(For Rust, what I am really hoping for here is that we can get a guarantee that if the LLVM IR only calls the builtin in a way that the argument ranges are disjoint, then LLVM will also only ever emit libcalls to memcpy with disjoint arguments. That is, LLVM will never itself do any transformation that would rely on the fact that the memcpy builtin is well-defined for src==dst. That would be great news for Rust since then Rust could avoid making the src==dst assumption about memcpy -- an assumption that is problematic as demonstrated by the restrict attribute issue for musl and llvm-libc.)

RalfJung commented 10 months ago

Cc @nikic who's been involved in a bunch of these discussions.

nikic commented 10 months ago

Just so we're clear, when Ralf says "incompatible" he means "theoretically incompatible". There is no practical issue here, because this usage of restrict will not lead to an actual miscompile even if both pointers are equal (only if there is non-exact overlap, which is excluded).

There is no action that needs to be taken on the part of LLVM libc. If any action is taken, it will be on the side of LLVM.

SchrodingerZhu commented 10 months ago

If llvm.memcpy does have a looser requirement, isn't it the case that LLVM should emit the check when lowering the code? (Either dynamically or statically if the libc target is known)

RalfJung commented 10 months ago

It is theoretical in the sense that I am not aware of a compiler actually compiling memcpy in a way that this would be an issue. But I also have not checked the assembly.

Re-reading the definitions of restrict, I think this depends on whether "memory locations that are modified, by any means, during the execution of the function" includes memory locations that have their original value written back into them. Is doing a write always a modification, or is it only a modification if the write changes the value stored there?

EDIT: it was pointed out to me that the standard also contains this non-normative note:

"Modify" includes the case where the new value being stored is the same as the previous value.

So probably this is indeed intended to be UB.

gchatelet commented 10 months ago

I was reading more of the linked bugs and this GCC comment caught my attention. We don't currently use such strategies and I don't see it happening in the future but there's probably some hypothetical implementation of memcpy that will break when src==dst. That being said, by C standard definition "If copying takes place between objects that overlap, the behavior is undefined" : so I agree with @nikic https://github.com/llvm/llvm-project/issues/73516#issuecomment-1828268936 that if something is to be done this would be on LLVM side (e.g., guard lowering to libc memcpy call with a check that src!=dst).

RalfJung commented 10 months ago

I was reading more of the linked bugs and this GCC comment caught my attention.

Ah, that's an interesting one. And with restrict compilers would even be allowed to introduce such "prefetch with write intent" as part of the optimizations.

gchatelet commented 10 months ago

@RalfJung how do you want to move forward with this issue? It seems to me that this would be an issue for llvm itself. Maybe just documentation improvement?

RalfJung commented 10 months ago

Right, I think there's two issues here:

LLVM's assumptions should be documented somewhere in the official docs, someplace that has a URL one can reference.
LLVM's assumptions should be re-evaluated given that de jure, both llvm-libc and musl (when interpreted according to the C standard, and even when building with clang and interpreting the generated LLVM IR according to the LLVM LangRef) do not actually satisfy the assumptions. To justify the status quo, one has to do reasoning based on "but compilers don't actually do such optimizations even though they are allowed to", which is a fragile argument to make.

An alternative to (2) would have been to remove restrict from llvm-libc, but it seems the general consensus is that that is not the way to go.

michaelrj-google commented 10 months ago

If you want a copy that's safe even if the ranges overlap then you want memmove. From the man page:

DESCRIPTION
       The memmove() function copies n bytes from memory area src to memory area dest.  The memory areas may overlap:
       copying  takes  place as though the bytes in src are first copied into a temporary array that does not overlap
       src or dest, and the bytes are then copied from the temporary array to dest.

gchatelet commented 10 months ago

memmove has a cost though, it cannot be as fast as memcpy. IMHO, in order to have correct and efficient code we need to guard memcpy calls with a pointer inequality check.

jyknight commented 10 months ago

LLVM requires that memcpy of the same address work, because it uses memcpy for struct copies, e.g. struct X { char n[1000]; }; void foo(struct X*a, struct X*b) { *a = *b; }, and this must still work if a and b are equal. GCC does the same, BTW.

I'm certain we don't want to add a pointer inequality check to the struct copy codegen, nor switch it to call memmove.

RalfJung commented 10 months ago

I'm certain we don't want to add a pointer inequality check to the struct copy codegen, nor switch it to call memmove.

The only alternative is to have a pointer inequality check in memcpy, or remove the restrict from memcpy (and document in the libc that exact overlap is allowed). Are you certain that's better? To me this sounds like a case that needs actual data rather than speculation.

GCC does the same, BTW.

I am aware: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667, https://sourceware.org/bugzilla/show_bug.cgi?id=31055

In those discussions, the proposal came up of having a __memcpy_lax that explicitly allows exact overlap, to support this common need of a C compiler without compromising the performance of code that uses memcpy according to its documented contract.

Also, what you are describing is a concern of clang, not LLVM. LLVM is not concerned with C semantics, only with LLVM IR semantics. In Rust we'd ideally like to use LLVM without making this assumption about memcpy. We are taking care to call the LLVM memcpy builtin only with strictly non-overlapping pointers. It would be great to get a guarantee that LLVM will then also never perform any memcpy libcalls where the pointers overlap. In other words, even though the LLVM builtin says that equal pointers are allowed, it would be great to get a guarantee that LLVM transformations will never exploit this and introduce new memcpy with equal pointers. Therefore, all libcalls to memcpy with equal pointers correspond to the frontend generating a builtin memcpy with equal pointers, and if the frontend generates no such call, one can use LLVM with a memcpy implementation that does not allow equal pointers.

Basically, frontends should be able to choose whether they require a non-standard memcpy that allows exact overlap or not. clang is obviously free to make its own decisions, but in Rust I'd rather not rely on "yeah it's UB but compilers don't optimize restrict like that".

jyknight commented 10 months ago

The only alternative is to have a pointer inequality check in memcpy, or remove the restrict from memcpy (and document in the libc that exact overlap is allowed)

Those aren't the only available alternatives. The assembly code generated by today's compilers already works with exact overlap, with restrict, and without an extra branch.

I don't know if removing the restrict qualifier affects the performance of the generated code. If it doesn't, simply remove it to make the code semantically correct. But if it does: memcpy is performance-critical enough that it's worth it to find a different solution which preserves the current codegen -- without being theoretically invalid.

One option there would be to document an extended guarantee of "restrict" as not being invalid on exact overlap.

jyknight commented 10 months ago

We are taking care to call the LLVM memcpy builtin only with strictly non-overlapping pointers.

Does that automatically fall out of Rust's model, or do you explicitly take extra care? (That is: could it improve codegen for Rust to stop taking such care, given the guarantee that memcpy does work with equal pointers?)

RalfJung commented 9 months ago

The assembly code generated by today's compilers already works with exact overlap, with restrict, and without an extra branch.

I agree with this factual statement.

However, which compilers guarantee this to be the case? clang doesn't, as far as I can tell, given the LLVM IR semantics. So in the status quo we now have to audit the generated assembly after each compiler update, or something like that? Certainly not a great solution.

One option there would be to document an extended guarantee of "restrict" as not being invalid on exact overlap.

restrict is defined in terms of the memory being accessed through this pointer vs other pointers. It applies certain restrictions on each memory access. I don't see any way to incorporate a concept of "exact overlap" into that definition.

The only way out I can see for restrict is to to say that when a write writes the exact same value that is already stored at that location in memory, then it doesn't count as "mutation". In that case memcpy on exact overlap would not perform any mutation and therefore restrict doesn't impose any restrictions. However, defining "the exact same value" is tricky (e.g., do we mean the same abstract value -- at which type even? -- or the same underlying representation). I don't know if all compilers would be happy to commit so such a changed restrict. If compilers want to use the presence of "write to restrict-qualified pointer" to do things like "tell the CPU that certain memory can be discarded since it will be overwritten soon", then that would no longer be possible. (Those are ofc exactly the kind of optimizations that are also incompatible with the exact-overlap assumption.)

Does that automatically fall out of Rust's model, or do you explicitly take extra care? (That is: could it improve codegen for Rust to stop taking such care, given the guarantee that memcpy does work with equal pointers?)

This automatically falls out of Rust's model, specifically the disjointness information in Rust's &mut type. I'm not aware of a case where Rust would benefit from allowing exact overlap. The Rust intrinsic that corresponds to the LLVM memcpy builtin is called copy_nonoverlapping and it does require proper disjointness. Neither the Rust compiler nor the Rust standard library have a situation where the copy is on two regions that are "either disjoint or equal". We always either know it is disjoint (so we use copy_nonoverlapping aka memcpy), or we know nothing (so we use copy aka memmove).

For the specific situation where this happens in C, *to = *from, in Rust if these are raw pointers then we do allow partial overlap so we have to use memmove anyway. This is a rare situation; almost all code uses references and then to must be a mutable reference so we know it cannot alias from and we can use memcpy and we can prove full disjointness.

frederick-vs-ja commented 7 months ago

One option there would be to document an extended guarantee of "restrict" as not being invalid on exact overlap.

No, this definitely defeats the intent of restrict.

The only way out I can see for restrict is to to say that when a write writes the exact same value that is already stored at that location in memory, then it doesn't count as "mutation". In that case memcpy on exact overlap would not perform any mutation and therefore restrict doesn't impose any restrictions.

Hmm... it seems that we can say "An implementation may assume reading via expression E based on a restricted pointer p obtains the same value as (...), and the behavior is undefined if such assumption is violated." (which is not perfect, see below).

However, defining "the exact same value" is tricky (e.g., do we mean the same abstract value -- at which type even? -- or the same underlying representation).

Note that the formal defintion (located at, e.g., WG14 N3096 6.7.3.1) uses "expression" that should have a well-defined type, so it shouldn't be hard to specify this.

The major problem I see is that just modifying restrict seemly inevitably weakens alias analyzation. E.g. in this example

void fun(int * restrict p, int * restrict q)
{
    *p = *q;
    *q = *p;
}

IIUC an implementation can assume that p != q per current rules. But it seems that any method that makes memcpy(p, p, n) well-defined by modifying restrict would defeat such alias analyzation.

So I guess a possible option would be weakening the preconditions of memcpy which needs to drop restrict (while a weaker variant of restrict can still be used).

RalfJung commented 3 months ago

Right, I think there's two issues here:

I have split the first part (documentation) into a separate issue: https://github.com/llvm/llvm-project/issues/95889.

llvm / llvm-project

llvm libc (and musl) are incompatible with llvm memcpy assumptions #73516