llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.55k stars 11.8k forks source link

Can __builtin_memcpy be given a null pointer argument? #49459

Open davidstone opened 3 years ago

davidstone commented 3 years ago
Bugzilla Link 50115
Version trunk
OS Linux
CC @davidbolvansky,@zygoloid

Extended Description

The C function memcpy cannot be given null pointers as arguments, even if the size is 0. This is checked by UBSan.

It is unclear whether __builtin_memcpy has this same restriction. UBSan accepts calls to __builtin_memcpy with null pointer arguments as long as the size is 0. The documentation does not say whether this is guaranteed behavior.

I hope that it is defined to call it with null pointer arguments, since the entire reason I'm using it is to avoid that check.

davidbolvansky commented 3 years ago

__builtin_memcpy(null, null, C) where C is > 0 is UB.

If we know that C is value > 0, LLVM infers that dst and src ptrs must be nonnull.

__builtin_memcpy(dst, src, dynamicsize) - LLVM in general case does nothing fancy here, in same cases LLVM can prove that dynamicsize is > 0, so it is a same story as C > 0 case.

davidstone commented 3 years ago

__builtin_memcpy does give undefined behavior with gcc when passed null pointer arguments.

davidben commented 1 year ago

@danakj noticed that libc++ itself seems to rely on it being allowed: https://github.com/llvm/llvm-project/blob/cfc5c6cb8d90bbeaa38e25bab7df77bc39257fd6/libcxx/include/__algorithm/copy_move_common.h#L62-L67

I agree that it should be written down somewhere.

Also, IMO, the corresponding UB in memcpy is just a bug in the C spec. If the overall LLVM community agrees, ideally that position could be documented somewhere, and presented as feedback to the standards committee. As someone maintaining a C library, I can certainly report that it's extremely problematic for us.

shafik commented 1 year ago

CC @erichkeane @AaronBallman

erichkeane commented 1 year ago

The CFE just turns a __builtin_memcpy into llvm.memcpy: https://llvm.org/docs/LangRef.html#llvm-memcpy-intrinsic

THAT is documented, but documentation extending that to the Clang docs is perhaps a good idea. I WILL note that it looks like this example: (https://godbolt.org/z/jPjMche6P) has the X86-64 backend implementing this in terms of a library call.

davidben commented 1 year ago

Wow, so llvm.memcpy is itself relying on memcpy being defined for memcpy(NULL, NULL, 0)? Amazing. All the more reason to get this bug out of the C spec then. :-)

AaronBallman commented 1 year ago

I think we have no real choice in this -- the C standard is clear about this being undefined behavior (see 7.1.4p1), and common C standard library implementations explicitly mark the pointers as being nonnull: https://codebrowser.dev/glibc/glibc/string/string.h.html#43, https://git.uclibc.org/uClibc/tree/include/string.h#n37, https://android.googlesource.com/platform/bionic/+/refs/heads/master/libc/include/string.h#53, and likely others. Even if WG14 were to change directions, and I would be somewhat surprised if they did, we still have to contend with whatever CRT is installed on the OS, which means these libc implementations will still exhibit UB when given a null pointer and we defer to the CRT. So even if e.g., glibc were to remove that marking, we'd still run into the problems with all the older versions of glibc that are still floating around which do have the marking.

nikic commented 1 year ago

@AaronBallman I think you're confusing memcpy the libc function with the memcpy the libcall here. LLVM has a hard requirement that the memcpy libcall accepts null arguments if the length is zero and also accepts equal arguments (but not other kinds of overlap). LLVM is not compatible with runtimes that do not satisfy these requirements.

Incidentally, the memcpy libc function and libcall are one and the same once you discard the frontend-enforced restrictions -- all actual implementations of memcpy do satisfy these requirements.

As such, we can change the definition of __builtin_memcpy (which can be legalized to the libcall, but is unrelated to the libc function) to follow these semantics -- provided that GCC does not specify something contrary for the builtin, of course.

AaronBallman commented 1 year ago

@AaronBallman I think you're confusing memcpy the libc function with the memcpy the libcall here.

Entirely possible!

LLVM has a hard requirement that the memcpy libcall accepts null arguments if the length is zero and also accepts equal arguments (but not other kinds of overlap). LLVM is not compatible with runtimes that do not satisfy these requirements.

I'm confused. I read this as saying "when LLVM lowers the llvm.memcpy intrinsic to a library call resolved by the linker (or loader), that library call must have the same semantics as the llvm.memcpy intrinsic". Am I reading this wrong?

Incidentally, the memcpy libc function and libcall are one and the same once you discard the frontend-enforced restrictions -- all actual implementations of memcpy do satisfy these requirements.

My concern is that multiple libc implementations are built expecting nonnull pointers (due to them explicitly specifying the nonnull attribute), so if we end up calling one of those functions and pass it a null pointer, we are breaking their contract. While LLVM might not optimize based on passing a null pointer to such a call, the actual library implementation might still misbehave because it's being called out of contract and we shouldn't promise that won't have undefined behavior.

nikic commented 1 year ago

LLVM has a hard requirement that the memcpy libcall accepts null arguments if the length is zero and also accepts equal arguments (but not other kinds of overlap). LLVM is not compatible with runtimes that do not satisfy these requirements.

I'm confused. I read this as saying "when LLVM lowers the llvm.memcpy intrinsic to a library call resolved by the linker (or loader), that library call must have the same semantics as the llvm.memcpy intrinsic". Am I reading this wrong?

You're reading this right :)

Incidentally, the memcpy libc function and libcall are one and the same once you discard the frontend-enforced restrictions -- all actual implementations of memcpy do satisfy these requirements.

My concern is that multiple libc implementations are built expecting nonnull pointers (due to them explicitly specifying the nonnull attribute), so if we end up calling one of those functions and pass it a null pointer, we are breaking their contract. While LLVM might not optimize based on passing a null pointer to such a call, the actual library implementation might still misbehave because it's being called out of contract and we shouldn't promise that won't have undefined behavior.

A libc implementation (in the sense of the linked-to artifacts) that does not behave in a well-defined way with null/equal arguments is not compatible with LLVM. This is the case right now, because it's a requirement of llvm.memcpy.

You are right that all of this is on rather tenuous ground, but that's how things are :)

davidben commented 1 year ago

LLVM has a hard requirement that the memcpy libcall accepts null arguments if the length is zero and also accepts equal arguments (but not other kinds of overlap). LLVM is not compatible with runtimes that do not satisfy these requirements.

Is this documented somewhere? This sounds like an excellent data point that the specification is wrong. Sure, even with that fixed (quite a tall order in itself), it'll take a while to undo the problematic annotation in the headers. (Although Clang could choose to just ignore the bad annotation, given it knows perfectly well what memcpy is, and apparently imposes this requirement on it already!) But we have to start somewhere.

As a maintainer a security-critical C library used in quite a lot of places, this language bug causes us endless headache, so I am very interested in getting it fixed.

AaronBallman commented 1 year ago

LLVM has a hard requirement that the memcpy libcall accepts null arguments if the length is zero and also accepts equal arguments (but not other kinds of overlap). LLVM is not compatible with runtimes that do not satisfy these requirements.

I'm confused. I read this as saying "when LLVM lowers the llvm.memcpy intrinsic to a library call resolved by the linker (or loader), that library call must have the same semantics as the llvm.memcpy intrinsic". Am I reading this wrong?

You're reading this right :)

Thank you!

Incidentally, the memcpy libc function and libcall are one and the same once you discard the frontend-enforced restrictions -- all actual implementations of memcpy do satisfy these requirements.

My concern is that multiple libc implementations are built expecting nonnull pointers (due to them explicitly specifying the nonnull attribute), so if we end up calling one of those functions and pass it a null pointer, we are breaking their contract. While LLVM might not optimize based on passing a null pointer to such a call, the actual library implementation might still misbehave because it's being called out of contract and we shouldn't promise that won't have undefined behavior.

A libc implementation (in the sense of the linked-to artifacts) that does not behave in a well-defined way with null/equal arguments is not compatible with LLVM. This is the case right now, because it's a requirement of llvm.memcpy.

You are right that all of this is on rather tenuous ground, but that's how things are :)

This isn't tenuous ground -- it's nonconforming behavior and it's a bug in LLVM. glibc is one of the most-used CRTs in the wild and LLVM is not compatible with it, that's not a good position for us to be in. We can certainly try to see if WG14 has any appetite to change the behavior of these C standard library functions, but if they don't, I think LLVM needs to change to match reality. We're playing with fire currently.

AaronBallman commented 1 year ago

LLVM has a hard requirement that the memcpy libcall accepts null arguments if the length is zero and also accepts equal arguments (but not other kinds of overlap). LLVM is not compatible with runtimes that do not satisfy these requirements.

Is this documented somewhere?

Yes: https://llvm.org/docs/LangRef.html#id505

This sounds like an excellent data point that the specification is wrong.

Speaking as a WG14 member, it really isn't that compelling. If no implementation treated this as UB, then we might be able to build a case for standardizing existing practice. However, there are multiple existing CRTs that follow the specification and make this explicit UB, so pointing to one implementation that doesn't conform to the standard is most likely going to be viewed as a bug to go fix.

What could perhaps be compelling would be data that demonstrates this UB is always unnecessary for performance or correctness. That means figuring out why the specification is the way it is written currently (my guess: there's probably an instruction set that implements memcpy somewhat directly that will misbehave if given a size of 0 and a null pointer).

Sure, even with that fixed (quite a tall order in itself), it'll take a while to undo the problematic annotation in the headers.

Decades.

(Although Clang could choose to just ignore the bad annotation, given it knows perfectly well what memcpy is, and apparently imposes this requirement on it already!) But we have to start somewhere.

That's not the problem though, unfortunately. We can define the behavior when we produce the machine code to perform the copy, but we don't have a choice in this matter once we lower to an actual library call. The CRT being linked against has a contract and we're not honoring it, which is the problem. The library call implementation may misbehave when given a zero size and a null pointer, and we can't document that it's well-defined behavior (that would be lying to users).

As a maintainer a security-critical C library used in quite a lot of places, this language bug causes us endless headache, so I am very interested in getting it fixed.

FWIW, I also wish the standard wasn't specified as it is in this case. It's a security nightmare IMO. We can define the behavior for our implementation, but that means we have to actually honor that definition by not lowering the intrinsic to a library call.

AaronBallman commented 1 year ago

This isn't tenuous ground -- it's nonconforming behavior and it's a bug in LLVM.

Hmm, not quite nonconforming (it's UB and we're trying to make it well-defined, but we're not changing the behavior of a well-defined program); just a buggy extension.

Actually, I suppose another way to look at it would be that it's a Clang issue to use the llvm.memcpy (et al) intrinsic and we could instead lower to the library call. I think that'd be really unfortunate (and I have no idea if LLVM would notice the call to memcpy and decide to convert it to an intrinsic anyway), we'd probably lose optimization opportunities, but at least we'd not be introducing a security concern in the cases where the intrinsic would lower to a library call.

To be clear, the security concern I have is: we promise users it's well-defined to pass 0/null (or equal pointers) and then lower to a libc library call that can misbehave at runtime. It's that promise of it being well-defined in all circumstances that's the issue. Saying "your libc isn't compatible with LLVM" has it the wrong way around; it's LLVM that's making promises it can't keep.

nikic commented 1 year ago

Note that glibc only makes these parameters nonnull for users of glibc headers -- the glibc implementation itself is compiled without that assumption. The relevant macro expands to nothing in that case.

This is exactly the distinction we need, in that libc users get the C nonnull contract, while libcall users don't.

Of course, if the C standard changed the specification to remove the nonnull requirement, that would be great for plenty of other reasons (it makes code simpler, faster and more secure), but it's not really a requirement for LLVM's purposes. We can have different C and CRT contracts (the former can be a subset of the latter).

Edit: And regarding the other assumption (passing equal pointers is safe), that assumption is also made by GCC, for the same reasons as it is made in Clang: https://cpp.godbolt.org/z/3e46GYGxe

davidben commented 1 year ago

LLVM has a hard requirement that the memcpy libcall accepts null arguments if the length is zero and also accepts equal arguments (but not other kinds of overlap). LLVM is not compatible with runtimes that do not satisfy these requirements.

Is this documented somewhere?

Yes: https://llvm.org/docs/LangRef.html#id505

I mean is it documented anywhere that the memcpy libcall is required to meet those requirements? Sure, while formally speaking LLVM is making an unfounded assumption, that LLVM would much rather make this assumption is strong evidence that the rule in the C spec is badly defined. Badly-defined things in specs are spec bugs and should be fixed. I'm happy to take this discussion thread as the citation, but it sounded like LLVM believes in this more strongly. If so, I'd love a more definitive link.

However, there are multiple existing CRTs that follow the specification and make this explicit UB, so pointing to one implementation that doesn't conform to the standard is most likely going to be viewed as a bug to go fix.

We have examples of CRTs whose headers have a non-null annotation. Do we have examples of CRTs whose actual implementation truly make that assumption? Given we also have an example of an extremely widely deployed compiler that requires the corrected semantics, I find it very unlikely those CRTs actually have this precondition. It looks like a mistake in the same way that the specification text was likely a mistake (see below).

What could perhaps be compelling would be data that demonstrates this UB is always unnecessary for performance or correctness.

See the bottom of https://www.imperialviolet.org/2016/06/26/nonnull.html on the performance side.

Not sure what it means for UB to be unnecessary for correctness. I think it's the converse: defining the behavior is necessary for correctness because no one writes C this way. I have a longer write-up I've been meaning to put together. memcpy's current behavior, along with a few other bugs around (NULL, 0) spans in C make it so that code written in the natural way will break with empty spans when represented in the natural way. Keep in mind that, throughout the STL, C++ uses (NULL, 0) for empty spans. It's also simply the most natural representation. This is not just "UB is bad". This is a broken abstraction in the language specification.

That means figuring out why the specification is the way it is written currently (my guess: there's probably an instruction set that implements memcpy somewhat directly that will misbehave if given a size of 0 and a null pointer).

The text of the specification is extremely generic and not specific to memcpy. It says:

Each of the following statements applies unless explicitly stated otherwise in the detailed descriptions that follow: If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined. If a function argument is described as being an array, the pointer actually passed to the function shall have a value such that all address computations and accesses to objects (that would be valid if the pointer did point to the first element of such an array) are in fact valid.

This reads very strongly like an oversight. The first statement is the problematic one, and it was clearly written with pointers to single objects in mind. It just failed to account for zero-length spans allowing null.

AaronBallman commented 1 year ago

Note that glibc only makes these parameters nonnull for users of glibc headers -- the glibc implementation itself is compiled without that assumption. The relevant macro expands to nothing in that case.

That's good to know, thank you.

This is exactly the distinction we need, in that libc users get the C nonnull contract, while libcall users don't.

Of course, if the C standard changed the specification to remove the nonnull requirement, that would be great for plenty of other reasons (it makes code simpler, faster and more secure), but it's not really a requirement for LLVM's purposes. We can have different C and CRT contracts (the former can be a subset of the latter).

Perhaps not for LLVM, but for Clang, I'm deeply uncomfortable with the promises LLVM wishes to make here. We cannot document a promise we cannot keep. The C standard defines a contract between the compiler, the standard library, and the user. In this case, the compiler is making a promise without considering the standard library is free to violate that promise. I'm tempted to solve this by documenting __builtin_memcpy in Clang as following the C standard (these operations are UB) because LLVM can still dispatch the builtin to a runtime call to an arbitrary library so that users are not given a false sense of security. Users calling memcpy() already believe they're calling a library function, but users calling __builtin_memcpy() might assume we do not lower that to a library call, so reminding them to the fact that it might still be UB under some circumstances is reasonably conservative.

AaronBallman commented 1 year ago

This reads very strongly like an oversight. The first statement is the problematic one, and it was clearly written with pointers to single objects in mind. It just failed to account for zero-length spans allowing null.

The only way to know if it is or is not an oversight is to write a paper asking WG14. I've not found any evidence that this specific question was posed to the committee, so I think it's a worthwhile endeavor. I can certainly support the LLVM position in that discussion.

davidstone commented 1 year ago

I've also been thinking of submitting this as an issue to WG14. If I can ignore the null check, I have some code patterns that compile down to nothing, but if I have to check that confused the optimizer enough that I end up with hundreds of lines of assembly. These are fairly common code patterns involving things like push_back, which is how I came across this to begin with.

I have looked at several open source implementations of memcpy to see "what would go wrong". If they were compiled as C++, they would all just work (which makes sense, the only way it would actually break is if they asserted non-null). However, in C you cannot do null_pointer + 0, which occurs in a few implementations prior to them doing any checks. My thought for what to address with WG14 is to remove that language restriction (allow null_pointer + 0) and then the library restriction becomes entirely arbitrary.

I have never attended a WG14 meeting and am not super familiar with their process compared to WG21, though, so I keep putting off writing this paper.

AaronBallman commented 1 year ago

@nikic started a paper in late Aug and it's been on my list of things to review for... too long. Nikita, are you okay sharing the link to your google doc here? Then others can also help improve the paper.

(WG14 isn't going to discuss changes to C until 2024, so we're not bumping up against any deadlines right now.)

nikic commented 1 year ago

Sure, here's the google doc: https://docs.google.com/document/d/1guH_HgibKrX7t9JfKGfWX2UCPyZOTLsnRfR6UleD1F8

RalfJung commented 10 months ago

I mean is it documented anywhere that the memcpy libcall is required to meet those requirements?

I'd also be interested in seeing the assumptions that LLVM makes for its libcalls documented properly somewhere. The Rust compiler uses LLVM as a backend so we inherit whatever assumptions LLVM makes, and currently we have to basically guess what those are. (Well, not quite guess, we are getting good feedback by talking to the right people, but that's not the same as having this be properly documented in a way that we can reference.)

And I'm a big fan of allowing memcpy(src, dst, 0) on all pointers (null, dangling, whatever). :-)

GabrielRavier commented 8 months ago

And I'm a big fan of allowing memcpy(src, dst, 0) on all pointers (null, dangling, whatever). :-)

Allowing memcpy(src, dst, 0) where src or dst are neither a valid pointer nor null seems entirely useless given that iirc any use of an invalid/dangling pointer (i.e. evaluating the value of such a pointer in any expression whatsoever) results in undefined behavior, so memcpy(src, dst, 0) where src or dst is an invalid/dangling pointer isn't UB just because of the call to memcpy, since a statement such src, dst; would also have resulted in undefined behavior.

RalfJung commented 8 months ago

any use of an invalid/dangling pointer (i.e. evaluating the value of such a pointer in any expression whatsoever) results in undefined behavior

That's true in C but not in other languages that use LLVM for its backend, e.g. Rust. It's not true in LLVM IR either, as far as I can tell. (E.g., LangRef seems to very explicitly permit creating such invalid pointers via getelementptr without inbounds.)