Open davidben opened 9 months ago
It's a little interesting that the compiler output is checking the sign of `size_`, and not whether `data_ + size_` overflowed. I'm guessing the issue is not overflow, per se, but that `data_ + size_` gets lowered to `data_ + (ptrdiff_t)size_`, and we're checking if `(ptrdiff_t)size_` is negative?

If I'm reading the spec right, I think pointer arithmetic is defined in terms of the numerical value of the offset, without coercing the operands into any particular common type, which means we can safely assume that `(ptrdiff_t)size_` is positive here. But it is a little more subtle.
> Therefore, the addition cannot have overflowed and `data_ <= data_ + size_` is always true in all defined cases
IIUC, this boils down to: (https://godbolt.org/z/TvoKvoGsd)

```cpp
bool f(int *p, size_t s) {
  return p <= p + s;
}
```
> I'm guessing the issue is not overflow, per se, but that `data_ + size_` gets lowered to `data_ + (ptrdiff_t)size_`, and we're checking if `(ptrdiff_t)size_` is negative?
Yes, I think it's something like this.

On the LLVM level, the pointer arithmetic gets lowered to a `getelementptr inbounds` instruction. That treats the offset as a signed value, although in this case it isn't (I think?).

If `s` were a `uint32_t` instead, LLVM would zero-extend it to 64 bits, so the optimizer sees that it can't be negative. But in the original code, it seems that information is lost.
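The 32-bit variant described above can be sketched like this (the function name is mine):

```cpp
#include <cstdint>

// With a 32-bit unsigned size, the offset is zero-extended to 64 bits
// before the address computation on 64-bit targets, so the optimizer
// can prove it is non-negative and fold this to 'return true'.
bool f32(int *p, std::uint32_t s) {
  return p <= p + s;
}
```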
Probably an optimization expert could explain better what's going on. Maybe @aeubanks can take a look?
Yeah that's pretty much it. LLVM doesn't treat integer values as unsigned/signed, but rather treats operations on integers as allowing/prohibiting unsigned/signed overflow. But since `getelementptr` currently can't represent any of that, we lose information lowering to IR.

```cpp
bool f(int *p, unsigned long s) {
  // __builtin_assume((long)s >= 0);
  return p <= p + s;
}
```
Without the `__builtin_assume()` we're missing the optimization. With the `__builtin_assume()` we get the optimization we expect, but the IR contains calls to `llvm.assume()` which aren't dropped until very late and can block other optimizations.
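A sketch of the `__builtin_assume()` variant, with a portability guard I've added (`__builtin_assume` is Clang-specific; the fallback uses the `__builtin_unreachable()` idiom, which GCC also supports):

```cpp
#include <cstddef>

// Portable spelling of the assumption; this macro is my addition,
// not part of the original reproducer.
#if defined(__clang__)
#define ASSUME(cond) __builtin_assume(cond)
#else
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#endif

bool f_assume(int *p, std::size_t s) {
  // Tell the optimizer the offset is non-negative when reinterpreted
  // as signed, which lets it fold the comparison below to 'true'.
  ASSUME(static_cast<long>(s) >= 0);
  return p <= p + s;
}
```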
If we think this is important enough, then perhaps we should extend LLVM IR to be able to represent this in `getelementptr` without having to resort to a big hammer like `llvm.assume()`. Perhaps, similar to https://discourse.llvm.org/t/rfc-add-zext-nneg-flag/73914, add an `nneg` flag to `getelementptr`, to indicate that a `getelementptr` only works with non-negative indices?
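As a sketch, the proposed flag might read something like this (hypothetical syntax; no such flag exists on `getelementptr` at the time of this thread):

```llvm
; Hypothetical: 'nneg' here would assert that %s is non-negative,
; preserving the unsignedness lost when lowering 'p + s' from C++.
define i1 @f(ptr %p, i64 %s) {
  %end = getelementptr inbounds nneg i32, ptr %p, i64 %s
  %cmp = icmp ule ptr %p, %end
  ret i1 %cmp
}
```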
@fhahn @nikic
Having `nusw` and `nneg` flags on `getelementptr` is on my long-term wishlist :)
> If we think this is important enough
I definitely don't know enough about this to say anything concrete here. It seems to be plausibly relevant for the other missed optimizations I found after this one. (I've been spending a lot of time exploring Chromium's and libc++'s bounded iterators. If we could get to the point that iterators are broadly bounds-checked, but that all redundant checks are reliably deleted, that would be a great safety win.)
For context, this issue was also discussed here: https://discourse.llvm.org/t/retain-fact-that-index-offsets-for-gep-are-non-negative-for-size-t-uint64-t-offsets/65974
Might be time to revive this now
I noticed this as I was exploring the compiler output for some of our (Chromium's) checked iterators. To be honest, I'm not sure this check makes sense, since we should just assume the container is self-consistent. But, nonetheless, I think Clang could have deleted this check, so I'm filing this as a missed optimization.
Reproducer: https://godbolt.org/z/EzoWqGEEK
This code ends up performing a few variations on checking that `data_ <= data_ + size_`, where `size_` is a `size_t`. Of course, the real reason this is always valid is that `[data_, data_ + size_)` describes a span, but the compiler doesn't a priori know that. However, I believe it can infer this because:

- `size_` is unsigned, so we're not adding a negative number to `data_`
- Pointer arithmetic is defined in terms of the numerical value of the offset, without converting it to `ptrdiff_t` first: https://eel.is/c++draft/expr.add#4.2
- Therefore, the addition cannot have overflowed and `data_ <= data_ + size_` is always true in all defined cases