Open pkasting opened 5 days ago
@llvm/issue-subscribers-backend-x86
Author: None (pkasting)
Same thing for Aarch64, so I believe this isn't architecture-specific.
Basically what it's testing is a.x < b.x + (a.y < b.y)
, except that if you write that in C yourself, you'll be hit by integer wraparound if b.x == UINT64_MAX
. (It's really surprising to me that this works; you would think that you can't just randomly add 1 sometimes to the right side of the equation, but the only case where it actually matters is indeed when a.x == b.x
and that's precisely when you want it.)
Clarity: The godbolt link above assumes the most significant 64 bit word is stored in the first member of the pair/struct, regardless of CPU endianness, so that lexicographic comparison order on the pair still gives the equivalent result to arithmetic comparison on a corresponding __uint128_t
. That is, this code is not suitable for e.g. "type pun this __uint128_t in memory to the struct and then run the relevant code on it". You could write code that was suitable for that, of course.
See https://godbolt.org/z/1ajsxs8n5 for examples.
On x86-64 -O2, any implementation of comparing pairs of
uint64_t
s seems to generate noticeably worse code (along all axes) compared to using single__uint128_t
s. Seems like the former could be optimized into a form close to the latter in at least many cases.