I have a small fix for the rotation instructions that the project has a bit of inline asm for. An additional mask causes both gcc as far back as 4.9.0 and clang as far back as 11.0 to successfully infer a rotation and emit a rotation instruction.
I left the ASM in and merely updated the comment, though it might be worth considering deleting the ASM, as that's harder for the compiler to optimize around and with this mask the rotation instructions infer well even on somewhat older compiler versions.
The reason for the mask's necessity I believe is essentially a result of the usual arithmetic conversions: rotating a uint8 by 16 loops around twice but the rotl function before the changelist doesn't account for them. (uint8)i << (uint8)k first casts i and k to ints which means that undefined behavior does not apply for the uint8 and uint16 cases. So, the uint8 overload produced jibberish for rotations between 8 and 31 inclusively. But defined jibberish. However, uint32 and uint64 were able to emit rot instructions without the mask because >100% wraparound rotations invoked UB in the code. Fun and wacky nasal demons.
Hello,
I have a small fix for the rotation instructions that the project has a bit of inline asm for. An additional mask causes both gcc as far back as 4.9.0 and clang as far back as 11.0 to successfully infer a rotation and emit a rotation instruction.
Here is a demonstration of the changelist: https://godbolt.org/z/PbMz4xcbj
I left the ASM in and merely updated the comment, though it might be worth considering deleting the ASM, as that's harder for the compiler to optimize around and with this mask the rotation instructions infer well even on somewhat older compiler versions.
The reason for the mask's necessity I believe is essentially a result of the usual arithmetic conversions: rotating a uint8 by 16 loops around twice but the rotl function before the changelist doesn't account for them. (uint8)i << (uint8)k first casts i and k to ints which means that undefined behavior does not apply for the uint8 and uint16 cases. So, the uint8 overload produced jibberish for rotations between 8 and 31 inclusively. But defined jibberish. However, uint32 and uint64 were able to emit rot instructions without the mask because >100% wraparound rotations invoked UB in the code. Fun and wacky nasal demons.
For your convenience, I also have a copy of this PR for the C version of this library: https://github.com/imneme/pcg-c/pull/34
Thanks, Richard