rdp: Avoid signed overflow during texture block loading

ecsv commented 1 year ago

It isn't well defined how the compiler should behave when a signed integer overlows. This can lead to artifacts like scrambled textures when such an overflow happens.

For the input v16 == 1879048192 (0x70000000) and dxt == 268435456 (0x10000000) , it would calculate the result 536870912 (0x20000000) instead of -1 (2147483648 aka 0x80000000). Which means that the flipping will not be performed correctly. If only looking at the unmodified loop, the aggressive compiler optimizations will simply remove the parts of the code which assumes cannot be happen when no signed integer overflow would be present.

Fixes: mupen64plus/mupen64plus-video-glide64mk2#120

This can be reproduce with a recent compiler like gcc 12.2.0 and any optimization level of -O2 and higher on x86-64.

To see the reason for it, just compile and start it like this:

git clean -dfx; OPTFLAGS="-O3 -fsanitize=address -fsanitize=undefined" ccache make -C projects/unix/ all V=s -j$(nproc) && LD_PRELOAD=/lib/x86_64-linux-gnu/libasan.so.8 mupen64plus --emumode 0 --gfx projects/unix/mupen64plus-video-glide64mk2.so mario64.z64

You should then see errors like:

../../src/Glide64/rdp.cpp:1904:21: runtime error: signed integer overflow: 268435456 + 1879048192 cannot be represented in type 'int'

For the int overflow version, gcc 12 would create:

add    %r9d,%ecx
test   %ecx,%ecx
js     0x7fffc1066188 <rdp_loadblock()+600>
....
+600: mov    %eax,%ecx
sub $0x1,%eax
jne     0x7fffc1066188 <rdp_loadblock()+600>

After the add operation, ecx will have the correct value 0x80000000. But the next operation after the jump will directly overwrite out result with 0xf8 (our v15 "modified copy" of cnt). And then just slowly count down to zero - interesting effect of the undefined behavior "handling" in the GCC optimizer

With the fix, it would be compiled to:

add    %edx,%ecx
 js     0x7fffc1066188 <rdp_loadblock()+600>
...
+592:   add    %edx,%ecx
jns    0x7fffc10663c0 <rdp_loadblock()+1168>
+600:   mov    %r8d,%r9d
sub    %eax,%r9d
sub    $0x1,%eax
jne    0x7fffc1066180 <rdp_loadblock()+592>
cmp    $0x1,%r9d

Here it doesn't directly overwrite our %ecx (with v16) after the jump and instead operates on our v15 before it jumps back to operate on our v16 in %ecx (by adding dxt aka %edx).

If you want to read some (infamous) bug report about this kind of optimization: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475. So another way around these optimizations would be the compiler options -fno-strict-overflow -fno-aggressive-loop-optimizations. The first option is one which we use in the Linux kernel since 2009.

Narann commented 1 year ago

I like how simple this fix is. I see no reason to block the merge.

richard42 commented 1 year ago

Sven, it's great to hear from you again! Thanks for the arithmetic fix.

mupen64plus / mupen64plus-video-glide64mk2

rdp: Avoid signed overflow during texture block loading #123