Open CoriolisStorm opened 3 years ago
Thank you so much for the report! I've been encountering C/C++ compiler bugs with VC 2019, but I've been working around them. I don't test with VC2017.
I'll see if I can repro this and add a workaround for VC 2017. Also, my next big step on this project is adding OpenCL support for faster encoding (as time permits - we've been pretty slammed with Basis Universal work).
Thanks for open sourcing this great tool! We at Respawn plan to start using it soon for an internal tool making some of the textures in Apex Legends.
I was compressing using bc7e, decompressing using bc7decomp.cpp, and dumping the results as a PNG. It looked great, except some pixels that should have been orange were green instead. I disabled optimizations on bc7decomp.cpp so I could step through the code and see how it was encoded, and to my surprise, the colors it showed in the debugger were right! I let it finish, and the green pixels were in fact the proper orange.
I usually suspect uninitialized data when optimized code behaves differently, but when I looked into it in the debugger and read the generated disassembly, the compiler actually had a bug. This shocked me; compiler bugs are very rare! I've only encountered 2 other compiler bugs in 20 years of professional software development.
In unpack_bc7_mode4_5, there's a nested loop that initializes "endpoints[e][c]" on lines 393-397. In the source, the loop over "c" is the outer loop and the loop over "e" is the inner loop. With optimizations enabled, it looks like VC++ 2017 swapped the order of the inner and outer loops, then unrolled the loop over "c" (which was the outer loop but is now the inner loop). Because every iteration consumes the next few bits of "color_read_bits", changing the iteration order changed which array entry each group of bits ended up in. Once the endpoints were decoded wrongly, everything after that was wrong too.
Here's the relevant disassembly, with comments added to help see what's going on:
In the compiled code, each of the 2 iterations picks off N bits at a time (N = 5 or 7) and stores them in endpoints[e][0..2]. In shorthand, writing each array entry as its "ec" index pair with everything unrolled, it writes the picked-off bits in the order 00 01 02 10 11 12. However, the C++ code says the order should be 00 10 01 11 02 12. Reordering which bits went to which array entry caused the improper decompression.
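To make the difference between the two orders concrete, here is a minimal standalone sketch. It is not the code from bc7decomp.cpp: the names `Endpoints`, `unpack_channel_major`, `unpack_endpoint_major`, and `pack_fields` are made up for illustration, and it assumes a single 64-bit value holding six consecutive N-bit fields, lowest bits first.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the two RGB endpoints, indexed as [e][c].
using Endpoints = std::array<std::array<uint8_t, 3>, 2>;

// Order as written in the source: "c" outer, "e" inner.
// Fields are consumed into entries 00 10 01 11 02 12.
Endpoints unpack_channel_major(uint64_t bits, uint32_t comp_bits)
{
    assert(comp_bits >= 1 && comp_bits <= 8);
    const uint64_t mask = (1ull << comp_bits) - 1;
    Endpoints ep{};
    for (uint32_t c = 0; c < 3; c++)
    {
        for (uint32_t e = 0; e < 2; e++)
        {
            ep[e][c] = static_cast<uint8_t>(bits & mask);
            bits >>= comp_bits;
        }
    }
    return ep;
}

// Order the optimized build effectively used: "e" outer, "c" inner.
// Fields are consumed into entries 00 01 02 10 11 12.
Endpoints unpack_endpoint_major(uint64_t bits, uint32_t comp_bits)
{
    assert(comp_bits >= 1 && comp_bits <= 8);
    const uint64_t mask = (1ull << comp_bits) - 1;
    Endpoints ep{};
    for (uint32_t e = 0; e < 2; e++)
    {
        for (uint32_t c = 0; c < 3; c++)
        {
            ep[e][c] = static_cast<uint8_t>(bits & mask);
            bits >>= comp_bits;
        }
    }
    return ep;
}

// Test helper: packs six fields so that fields[0] lands in the lowest bits.
uint64_t pack_fields(const std::array<uint8_t, 6>& fields, uint32_t comp_bits)
{
    assert(comp_bits >= 1 && comp_bits <= 8);
    const uint64_t mask = (1ull << comp_bits) - 1;
    uint64_t bits = 0;
    for (std::size_t i = fields.size(); i-- > 0;)
        bits = (bits << comp_bits) | (fields[i] & mask);
    return bits;
}
```

Packing the fields 1 through 6 and unpacking them both ways gives endpoint 0 = (1, 3, 5) in the source order but (1, 2, 3) in the swapped order, so the same input bits decode to different colors.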
My workaround was to manually unroll the inner loop, so that the optimizer wouldn't switch the inner/outer loops. Once I did that, the green pixels were orange in optimized builds as well. There may be other, better workarounds.
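For reference, the shape of that workaround is roughly the following. This is only a sketch under assumptions: the function name `unpack_endpoints_unrolled` and its parameter list are hypothetical, and the real function works on the library's own types rather than a plain `uint8_t[2][3]`.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the manual-unroll workaround: the inner loop over "e" is written
// out by hand, so only the loop over "c" remains and there is no loop pair
// left for the optimizer to interchange.
void unpack_endpoints_unrolled(uint64_t color_read_bits, uint32_t comp_bits,
                               uint8_t endpoints[2][3])
{
    assert(comp_bits >= 1 && comp_bits <= 8);
    const uint64_t mask = (1ull << comp_bits) - 1;
    for (uint32_t c = 0; c < 3; c++)
    {
        // e = 0
        endpoints[0][c] = static_cast<uint8_t>(color_read_bits & mask);
        color_read_bits >>= comp_bits;
        // e = 1
        endpoints[1][c] = static_cast<uint8_t>(color_read_bits & mask);
        color_read_bits >>= comp_bits;
    }
}
```

The write order stays 00 10 01 11 02 12, matching what the original nested loop specifies.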
I was curious why I didn't notice this bug with the other decompressor modes. It turns out that the functions unpack_bc7_mode0_2 and unpack_bc7_mode1_3_7 have the extra line "uint64_t channel_read_chunk = channel_read_chunks[c];" at the top of the outer loop before the start of the inner loop, so the compiler can't swap the order of those loops. The last function is unpack_bc7_mode6, which doesn't loop at all, since there are only 2 endpoints to decode.
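The structure of those other functions looks roughly like this. Again, this is a simplified sketch rather than the library's code: only the line that loads `channel_read_chunk` is quoted from the source, while the function name `unpack_per_channel` and its parameters are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the per-channel pattern: each channel has its own chunk of bits,
// loaded at the top of the outer loop, and the inner loop only consumes bits
// from that channel's chunk.
void unpack_per_channel(const uint64_t channel_read_chunks[3], uint32_t comp_bits,
                        uint32_t num_endpoints, uint8_t endpoints[][3])
{
    assert(comp_bits >= 1 && comp_bits <= 8);
    const uint64_t mask = (1ull << comp_bits) - 1;
    for (uint32_t c = 0; c < 3; c++)
    {
        uint64_t channel_read_chunk = channel_read_chunks[c];
        for (uint32_t e = 0; e < num_endpoints; e++)
        {
            endpoints[e][c] = static_cast<uint8_t>(channel_read_chunk & mask);
            channel_read_chunk >>= comp_bits;
        }
    }
}
```

Here the inner loop's input is re-initialized for every value of "c", which is the extra statement between the two loop headers that I believe keeps the optimizer from swapping them.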
Sorry for not making this a pull request, but it seemed like GitHub wanted me to clone the whole repository and let it diff the changes to find out what I changed. I just wanted to suggest a new version for a single file, but I didn't see any way to do that in their interface.