wolfpld / etcpak

The fastest ETC compressor on the planet
Other
229 stars 41 forks source link

`ProcessRGB.cpp` only supports AArch64 NEON intrinsics, fails on ARMv7 on Android #25

Closed akien-mga closed 2 years ago

akien-mga commented 2 years ago

We use etcpak in the Godot editor, and the Godot editor can be compiled for ARMv7 with NEON for Linux, Windows and Android.

For the Android build at least, using NDK 21.4, we're running into a build issue in etcpak due to the use of AArch64 NEON intrinsics in ProcessRGB.cpp, which are not guarded with #if defined(__aarch64__):

/root/sdk/ndk/21.4.7075529/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ -o thirdparty/etcpak/ProcessRGB.android.tools.armv7.o -c -std=gnu++17 -frtti -w -O0 -g -fno-limit-debug-info -fpic -ffunction-sections -funwind-tables -fstack-protector-strong -fvisibility=hidden -fno-strict-aliasing -march=armv7-a -mfloat-abi=softfp -mfpu=neon -target armv7-none-linux-androideabi -gcc-toolchain /root/sdk/ndk/21.4.7075529/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64 -Wall -Wno-ordered-compare-function-pointers -w -UNDEBUG -isystem /root/sdk/ndk/21.4.7075529/sources/cxx-stl/llvm-libc++/include -isystem /root/sdk/ndk/21.4.7075529/sources/cxx-stl/llvm-libc++abi/include --sysroot=/root/sdk/ndk/21.4.7075529/sysroot -isystem /root/sdk/ndk/21.4.7075529/sysroot/usr/include/arm-linux-androideabi -isystem /root/sdk/ndk/21.4.7075529/sources/android/support/include -DDEBUG_ENABLED -DDEV_ENABLED -DNO_EDITOR_SPLASH -D_DEBUG -D__ANDROID_API__=24 -DNO_STATVFS -DGLES_ENABLED -D_FILE_OFFSET_BITS=64 -D__ARM_ARCH_7__ -D__ARM_ARCH_7A__ -D__ARM_NEON__ -DANDROID_ENABLED -DUNIX_ENABLED -DNO_FCNTL -DVULKAN_ENABLED -DTOOLS_ENABLED -DMINIZIP_ENABLED -DZSTD_STATIC_LINKING_ONLY -DUSE_VOLK -DVK_USE_PLATFORM_ANDROID_KHR -Ithirdparty/etcpak -Ithirdparty/libpng -Ithirdparty/volk -Ithirdparty/vulkan -Ithirdparty/vulkan/include -Ithirdparty/zstd -Ithirdparty/zlib -Iplatform/android -I. thirdparty/etcpak/ProcessRGB.cpp
thirdparty/etcpak/ProcessRGB.cpp:2595:25: error: use of undeclared identifier 'vmaxvq_u8'
    const uint8_t max = vmaxvq_u8( buffer );
                        ^
thirdparty/etcpak/ProcessRGB.cpp:2615:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_lsb = vpaddq_u16( pos_lsb, pos_lsb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2616:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_lsb = vpaddq_u16( pos_lsb, pos_lsb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2617:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_lsb = vpaddq_u16( pos_lsb, pos_lsb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2619:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_msb = vpaddq_u16( pos_msb, pos_msb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2620:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_msb = vpaddq_u16( pos_msb, pos_msb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2621:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_msb = vpaddq_u16( pos_msb, pos_msb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2642:25: error: use of undeclared identifier 'vminvq_u8'; did you mean 'vmvnq_u8'?
    const uint8_t min = vminvq_u8( buffer );
                        ^~~~~~~~~
                        vmvnq_u8
/root/sdk/ndk/21.4.7075529/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.9/include/arm_neon.h:16368:17: note: 'vmvnq_u8' declared here
__ai uint8x16_t vmvnq_u8(uint8x16_t __p0) {
                ^
thirdparty/etcpak/ProcessRGB.cpp:2642:19: error: cannot initialize a variable of type 'const uint8_t' (aka 'const unsigned char') with an rvalue of type 'uint8x16_t' (vector of 16 'uint8_t' values)
    const uint8_t min = vminvq_u8( buffer );
                  ^     ~~~~~~~~~~~~~~~~~~~
thirdparty/etcpak/ProcessRGB.cpp:2662:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_lsb = vpaddq_u16( pos_lsb, pos_lsb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2663:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_lsb = vpaddq_u16( pos_lsb, pos_lsb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2664:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_lsb = vpaddq_u16( pos_lsb, pos_lsb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2666:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_msb = vpaddq_u16( pos_msb, pos_msb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2667:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_msb = vpaddq_u16( pos_msb, pos_msb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2668:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_msb = vpaddq_u16( pos_msb, pos_msb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2856:43: error: use of undeclared identifier 'vzip1q_u64'
    uint8x16_t rr = vreinterpretq_u8_u64( vzip1q_u64( vreinterpretq_u64_u8( rgb01.val[0] ), vreinterpretq_u64_u8( rgb23.val[0] ) ) );
                                          ^
thirdparty/etcpak/ProcessRGB.cpp:2857:43: error: use of undeclared identifier 'vzip2q_u64'
    uint8x16_t gg = vreinterpretq_u8_u64( vzip2q_u64( vreinterpretq_u64_u8( rgb01.val[0] ), vreinterpretq_u64_u8( rgb23.val[0] ) ) );
                                          ^
thirdparty/etcpak/ProcessRGB.cpp:2858:43: error: use of undeclared identifier 'vzip1q_u64'
    uint8x16_t bb = vreinterpretq_u8_u64( vzip1q_u64( vreinterpretq_u64_u8( rgb01.val[1] ), vreinterpretq_u64_u8( rgb23.val[1] ) ) );
                                          ^
18 errors generated.

You have such guards in ProcessDxtc.cpp with versions for both AArch64 and ARMv7, but not in ProcessRGB.cpp: https://github.com/wolfpld/etcpak/blob/14dfc2b8d935a71c9ae5772119520d00fe8e2eac/ProcessDxtc.cpp#L277-L281 https://github.com/wolfpld/etcpak/blob/14dfc2b8d935a71c9ae5772119520d00fe8e2eac/ProcessRGB.cpp#L2592-L2596

Would it be possible to add checks to ProcessRGB.cpp so that it only uses AArch64 specific intrinsics when __aarch64__ is defined?

AFAICT I'm fine if it falls back to a slow path on ARMv7+NEON, I don't expect it to be a very relevant use case for using etcpak in production, but it would be good if it could compile nevertheless (otherwise that means we can't ship the Godot editor on ARMv7 platforms, unless we disable VRAM compression which is pretty important for our use case).

BTW, the fallback path doesn't seem to be functional, as this hack:

diff --git a/thirdparty/etcpak/ProcessRGB.cpp b/thirdparty/etcpak/ProcessRGB.cpp
index d60164bcc8..b1c973c16f 100644
--- a/thirdparty/etcpak/ProcessRGB.cpp
+++ b/thirdparty/etcpak/ProcessRGB.cpp
@@ -1,3 +1,8 @@
+#if defined(__ARM_NEON) && !defined(__aarch64__)
+// All code below assumes __ARM_NEON == __aarch64__.
+#  undef __ARM_NEON
+#endif
+
 #include <array>
 #include <string.h>
 #include <limits>

Leads to these errors:

[ 68%] Compiling thirdparty/etcpak/ProcessRGB.cpp ...
thirdparty/etcpak/ProcessRGB.cpp:2287:14: error: cannot initialize a variable of type 'uint8_t *' (aka 'unsigned char *') with an rvalue of type 'uint8_t (*)[16]'
    uint8_t* luma = &l.val;
             ^      ~~~~~~
thirdparty/etcpak/ProcessRGB.cpp:2824:28: error: unknown type name 'Channels'
static etcpak_force_inline Channels GetChannels( const uint8_t* src )
                           ^
thirdparty/etcpak/ProcessRGB.cpp:2826:5: error: unknown type name 'Channels'
    Channels ch;
    ^
thirdparty/etcpak/ProcessRGB.cpp:3073:5: error: unknown type name 'Channels'
    Channels ch = GetChannels( src );
    ^
wolfpld commented 2 years ago

The issue you see should be limited to changes made in commits f11aaa37 and/or ae0e7eb44d. Code paths should be properly guarded in the remaining cases (which are the majority both code- and performance-wise). You may check if this is indeed the case and just fix these few #ifdefs, falling back to scalar code.