wolfpld / etcpak

The fastest ETC compressor on the planet
Other
229 stars 41 forks source link

Fix invalid range for ARMv7 NEON lane intrinsics #30

Closed akien-mga closed 2 years ago

akien-mga commented 2 years ago

Compiling Etcpak for Android ARMv7 with NDK r23 would throw errors like this:

thirdparty/etcpak/ProcessRGB.cpp:3259:16: error: argument value 4 is outside the valid range
        return vdupq_lane_s16( vget_low_s16( multipliers ), ClampConstant( Lane, 0, 4 ) );
               ^                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/sdk/ndk/23.2.8568313/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/12.0.9/include/arm_neon.h:5979:14: note: expanded from macro 'vdupq_lane_s16'
  __ret_24 = splatq_lane_s16(__s0_24, __p1_24); \
             ^                        ~~~~~~~
/root/sdk/ndk/23.2.8568313/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/12.0.9/include/arm_neon.h:829:23: note: expanded from macro 'splatq_lane_s16'
  __ret = (int16x8_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 1); \
                      ^                                            ~~~~
thirdparty/etcpak/ProcessRGB.cpp:3723:46: note: in instantiation of function template specialization 'WidenMultiplier_EAC_NEON<8>' requested here
    uint8x8_t recVals[16] = { EAC_APPLY_16X( EAC_RECONSTRUCT_VALUE ) };
                                             ^

These didn't happen with NDK r21, probably because this validation was added more recently to Clang (possibly https://reviews.llvm.org/D74619).

I'm not familiar with NEON intrinsics but this looks like an off-by-one error introduced in d8be08eda71bf07dfa17d6fb3eefe71a8374101a, CC @hkr.

Related issue in Godot: https://github.com/godotengine/godot/issues/62516

wolfpld commented 2 years ago

The documentation specifies the following constraints:

uint8x8_t   vdup_lane_u8(uint8x8_t vec, __constrange(0,7) int lane);
int16x8_t   vdupq_lane_s16(int16x4_t vec, __constrange(0,3) int lane);