According to the ARM User Guide, the arguments to vcgtq_s8 should be int8x16_t. The original code throws type errors with GCC 10.2 on ARMv8 (Raspberry Pi 4). This PR compiles without warnings or errors, is slightly faster than the non-TRANSPOSE version, and passes sanity checks on benchmarks.