Closed: zjd1988 closed this issue 5 years ago
Thanks for reporting! It is definitely a misprint that went unnoticed. Just replace vmull_s16 with vmull_u16 and it should work. Thanks a million for reporting!
@Zvictoria I want to thank you guys for creating ARM_NEON_2_x86_SSE. It makes it convenient to run and debug ARM code on Windows.
This function is meant for U16, but its implementation uses S16. Please check below.

......

    _NEON2SSESTORAGE uint32x4_t vmull_n_u16(uint16x4_t vec1, uint16_t val2); // VMULL.s16 q0,d0,d0[0]
    _NEON2SSE_INLINE uint32x4_t vmull_n_u16(uint16x4_t vec1, uint16_t val2) // VMULL.s16 q0,d0,d0[0]
    {
        uint16x4_t b16x4;
        b16x4 = vdup_n_s16(val2);
        return vmull_s16(vec1, b16x4);
    }