intel / ARM_NEON_2_x86_SSE

A platform-independent header that allows any C/C++ code containing ARM NEON intrinsic functions to be compiled for x86 target systems using SIMD intrinsic functions up to AVX2

Problem in vcvtq_n_s32_f32 and others #49

Closed sensasonic closed 3 years ago

sensasonic commented 3 years ago

Hi,

When doing the following test:

    float32x4_t f = vdupq_n_f32(0.5f);
    int32x4_t ff = vcvtq_n_s32_f32(f, 31);
    int32_t ii = vgetq_lane_s32(ff, 0);
    printf("%f -> %08X\n", vgetq_lane_f32(f, 0), (unsigned)ii);

It produces the following result: 0.500000 -> C0000000, whereas I would expect 40000000 (0.5 × 2^31 = 2^30).

The issue seems to be at this line: https://github.com/intel/ARM_NEON_2_x86_SSE/blob/67ea94281d8726619e3bc95e94ec57cc8d61efb7/NEON_2_SSE.h#L12900

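The expression (float)(1<<31) cannot produce the intended floating-point constant: 1<<31 overflows a signed 32-bit int, so the converted value comes out negative (-2^31 instead of 2^31). Using (float)(1U <<b) instead solves the issue and produces the correct result for vcvtq_n_s32_f32.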

The same goes for: https://github.com/intel/ARM_NEON_2_x86_SSE/blob/67ea94281d8726619e3bc95e94ec57cc8d61efb7/NEON_2_SSE.h#L12912
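The difference between the two shift forms can be sketched in plain scalar C, without the header itself. This is only an illustration under stated assumptions: `fixed_point_scale`, `signed_shift_scale_31` and `to_fixed` are hypothetical helper names, not functions from NEON_2_SSE.h, and the signed case is modelled with `INT32_MIN` (what `1 << 31` typically evaluates to on a 32-bit `int`) rather than by executing the overflowing shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scale factor 2^fbits built from an unsigned shift,
   so fbits == 31 does not hit the sign bit. This sketch only covers
   fbits in 1..31; a shift by 32 would need separate handling. */
static inline float fixed_point_scale(int fbits)
{
    assert(fbits >= 1 && fbits <= 31);
    return (float)(1U << fbits);
}

/* Model of the signed form (float)(1 << 31): on a 32-bit int the shift
   lands on the sign bit, which in practice yields INT32_MIN. */
static inline float signed_shift_scale_31(void)
{
    return (float)INT32_MIN;
}

/* Scalar model of one lane of a float -> fixed-point conversion.
   Assumes x * scale fits in int32_t (no saturation handling). */
static inline int32_t to_fixed(float x, float scale)
{
    return (int32_t)(x * scale);
}
```

With these helpers, `to_fixed(0.5f, fixed_point_scale(31))` gives 0x40000000, while `to_fixed(0.5f, signed_shift_scale_31())` gives -2^30, whose bit pattern is 0xC0000000, matching the output reported above.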

More generally, there are inconsistencies in how bit masks are generated via left shifts.

This one does the right thing (similar to 1U): https://github.com/intel/ARM_NEON_2_x86_SSE/blob/67ea94281d8726619e3bc95e94ec57cc8d61efb7/NEON_2_SSE.h#L12992

But this one does not: https://github.com/intel/ARM_NEON_2_x86_SSE/blob/67ea94281d8726619e3bc95e94ec57cc8d61efb7/NEON_2_SSE.h#L13003

Zvictoria commented 3 years ago

Many thanks for this finding! Sorry it took so long to fix, but it is finally done. I hope I can close this issue now.

sensasonic commented 3 years ago

You're welcome. NEON_2_SSE.h has been super useful to me for developing, optimizing, and debugging NEON code on my PC! 👍