sensasonic closed this issue 3 years ago
Many thanks for this finding! Sorry it took so long to fix, but it is finally done. I hope I can close this issue now.
You're welcome. NEON_2_SSE.h has been super useful to me for developing, optimizing and debugging NEON code on my PC! 👍
Hi,
When doing the following test:
it produces the following result:
0.500000 -> C0000000
while I would expect 40000000.
The issue seems to be at this line: https://github.com/intel/ARM_NEON_2_x86_SSE/blob/67ea94281d8726619e3bc95e94ec57cc8d61efb7/NEON_2_SSE.h#L12900
(float)(1<<31) cannot produce the intended floating-point constant, because the shift overflows a signed 32-bit int. Using (float)(1U<<b) instead solves the issue and produces the correct result for vcvtq_n_s32_f32.
The same goes for: https://github.com/intel/ARM_NEON_2_x86_SSE/blob/67ea94281d8726619e3bc95e94ec57cc8d61efb7/NEON_2_SSE.h#L12912
More generally, there are inconsistencies in how bit masks are generated via left shifts.
This one does the right thing (similar to 1U): https://github.com/intel/ARM_NEON_2_x86_SSE/blob/67ea94281d8726619e3bc95e94ec57cc8d61efb7/NEON_2_SSE.h#L12992
But this one does not: https://github.com/intel/ARM_NEON_2_x86_SSE/blob/67ea94281d8726619e3bc95e94ec57cc8d61efb7/NEON_2_SSE.h#L13003