simd-everywhere / simde

Implementations of SIMD instruction sets for systems which don't natively support them.
https://simd-everywhere.github.io/blog/
MIT License

[NEON] Add the functions which will trigger the i686 compiler error. #1101

Closed yyctw closed 9 months ago

yyctw commented 10 months ago

Hi all, this is Eric from Andes Technology Corporation. This PR adds vcvtq_n_f64_u64 and vmlaq_laneq_f32, the two functions that triggered the i686 compiler error in the previous PR.

After reading GCC bug 323, I found that this is not a compiler bug. It is caused by excess floating-point precision on x86 machines: the x87 FPU keeps intermediate double-precision values in 80-bit registers, so results can differ from values that were rounded to 64 bits. The workarounds suggested by GCC are either to change the rounding precision in the x87 FPU control word or to compile with -ffloat-store.
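To make the excess-precision issue concrete, here is a minimal C sketch of the effect that -ffloat-store aims for: forcing a result through a 64-bit memory slot so that any extra x87 bits are discarded. The helper names (store_as_double, fixed_to_double, excess_precision_possible) are hypothetical and are not part of SIMDe. fixed_to_double is only loosely modelled on the per-lane arithmetic of vcvtq_n_f64_u64, not taken from the actual implementation.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: round a value to true 64-bit double precision by
 * forcing it through memory. The volatile store prevents the compiler from
 * keeping the value in a wider (for example 80-bit x87) register. */
static double store_as_double(double x) {
  volatile double slot = x;
  return slot;
}

/* Hypothetical helper: interpret v as an unsigned fixed-point number with
 * n fractional bits and convert it to double. Assumes 1 <= n <= 64. */
static double fixed_to_double(unsigned long long v, int n) {
  assert(n >= 1 && n <= 64);
  double scale = 1.0;
  for (int i = 0; i < n; i++) {
    scale *= 2.0; /* powers of two up to 2^64 are exact in double */
  }
  /* The division may be evaluated in a wider format on x87 targets;
   * the explicit store rounds the quotient back to double. */
  return store_as_double((double) v / scale);
}

/* Returns 1 if the compiler reports that double expressions may be
 * evaluated in a wider format (long double, or an indeterminate method). */
static int excess_precision_possible(void) {
  return (FLT_EVAL_METHOD == 2 || FLT_EVAL_METHOD < 0) ? 1 : 0;
}
```

On targets that use SSE for floating-point math, FLT_EVAL_METHOD is normally 0 and the explicit store changes nothing; it only matters where intermediates can be wider than double.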

The review at https://github.com/simd-everywhere/simde/pull/1075#pullrequestreview-1681523665 suggested the workarounds quoted below. Which one would be the better implementation?

As for a workaround, perhaps one of the following applied only for the problematic GCC versions will help: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-optimize-function-attribute https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#index-sseregparm-function-attribute_002c-x86 https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#index-target-function-attribute-5 with one or more of no-mmx, no-fancy-math-387, fpmath=sse

Thanks for reading!

yyctw commented 9 months ago

As for a workaround, perhaps one of the following applied only for the problematic GCC versions will help: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-optimize-function-attribute https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#index-sseregparm-function-attribute_002c-x86 https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#index-target-function-attribute-5 with one or more of no-mmx, no-fancy-math-387, fpmath=sse

@mr-c I tried all three methods listed above, but none of them worked. I also tried wrapping specific functions in _Pragma("GCC push_options") and _Pragma("GCC optimize \"-ffloat-store\""), but because G++ ignores #pragma GCC optimize (ref: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48026 ), this approach also failed. Maybe we should directly add the -ffloat-store option in the cross-file?
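For readers unfamiliar with the per-function approach, the following C sketch shows roughly what wrapping a single function in GCC option pragmas looks like. The macro and function names (EXAMPLE_FLOAT_STORE_BEGIN, EXAMPLE_FLOAT_STORE_END, example_mla) are hypothetical and are not SIMDe identifiers, and the body is only a scalar stand-in for one lane of a multiply-accumulate. As reported in this thread, the approach did not resolve the failure, so this is an illustration of the attempt rather than a fix.

```c
#include <assert.h>

/* Hypothetical macros: enable float-store for one region on GCC only.
 * Other compilers get empty macros so the code still builds. */
#if defined(__GNUC__) && !defined(__clang__)
  #define EXAMPLE_FLOAT_STORE_BEGIN \
    _Pragma("GCC push_options") \
    _Pragma("GCC optimize (\"float-store\")")
  #define EXAMPLE_FLOAT_STORE_END _Pragma("GCC pop_options")
#else
  #define EXAMPLE_FLOAT_STORE_BEGIN
  #define EXAMPLE_FLOAT_STORE_END
#endif

EXAMPLE_FLOAT_STORE_BEGIN
/* Scalar stand-in for one lane of a multiply-accumulate: acc + a * b.
 * The result is assigned to a local variable because float-store acts on
 * floating-point variables, not on unnamed intermediates. */
static float example_mla(float acc, float a, float b) {
  float r = acc + a * b;
  return r;
}
EXAMPLE_FLOAT_STORE_END

/* Small usage check with values that are exact in single precision. */
int example_usage(void) {
  assert(example_mla(1.0f, 2.0f, 3.0f) == 7.0f);
  return 1;
}
```

The pragmas must appear at file scope, which is why the macros wrap the whole function definition instead of being placed inside its body.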

mr-c commented 9 months ago

Maybe we should directly add the -ffloat-store option in the cross-file?

Yes, please try that here

yyctw commented 9 months ago

Maybe we should directly add the -ffloat-store option in the cross-file?

Because adding the -ffloat-store option directly in the cross-file would affect other families of SIMD intrinsics (such as x86, MIPS, and WebAssembly), I've instead added it to the meson.build file in test/arm/neon/, where it is applied when an Intel CPU is detected.

mr-c commented 9 months ago

Huh, this PR didn't increment the NEON stats.. https://github.com/simd-everywhere/implementation-status/commit/8bfebcf01eafa60b58af7e2a3921410dd88f93b7