Closed yyctw closed 9 months ago
As for a workaround, perhaps one of the following applied only for the problematic GCC versions will help: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-optimize-function-attribute https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#index-sseregparm-function-attribute_002c-x86 https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#index-target-function-attribute-5 with one or more of no-mmx, no-fancy-math-387, fpmath=sse
@mr-c I have tried all three methods listed above, but none of them worked. I also tried using _Pragma("GCC push_options") and _Pragma("GCC optimize \"-ffloat-store\"") in specific functions, but because G++ ignores #pragma GCC optimize (ref: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48026 ), this approach also failed.
Maybe we should directly add the -ffloat-store option in the cross-file?
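For reference, the per-function approach described above can be sketched in plain C roughly as follows. This is only an illustration under stated assumptions: `scaled_sum` is a hypothetical helper, not a SIMDe function, and the pragmas are guarded so they apply only to GCC. As noted above, G++ reportedly ignores the optimize pragma, so this sketch may only take effect when the file is compiled as C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration only; not part of SIMDe. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
/* Options without a leading 'O' get an implicit -f prefix,
 * so this requests -ffloat-store for the functions that follow. */
#pragma GCC optimize ("float-store")
#endif

double scaled_sum(const double *values, size_t count, double scale) {
  assert(values != NULL || count == 0);
  double total = 0.0;
  for (size_t i = 0; i < count; i++) {
    /* With float-store, GCC is asked not to keep 'total' in an x87 register. */
    total += values[i] * scale;
  }
  return total;
}

#if defined(__GNUC__) && !defined(__clang__)
/* Restore the command-line options for the rest of the file. */
#pragma GCC pop_options
#endif
```

Scoping the option this way would keep it away from unrelated code, which is the appeal over a global flag, but it depends on the compiler honoring the pragma.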
> Maybe we should directly add the -ffloat-store option in the cross-file?
Yes, please try that here
> Maybe we should directly add the -ffloat-store option in the cross-file?
Adding the -ffloat-store option directly to the cross-file would also affect the other families of SIMD intrinsics (such as x86, MIPS, WebAssembly), so I've added it to the meson.build file (located at test/arm/neon/) instead, where it is applied when an Intel CPU is detected.
Huh, this PR didn't increment the NEON stats.. https://github.com/simd-everywhere/implementation-status/commit/8bfebcf01eafa60b58af7e2a3921410dd88f93b7
Hi all, this is Eric from Andes Technology Corporation. This PR includes vcvtq_n_f64_u64 and vmlaq_laneq_f32, which triggered the i686 compiler error in the previous PR.
After reading GCC bug 323, I found that this is not a compiler bug. It is caused by excess floating-point precision on x86 machines, where the x87 FPU holds double-precision values in 80-bit registers. The workarounds suggested by GCC are either to change the rounding precision in the FPU control word or to use -ffloat-store.
Following the review at https://github.com/simd-everywhere/simde/pull/1075#pullrequestreview-1681523665, which of these would be the better implementation?
Thanks for reading!
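To make the excess-precision issue above concrete, here is a minimal C sketch of the idea behind -ffloat-store applied by hand: forcing a value through a memory store so it is rounded to a 64-bit double. The helper names (`force_double`, `product_rounded`, `evaluates_in_long_double`) are hypothetical and do not come from SIMDe or GCC; this illustrates the mechanism and is not the fix used in this PR.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: a volatile store forces the value out of any
 * wider register and into a 64-bit double slot in memory. */
static double force_double(double value) {
  volatile double stored = value;
  return stored;
}

/* Multiply, then round the result to double precision explicitly. */
static double product_rounded(double a, double b) {
  return force_double(a * b);
}

/* FLT_EVAL_METHOD == 2 means expressions are evaluated in long double,
 * which is the typical setting for x87 code on i686. It is usually 0 on
 * x86-64 with SSE2, and may be -1 when it cannot be determined. */
static int evaluates_in_long_double(void) {
  return FLT_EVAL_METHOD == 2;
}
```

A test could call `evaluates_in_long_double()` to decide whether rounding through `force_double` is needed at all. Applying -ffloat-store to a whole target has a similar effect for variables without touching the source, at some cost in speed.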