Closed yyctw closed 11 months ago
Looks like the msvc build also has compliants: https://ci.appveyor.com/project/nemequ/simde/builds/48267547/job/vgv72gurd4e0s202#L1856
Test errors on Fedora i386 (ignore the avx512 failures) https://download.copr.fedorainfracloud.org/results/packit/simd-everywhere-simde-1075/fedora-rawhide-i386/06522193-simde/builder-live.log.gz (source)
CircleCI got the x86 32 bit build finished, but experienced test failures: https://app.circleci.com/pipelines/github/simd-everywhere/simde/1139/workflows/b6f035be-5458-4865-b49c-fb22d4d49335/jobs/3138/parallel-runs/0/steps/0-112
Thank you @yyctw !
I attempted to build these two failing test cases using the "aarch64-linux-gnu-g++" toolchain with the same compile options that "circleci: i686-gcc11-O2" uses. However, I observed that these two test cases passed successfully on my x86 machine.
Upon further investigation, I noticed that the test cases only fail when built using the "i686-linux-gnu-g++-11" toolchain, while they pass when compiled with "i686-linux-gnu-gcc-11". I guess that there might be some issues or bugs in the "i686-linux-gnu-g++-11" toolchain when using the O2 optimization option.
Upon further investigation, I noticed that the test cases only fail when built using the "i686-linux-gnu-g++-11" toolchain, while they pass when compiled with "i686-linux-gnu-gcc-11". I guess that there might be some issues or bugs in the "i686-linux-gnu-g++-11" toolchain when using the O2 optimization option.
Yeah, this project often finds new compiler bugs. Can you report this bug to GCC? We'll need a workaround for the affected functions in SIMDe
Upon further investigation, I noticed that the test cases only fail when built using the "i686-linux-gnu-g++-11" toolchain, while they pass when compiled with "i686-linux-gnu-gcc-11". I guess that there might be some issues or bugs in the "i686-linux-gnu-g++-11" toolchain when using the O2 optimization option.
Yeah, this project often finds new compiler bugs. Can you report this bug to GCC? We'll need a workaround for the affected functions in SIMDe
Sure, I will report it as soon as possible.
Looks like the msvc build also has compliants: https://ci.appveyor.com/project/nemequ/simde/builds/48267547/job/vgv72gurd4e0s202#L1856
It appears that there are some bugs when expanding nested macros, such as SIMDE_CONSITIFY
and simde_mla_lane_*
. I've manually expanded SIMDE_CONSITIFY
and resolved the issue. Many other implementations, like {qd}mls{l}_lane
, have similar problems, and I will fix all of them as soon as possible.
It appears that there are some bugs when expanding nested macros, such as SIMDE_CONSITIFY and simde_mlalane*. I've manually expanded SIMDE_CONSITIFY and resolved the issue. Many other implementations, like {qq}mls{l}_lane, have similar problems, and I will fix all of them as soon as possible.
Yeah, the SIMDE_CONSTIFY_
macros work in the headers, but for MSVC they cause problems in the tests.
FYI, see https://github.com/simd-everywhere/implementation-status/commit/916de72538b89743016a3c411063e616fc33cf30 for the status of f16
type NEON intrinsics prior to this PR
(I updated the script that generates the implementation status, as it was ignoring functions that use 16-bit floating point types)
Upon further investigation, I noticed that the test cases only fail when built using the "i686-linux-gnu-g++-11" toolchain, while they pass when compiled with "i686-linux-gnu-gcc-11". I guess that there might be some issues or bugs in the "i686-linux-gnu-g++-11" toolchain when using the O2 optimization option.
Yeah, this project often finds new compiler bugs. Can you report this bug to GCC? We'll need a workaround for the affected functions in SIMDe
Sure, I will report it as soon as possible.
I found that this problem may be caused by variations in the precision of double across different processors [ref]. I resolved it by adding the -ffloat-store
flag in the i686-gcc-11-qemu.cross
file.
Looks like the msvc build also has compliants: https://ci.appveyor.com/project/nemequ/simde/builds/48267547/job/vgv72gurd4e0s202#L1856
It appears that there are some bugs when expanding nested macros, such as
SIMDE_CONSITIFY
andsimde_mla_lane_*
. I've manually expandedSIMDE_CONSITIFY
and resolved the issue. Many other implementations, like{qd}mls{l}_lane
, have similar problems, and I will fix all of them as soon as possible.
Solved.
I found that this problem may be caused by variations in the precision of double across different processors [ref]. I resolved it by adding the
-ffloat-store
flag in thei686-gcc-11-qemu.cross
file.This is a good workaround to document in the README for x86 (32-bit) users, but it is still a compiler bug if different
-O
optimizations levels produce different math. So we'll need to get a minimal reproducer and file a bug with GCC. Hopefully the failing tests cases will make developing a minimal reproducer easier. Let me know if you need help with that.As for a workaround, perhaps one of the following applied only for the problematic GCC versions will help: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-optimize-function-attribute https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#index-sseregparm-function-attribute_002c-x86 https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#index-target-function-attribute-5 with one or more of
no-mmx
,no-fancy-math-387
,fpmath=sse
If you want to get the bulk of this merged first, feel free to open a new PR that skips the functions that triggers the compiler bug. Then this PR can be rebased and kept until we implement a workaround.
Sure, I'll start by opening a new PR without the functions that trigger the compile errors. After that, I'll report this compilation bug to GCC and look for a workaround for SIMDe.
@yyctw Now that https://github.com/simd-everywhere/simde/pull/1081 is merged,: do you want to keep this PR to develop the workaround, or will you open a new one?
@yyctw Now that #1081 is merged,: do you want to keep this PR to develop the workaround, or will you open a new one?
I'll open the new one, this PR can be closed.
Hi all, this is Eric from Andes Technology Corporation. This PR includes
Add the types simde_float16x4x{3/4}_t and simde_float16x8x{3/4}_t Add 351 initial implementations and corresponding test cases in 63 families which are listed below:
abal
,abal_high
,cale
,calt
,create
,cvt
,cvt_n
,cvtn
,dup_lane
,ext
fma
,fma_lane
,fma_n
,fms
,fms_lane
,fms_n
,get_lane
ld1_dup
,ld1_lane
,ld1_x2
,ld1_x3
,ld1_x4
,ld1q_x2
,ld1q_x3
,ld1q_x4
ld2
,ld2_dup
,ld2_lane
,ld3
,ld3_dup
,ld3_lane
,ld4
,ld4_dup
,ld4_lane
mla_lane
,mlal_high_lane
,mls_lane
,mlsl_high_lane
,mul_lane
,neg
qdmlal
,qdmlal_high
,qdmlal_high_lane
,qdmlal_high_n
,qdmlal_lane
,qdmlal_n
qdmlsl
,qdmlsl_high
,qdmlsl_high_lane
,qdmlsl_high_n
,qdmlsl_lane
,qdmlsl_n
qdmull
,qdmull_high
,qdmull_high_lane
,qdmull_high_n
,qdmull_lane
,qdmull_n
qdmulh
,qdmulh_lane
,qshl
,reinterpret
,sqrt
"macOS (version 14.2, macos-13)" was the only test that failed on my fork, and it occurred during the "Install Homebrew Dependencies" stage, but all the other CI tests passed smoothly. Thanks for reading and any recommendations are welcome!