linbox-team / linbox

LinBox - C++ library for exact, high-performance linear algebra
https://linbox-team.github.io/linbox
GNU Lesser General Public License v2.1

SIMD bug on ARM64 #215

Closed by ClementPernet 5 years ago

ClementPernet commented 5 years ago

CPUID detects the SSE2 instruction set, but arm_neon.h, which is included as the intrinsics header in fflas-ffpack/utils/fflas_intrinsic.h, does not declare it. I have no idea whether one can hope to get the fflas SIMD code to work on ARM_NEON, or how to do so. If nobody steps up, I suggest auto-disabling SIMD on ARM architectures. See https://trac.sagemath.org/ticket/26932#comment:22
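
To make the auto-disable suggestion concrete, here is a minimal sketch of a compile-time switch keyed on the target architecture. The macro `EXAMPLE_SIMD_ENABLED` and the function `example_simd_enabled` are hypothetical names invented for this illustration; they are not taken from fflas-ffpack or LinBox, whose configure-time detection may work quite differently. Only the predefined compiler macros (`__aarch64__`, `__arm__`, `_M_ARM`, `_M_ARM64`) are real.

```cpp
// Hypothetical sketch: turn SIMD code paths off when compiling for ARM.
// EXAMPLE_SIMD_ENABLED is an illustrative name, not a fflas-ffpack macro.
#if defined(__aarch64__) || defined(__arm__) || defined(_M_ARM) || defined(_M_ARM64)
#  define EXAMPLE_SIMD_ENABLED 0   // ARM target: x86 SSE/AVX intrinsics are unavailable
#else
#  define EXAMPLE_SIMD_ENABLED 1   // other targets: leave the decision to further checks
#endif

// Lets ordinary code query the compile-time decision.
constexpr bool example_simd_enabled() {
    return EXAMPLE_SIMD_ENABLED != 0;
}
```

A switch like this only selects a default. Whether a given x86 instruction set is actually usable would still need its own check.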

cyrilbouvier commented 5 years ago

I read the Trac ticket and the buggy log at https://nix-cache.s3.amazonaws.com/log/ifhzq8jqhcrz81y75cvxz9rs6dws5hny-linbox-1.6.0.drv (copied here in case it changes) and I see that SSE2 was not detected:

-----------------------------------------------
        START  LINBOX CONFIG                   
-----------------------------------------------
Detecting SIMD instruction set
SSE disabled
SSE2 disabled
SSE3 disabled
SSSE3 disabled
SSE4.1 disabled
SSE4.2 disabled
AVX disabled
AVX2 disabled
FMA3 disabled
FMA4 disabled

(configure is called with --disable-optimization)

I think that the problem is in the file linbox/algorithms/polynomial-matrix/polynomial-fft-transform.h, where an __m128i is used in a function prototype (line 364 in linbox 1.6.0) without any check that SSE2 is available.
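
As an illustration of the kind of check that is missing, the sketch below hides a prototype that mentions `__m128i` behind a preprocessor guard and falls back to scalar code otherwise. It is not the code from polynomial-fft-transform.h. The names `EXAMPLE_HAVE_SSE2`, `example_add4` and `add4` are hypothetical, and the guard assumes the compiler-defined `__SSE2__` macro (set by GCC and Clang when SSE2 is enabled), whereas LinBox may rely on its own configure-generated macros.

```cpp
#include <cstdint>

// Assumption: __SSE2__ is the GCC/Clang predefined macro for SSE2 support.
#if defined(__SSE2__)
#  include <emmintrin.h>          // SSE2 intrinsics; declares __m128i
#  define EXAMPLE_HAVE_SSE2 1
#else
#  define EXAMPLE_HAVE_SSE2 0
#endif

#if EXAMPLE_HAVE_SSE2
// This prototype mentions __m128i, so it must only be visible when the
// type exists. On aarch64 the whole declaration is skipped.
inline __m128i example_add4(__m128i a, __m128i b) {
    return _mm_add_epi32(a, b);   // lane-wise 32-bit integer addition
}
#endif

// Portable entry point: adds four 32-bit integers element by element.
inline void add4(const std::uint32_t* a, const std::uint32_t* b, std::uint32_t* out) {
#if EXAMPLE_HAVE_SSE2
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), example_add4(va, vb));
#else
    // Scalar fallback used when SSE2 is not available (e.g. on ARM).
    for (int i = 0; i < 4; ++i) out[i] = a[i] + b[i];
#endif
}
```

With this layout the header compiles on both x86 and ARM, because no SSE2 type appears outside the guarded region.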

ClementPernet commented 5 years ago

Thanks Cyril, I got confused between the two logs. The one on aarch64 indeed does not report any SSE instruction set as detected. So you're right that the problem is only in linbox/algorithms/polynomial-matrix/polynomial-fft-transform.h. I'll fix it right now.

ClementPernet commented 5 years ago

Done in dd17635a8676794fdaaafe060aaabb91f60bfac4

ClementPernet commented 5 years ago

The fix seems to have solved the issue: https://trac.sagemath.org/ticket/26932#comment:24