linbox-team / fflas-ffpack

FFLAS-FFPACK - Finite Field Linear Algebra Subroutines / Package
http://linbox-team.github.io/fflas-ffpack/
GNU Lesser General Public License v2.1
57 stars 23 forks source link

Fix Travis CI #322

Closed cyrilbouvier closed 3 years ago

cyrilbouvier commented 4 years ago

The Goal of the PR is to fix Travis CI. Currently, two out of the four jobs fail: https://travis-ci.org/github/linbox-team/fflas-ffpack/builds/741688422

First, commit 1569db4c51 should fix the homebrew problem with the OS X job (a similar changes was made in Givaro a few months ago).

The other problem is with the job using gcc 4.9 (see issue #313). To fix this issue, I see two possibilities:

  1. drop support of gcc 4.9, which is old (last release in 2016) and not supported anymore (https://gcc.gnu.org/gcc-4.9/)
  2. do not use any intrinsics not recognize by gcc 4.9. For example, for the first missing intrinsic _mm512_castps512_ps256, I was able to remove it by rewriting the blendv method in simd512_float.inl:
     static INLINE CONST vect_t blendv(const vect_t a, const vect_t b, const vect_t mask) {
    -
    -        __m256 lowa  = _mm512_castps512_ps256(a);
    -        __m256 higha = _mm256_castpd_ps(_mm512_extractf64x4_pd(_mm512_castps_pd(a),1));
    -
    -        __m256 lowb  = _mm512_castps512_ps256(b);
    -        __m256 highb = _mm256_castpd_ps(_mm512_extractf64x4_pd(_mm512_castps_pd(b),1));
    -
    -        __m256 lowmask  = _mm512_castps512_ps256(mask);
    -        __m256 highmask = _mm256_castpd_ps(_mm512_extractf64x4_pd(_mm512_castps_pd(mask),1));
    -
    -        __m256 reslow = _mm256_blendv_ps(lowa, lowb, lowmask);
    -        __m256 reshigh = _mm256_blendv_ps(higha, highb, highmask);
    -
    -        __m512 res = _mm512_castps256_ps512(reslow);
    -        res = _mm512_insertf32x8(res, reshigh, 1);
    -
    -        return res;
    +        return _mm512_mask_blend_ps (
    +                    _mm512_movepi32_mask (_mm512_castps_si512 (mask)), a, b);
     }

    The advantage of this new code is that it is faster (only two instructions, as the cast is here for compilation only). The main drawback is that _mm512_movepi32_mask requires AVX512DQ and not just AVX512L. But , according to issue #312 , other parts of the simd code already silently require AVX512DQ.

More work is needed to remove the other mising intrinsics if we go with option 2.

cyrilbouvier commented 4 years ago

Commit 1569db4 did fix the homebrew problem with the OS X job but revealed a failure for test-fger

The problem does not seem to be specific to OS X, as I was able to reproduce it. I had one failed run of test-fger on more than 100 runs, with the following error message:

Checking Modular_implem<int64_t, uint64_t, uint64_t> modulo 165033971... FAIL
a   :132720555
m   :44, n   : 1
incx :1, incy : 1, ldC : 1
Error C[0,0]=18463374 D[0,0]=24729322
Error C[1,0]=54479037 D[1,0]=144408618
Error C[2,0]=67881147 D[2,0]=99501009
Error C[3,0]=84658322 D[3,0]=141056375
Error C[4,0]=35823758 D[4,0]=44270387
Error C[5,0]=35369714 D[5,0]=15002300
Error C[6,0]=114908161 D[6,0]=42428348
Error C[7,0]=53084340 D[7,0]=15942759
Error C[8,0]=94996110 D[8,0]=104641692
Error C[9,0]=39583949 D[9,0]=143044872
Error C[10,0]=140921121 D[10,0]=136917392
Error C[11,0]=43433855 D[11,0]=74462683
Error C[12,0]=143759473 D[12,0]=89899069
Error C[13,0]=96683823 D[13,0]=75362525
Error C[14,0]=28275651 D[14,0]=78298271
Error C[15,0]=149033254 D[15,0]=138700481
Error C[16,0]=126252883 D[16,0]=66669534
Error C[17,0]=28226674 D[17,0]=51199424
Error C[18,0]=81191980 D[18,0]=144124893
Error C[19,0]=69444028 D[19,0]=61800979
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
FAILED 

Edit: I was able to find a seed that trigger the error:

./test-fger -s 355057717

Edit 2: It seems this issue is already under investigation: see issue #307 and PR #289

cyrilbouvier commented 4 years ago

A more exhaustive list of intrinsics not recognized by gcc 4.9:

ClementPernet commented 3 years ago

OK to merge this PR as is in order to fix the hombrew problem. Regarding the support for 4.9 as minimal gcc version, it seems that the numerous avx512 undefined intrinsics is a strong enough argument to move to gcc-5 as minimal supported gcc: we should not waste our time writing workarounds. Still the one you propose deserves its own PR, if it improves the current implementation. The fger bug is already mentionned in #307 and need our attention!

ClementPernet commented 3 years ago

LGTM . Merging