quarkslab / NFLlib

NTT-based Fast Lattice library
MIT License

Fix bug AVX #26

Closed. tlepoint closed this 6 years ago

tlepoint commented 7 years ago

The AVX code was never compiled because of an incorrect define, and once it was correctly included, it did not compile.

However, I still get an error:

        Start   2: run_nfllib_demo8_60_uint32_t_op
  2/122 Test   #2: run_nfllib_demo8_60_uint32_t_op .....................***Failed    0.03 sec

When running the test, I get the following output:

Polynomials of degree 8 with 60 bit coefficients and 32 bit limbs
======================================================================
Time per polynomial setcoeffs: 0.675 us
Time per polynomial generation (uniform): 3.25 us
Time per polynomial generation (bounded 6 bits): 0.25 us
Time per polynomial generation (gaussian sigma=20 k=128): 1.6 us
Time per polynomial NTT: 0 us
Time per polynomial inverse NTT: 0.1 us
Time per polynomial in-place addition a+=b: 0.2 us
Time per polynomial addition c=a+b: 0 us
Time per polynomial subtraction c=a-b: 0 us
Time per polynomial multiplication (NTT form) c=a*b: 0.1 us
error with vectorized mul_shoup

@aguinet @serge-sans-paille ?
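
For readers unfamiliar with the failing check: `mul_shoup` refers to Shoup's modular multiplication, which replaces the division in `x * y mod p` with a precomputed companion value for `y`. The sketch below is a minimal scalar version for 32-bit limbs, assuming `p < 2^31` and `y < p`. It is not NFLlib's implementation; the names `shoup_precompute`, `mul_shoup_scalar`, `reference_mulmod` and `shoup_matches_reference` are hypothetical and only illustrate the kind of scalar reference a vectorized version could be compared against.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: Shoup companion of y, i.e. floor(y * 2^32 / p).
// Assumes y < p so that the quotient fits in 32 bits.
inline std::uint32_t shoup_precompute(std::uint32_t y, std::uint32_t p) {
    assert(p != 0 && y < p);
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(y) << 32) / p);
}

// Hypothetical scalar Shoup multiplication: returns x * y mod p.
// Assumes p < 2^31, y < p and y_shoup == shoup_precompute(y, p).
inline std::uint32_t mul_shoup_scalar(std::uint32_t x, std::uint32_t y,
                                      std::uint32_t y_shoup, std::uint32_t p) {
    // Estimate of the quotient floor(x * y / p); it is at most 1 too small.
    const std::uint32_t q = static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(x) * y_shoup) >> 32);
    // The true remainder candidate lies in [0, 2p), so 32-bit wraparound is harmless.
    const std::uint32_t r = x * y - q * p;
    return r >= p ? r - p : r;
}

// Straightforward reference using a 64-bit product and a division.
inline std::uint32_t reference_mulmod(std::uint32_t x, std::uint32_t y, std::uint32_t p) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(x) * y) % p);
}

// Compares the Shoup version against the reference on pseudo-random inputs.
inline bool shoup_matches_reference(std::uint32_t p) {
    std::uint32_t state = 12345u;
    for (int i = 0; i < 1000; ++i) {
        state = state * 1664525u + 1013904223u;  // simple LCG step
        const std::uint32_t x = state;
        state = state * 1664525u + 1013904223u;
        const std::uint32_t y = state % p;
        if (mul_shoup_scalar(x, y, shoup_precompute(y, p), p) != reference_mulmod(x, y, p)) {
            return false;
        }
    }
    return true;
}
```

A vectorized variant has to produce exactly the same values lane by lane, so a comparison of this kind catches mistakes in the SIMD path.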

aguinet commented 7 years ago

Hmm it also fails for me when using SSE-only. Am I the only one?

aguinet commented 7 years ago

Using cmake -DNFLLIB_USE_AVX=OFF -DNFLLIB_USE_SSE=ON -DNFL_OPTIMIZED=ON -DCMAKE_BUILD_TYPE=Debug -G Ninja ..

aguinet commented 7 years ago

Okay, I think I found the issue: one of the refactorings must have gone wrong... I'll keep you posted!

aguinet commented 7 years ago

I pushed two commits that I think fix the issues (for SSE and AVX2)!

aguinet commented 7 years ago

Well, it seems that there is still an alignment issue for AVX2! I will check that this afternoon.
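
The thread does not say where the alignment problem was. As general background, AVX2 aligned loads and stores such as `_mm256_load_si256` require addresses that are multiples of 32 bytes, while SSE aligned accesses only need 16. The sketch below shows one way to request and check that alignment in portable C++; the names `is_aligned` and `AlignedCoeffs` are hypothetical and do not come from NFLlib.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if ptr is a multiple of `alignment` bytes.
// `alignment` is assumed to be a power of two.
inline bool is_aligned(const void* ptr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Hypothetical coefficient block: eight 32-bit values fill one 256-bit register,
// and alignas(32) makes every instance start on a 32-byte boundary.
struct alignas(32) AlignedCoeffs {
    std::uint32_t v[8];
};
```

Buffers that were only aligned for SSE (16 bytes) can land on addresses that fail this check for 32, which is one common way an SSE path works while an AVX2 path faults.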

tlepoint commented 7 years ago

@aguinet that works for me, can we merge?

carlosaguilarmelchor commented 6 years ago

Merging it