primesearch / Mlucas

Ⓜ️ Ernst Mayer's Mlucas and Mfactor programs for GIMPS
https://mersenneforum.org/mayer/README.html
GNU General Public License v3.0

Error with native Windows builds: `Assertion failed: thread-local memcheck fail!` #8

Open tdulcet opened 8 months ago

tdulcet commented 8 months ago

Error with native Windows builds compiled using MSYS2:

INFO: Maximum recommended exponent for FFT length (16 Kdbl) = 343407; p[ = 341749]/pmax_rec = 0.9951719097.
Initial DWT-multipliers chain length = [hiacc] in carry step.
M341749: using FFT length 16K = 16384 8-byte floats, initial residue shift count = 341536
This gives an average   20.858703613281250 bits per digit
Using complex FFT radices        16        16        32
mers_mod_square: Init threadpool of 1 threads
executing worker thread id 0, #cores = 12
ERROR: at line 1444 of file ../src/radix16_ditN_cy_dif1.c
Assertion failed: thread-local memcheck fail!
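
As a sanity check on the figures in the logs in this thread, the "bits per digit" value appears to be the exponent divided by the FFT length in doubles, and the product of the listed complex radices comes to half that length. The following is a minimal sketch that re-derives those numbers; the function names are hypothetical and are not part of Mlucas.

```python
from math import prod


def bits_per_digit(p: int, n_doubles: int) -> float:
    """Average number of bits stored per FFT element (hypothetical helper)."""
    return p / n_doubles


def radices_match_length(radices, n_doubles: int) -> bool:
    """True if the complex radices multiply to half the real FFT length."""
    return prod(radices) == n_doubles // 2


# (exponent, FFT length in doubles, complex radices), copied from the logs
# quoted in this thread.
runs = [
    (341749, 16 * 1024, [16, 16, 32]),
    (143809313, 8192 * 1024, [16, 16, 16, 32, 32]),
    (143809313, 7680 * 1024, [240, 16, 32, 32]),
]

for p, n, radices in runs:
    print(
        f"{n // 1024}K: {bits_per_digit(p, n):.9f} bits/digit, "
        f"radices consistent: {radices_match_length(radices, n)}"
    )
```

This only confirms that the logged configurations are internally consistent; it says nothing about the cause of the assertion.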

The issue seems to affect many FFT lengths, particularly the power-of-2 lengths such as 16K, 32K, 64K, 128K, 256K, 512K, 1M, 2M, 4M, 8M, 16M and 32M. Interestingly, when Mlucas is run under GDB or compiled with ASan and UBSan enabled, the error usually goes away, which makes it difficult to debug. It is likely another Windows compatibility issue, similar to the data type issues I fixed in https://github.com/tdulcet/Mlucas/commit/011549e027777577e5fd119145e852a612b8d4ba. It currently prevents users from using the native Windows builds.

Edit: We got a report that it works as expected on 32-bit ARM.

Edit 2: Ken was able to further debug the issue and noted:

FFT lengths that repeat a radix0 fail on the second same-radix0 pass.

See this post and the attachment for an exhaustive list of all FFT lengths that failed.

tdulcet commented 1 month ago

I just had the assertion fail on a 64-bit ARM ASIMD build on Linux compiled with GCC, while doing P-1 stage 1. Here is the output:

pm1_set_bounds: Stage 2 needs at least 24+5 buffers ... each buffer needs 60 MB; avail-RAM allows 83 such.
Setting default p-1 stage bounds b1 = 1200000, b2_start = 1200000, b2 = 43000000.
INFO: precomputed p-1 stage 1 primes-product file p143809313.s1_prod not found...computing from scratch.
INFO: Opened precomputed p-1 stage 1 primes-product file p143809313.s1_prod for writing...
Product of Stage 1 prime powers with b1 = 1200000 is 1733006 bits (27079 limbs), vs estimated 1776540. Setting PRP_BASE = 3.
INFO: Maximum recommended exponent for FFT length (7680 Kdbl) = 144258522; p[ = 143809313]/pmax_rec = 0.9968860834.
Initial DWT-multipliers chain length = [hiacc] in carry step.
INFO: primary restart file p143809313 not found...looking for secondary...
INFO: no restart file found...starting run from scratch.
 Switching to next-larger available FFT length 8192K and restarting from last checkpoint file.
INFO: Maximum recommended exponent for FFT length (8192 Kdbl) = 153654913; p[ = 143809313]/pmax_rec = 0.9359239493.
Initial DWT-multipliers chain length = [long] in carry step.
 INFO: restart file p143809313 found...reading...
INFO: Maximum recommended exponent for FFT length (7680 Kdbl) = 144258522; p[ = 143809313]/pmax_rec = 0.9968860834.
Initial DWT-multipliers chain length = [hiacc] in carry step.
 INFO: restart file p143809313 found...reading...
ERROR: at line 1772 of file ../src/radix240_ditN_cy_dif1.c
Assertion failed: thread-local memcheck fail!

And here is the end of the .stat file:

[2024-10-11 12:51:33] M143809313 S1 bit = 650000 [37.51% complete] clocks = 00:27:22.232 [164.2233 msec/iter] Res64: 756983F0EC21CEF4. AvgMaxErr = 0.254025244. MaxErr = 0.375000000. Residue shift count = 0.
[2024-10-11 13:18:58] M143809313 S1 bit = 660000 [38.08% complete] clocks = 00:27:24.684 [164.4684 msec/iter] Res64: AC16C822BE86A8A0. AvgMaxErr = 0.253847552. MaxErr = 0.343750000. Residue shift count = 0.
[2024-10-11 13:46:20] M143809313 S1 bit = 670000 [38.66% complete] clocks = 00:27:22.261 [164.2261 msec/iter] Res64: A89903C987BC9131. AvgMaxErr = 0.253887622. MaxErr = 0.375000000. Residue shift count = 0.
M143809313 Roundoff warning on iteration   671694, maxerr =   0.437500000000
 Switching to next-larger available FFT length 8192K and restarting from last checkpoint file.
Restarting M143809313 at iteration = 670000. Res64: A89903C987BC9131, residue shift count = 0
M143809313: using FFT length 8192K = 8388608 8-byte floats, initial residue shift count = 0
This gives an average   17.143406033515930 bits per digit
Using complex FFT radices        16        16        16        32        32
[2024-10-11 14:18:37] M143809313 S1 bit = 680000 [39.24% complete] clocks = 00:27:37.211 [165.7211 msec/iter] Res64: E9E6A7F0881D03A4. AvgMaxErr = 0.049706465. MaxErr = 0.066406250. Residue shift count = 0.
Restarting M143809313 at iteration = 680000. Res64: E9E6A7F0881D03A4, residue shift count = 0
M143809313: using FFT length 7680K = 7864320 8-byte floats, initial residue shift count = 0
This gives an average   18.286299769083659 bits per digit
Using complex FFT radices       240        16        32        32

This was using Mlucas commit 512a544ed7237e8c0634dbdaca9d141c3e7c87c3.

Edit: The issue occurred again after restarting the test:

[2024-10-21 05:59:50] M143809313 S1 bit = 1660000 [95.79% complete] clocks = 00:27:35.471 [165.5472 msec/iter] Res64: 004CFCC23FEB13CE. AvgMaxErr = 0.253917258. MaxErr = 0.375000000. Residue shift count = 0.
[2024-10-21 06:27:28] M143809313 S1 bit = 1670000 [96.36% complete] clocks = 00:27:37.656 [165.7657 msec/iter] Res64: 7742BCA0A0B3CD57. AvgMaxErr = 0.253864299. MaxErr = 0.375000000. Residue shift count = 0.
[2024-10-21 06:55:04] M143809313 S1 bit = 1680000 [96.94% complete] clocks = 00:27:35.334 [165.5334 msec/iter] Res64: B7569BD0EF65629D. AvgMaxErr = 0.253812652. MaxErr = 0.343750000. Residue shift count = 0.
M143809313 Roundoff warning on iteration  1688848, maxerr =   0.437500000000
 Switching to next-larger available FFT length 8192K and restarting from last checkpoint file.
Restarting M143809313 at iteration = 1680000. Res64: B7569BD0EF65629D, residue shift count = 0
M143809313: using FFT length 8192K = 8388608 8-byte floats, initial residue shift count = 0
This gives an average   17.143406033515930 bits per digit
Using complex FFT radices        16        16        16        32        32
[2024-10-21 07:47:18] M143809313 S1 bit = 1690000 [97.52% complete] clocks = 00:27:47.450 [166.7451 msec/iter] Res64: 48B4E7D7C7425471. AvgMaxErr = 0.049662011. MaxErr = 0.062500000. Residue shift count = 0.
Restarting M143809313 at iteration = 1690000. Res64: 48B4E7D7C7425471, residue shift count = 0
M143809313: using FFT length 7680K = 7864320 8-byte floats, initial residue shift count = 0
This gives an average   18.286299769083659 bits per digit
Using complex FFT radices       240        16        32        32
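
The sequence in this run (7680K, then 8192K, then back to 7680K) appears consistent with Ken's note about a repeated radix0, since the assertion fired in radix240_ditN_cy_dif1.c once the leading radix 240 was used a second time. The following is a minimal sketch for flagging such repeats in a list of passes. The helper name is hypothetical, and it assumes the first 7680K pass used the same radix set that the .stat file shows after the restart.

```python
def repeated_radix0(passes):
    """Return indices of passes whose leading radix was already used earlier.

    `passes` is a list of (fft_length_in_K, radices) tuples in run order.
    Hypothetical helper for illustration; not part of Mlucas.
    """
    seen = set()
    repeats = []
    for i, (_fft_k, radices) in enumerate(passes):
        radix0 = radices[0]
        if radix0 in seen:
            repeats.append(i)
        seen.add(radix0)
    return repeats


# FFT lengths and radices taken from the log above; the radices of the first
# 7680K pass are an assumption (same as the later 7680K pass).
passes = [
    (7680, [240, 16, 32, 32]),
    (8192, [16, 16, 16, 32, 32]),
    (7680, [240, 16, 32, 32]),
]

print(repeated_radix0(passes))  # prints [2]
```

This only illustrates the pattern; it is not a diagnosis of the underlying bug.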