tdulcet opened 8 months ago (status: Open)
I just hit the assertion failure below on a 64-bit ARM (ASIMD) build on Linux, compiled with GCC, during P-1 stage 1. Here is the output:
pm1_set_bounds: Stage 2 needs at least 24+5 buffers ... each buffer needs 60 MB; avail-RAM allows 83 such.
Setting default p-1 stage bounds b1 = 1200000, b2_start = 1200000, b2 = 43000000.
INFO: precomputed p-1 stage 1 primes-product file p143809313.s1_prod not found...computing from scratch.
INFO: Opened precomputed p-1 stage 1 primes-product file p143809313.s1_prod for writing...
Product of Stage 1 prime powers with b1 = 1200000 is 1733006 bits (27079 limbs), vs estimated 1776540. Setting PRP_BASE = 3.
INFO: Maximum recommended exponent for FFT length (7680 Kdbl) = 144258522; p[ = 143809313]/pmax_rec = 0.9968860834.
Initial DWT-multipliers chain length = [hiacc] in carry step.
INFO: primary restart file p143809313 not found...looking for secondary...
INFO: no restart file found...starting run from scratch.
Switching to next-larger available FFT length 8192K and restarting from last checkpoint file.
INFO: Maximum recommended exponent for FFT length (8192 Kdbl) = 153654913; p[ = 143809313]/pmax_rec = 0.9359239493.
Initial DWT-multipliers chain length = [long] in carry step.
INFO: restart file p143809313 found...reading...
INFO: Maximum recommended exponent for FFT length (7680 Kdbl) = 144258522; p[ = 143809313]/pmax_rec = 0.9968860834.
Initial DWT-multipliers chain length = [hiacc] in carry step.
INFO: restart file p143809313 found...reading...
ERROR: at line 1772 of file ../src/radix240_ditN_cy_dif1.c
Assertion failed: thread-local memcheck fail!
And here is the end of the .stat file:
[2024-10-11 12:51:33] M143809313 S1 bit = 650000 [37.51% complete] clocks = 00:27:22.232 [164.2233 msec/iter] Res64: 756983F0EC21CEF4. AvgMaxErr = 0.254025244. MaxErr = 0.375000000. Residue shift count = 0.
[2024-10-11 13:18:58] M143809313 S1 bit = 660000 [38.08% complete] clocks = 00:27:24.684 [164.4684 msec/iter] Res64: AC16C822BE86A8A0. AvgMaxErr = 0.253847552. MaxErr = 0.343750000. Residue shift count = 0.
[2024-10-11 13:46:20] M143809313 S1 bit = 670000 [38.66% complete] clocks = 00:27:22.261 [164.2261 msec/iter] Res64: A89903C987BC9131. AvgMaxErr = 0.253887622. MaxErr = 0.375000000. Residue shift count = 0.
M143809313 Roundoff warning on iteration 671694, maxerr = 0.437500000000
Switching to next-larger available FFT length 8192K and restarting from last checkpoint file.
Restarting M143809313 at iteration = 670000. Res64: A89903C987BC9131, residue shift count = 0
M143809313: using FFT length 8192K = 8388608 8-byte floats, initial residue shift count = 0
This gives an average 17.143406033515930 bits per digit
Using complex FFT radices 16 16 16 32 32
[2024-10-11 14:18:37] M143809313 S1 bit = 680000 [39.24% complete] clocks = 00:27:37.211 [165.7211 msec/iter] Res64: E9E6A7F0881D03A4. AvgMaxErr = 0.049706465. MaxErr = 0.066406250. Residue shift count = 0.
Restarting M143809313 at iteration = 680000. Res64: E9E6A7F0881D03A4, residue shift count = 0
M143809313: using FFT length 7680K = 7864320 8-byte floats, initial residue shift count = 0
This gives an average 18.286299769083659 bits per digit
Using complex FFT radices 240 16 32 32
This was using Mlucas commit 512a544ed7237e8c0634dbdaca9d141c3e7c87c3.
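As a sanity check on the figures in the logs above, here is a minimal Python sketch that reproduces the "bits per digit" and `p/pmax_rec` values from the exponent and the two FFT lengths. The function names are hypothetical and not part of Mlucas. The only assumption is that an FFT length of N K means N * 1024 doubles, which matches the "8192K = 8388608 8-byte floats" line in the .stat file.

```python
# Exponent and recommended-maximum exponents, copied from the log above.
P = 143809313
PMAX_REC = {7680: 144258522, 8192: 153654913}  # FFT length (K) -> pmax_rec


def bits_per_digit(p, fft_k):
    """Average bits per FFT word: exponent divided by the number of doubles."""
    return p / (fft_k * 1024)


def exponent_ratio(p, fft_k):
    """How close the exponent is to the recommended maximum for this FFT length."""
    return p / PMAX_REC[fft_k]


if __name__ == "__main__":
    for fft_k in sorted(PMAX_REC):
        print(f"{fft_k}K: {bits_per_digit(P, fft_k):.15f} bits/digit, "
              f"p/pmax_rec = {exponent_ratio(P, fft_k):.10f}")
```

At 7680K the exponent sits at about 99.7% of the recommended maximum, which is consistent with the occasional roundoff warning that triggers the switch to 8192K.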
Edit: The issue occurred again after restarting the test:
[2024-10-21 05:59:50] M143809313 S1 bit = 1660000 [95.79% complete] clocks = 00:27:35.471 [165.5472 msec/iter] Res64: 004CFCC23FEB13CE. AvgMaxErr = 0.253917258. MaxErr = 0.375000000. Residue shift count = 0.
[2024-10-21 06:27:28] M143809313 S1 bit = 1670000 [96.36% complete] clocks = 00:27:37.656 [165.7657 msec/iter] Res64: 7742BCA0A0B3CD57. AvgMaxErr = 0.253864299. MaxErr = 0.375000000. Residue shift count = 0.
[2024-10-21 06:55:04] M143809313 S1 bit = 1680000 [96.94% complete] clocks = 00:27:35.334 [165.5334 msec/iter] Res64: B7569BD0EF65629D. AvgMaxErr = 0.253812652. MaxErr = 0.343750000. Residue shift count = 0.
M143809313 Roundoff warning on iteration 1688848, maxerr = 0.437500000000
Switching to next-larger available FFT length 8192K and restarting from last checkpoint file.
Restarting M143809313 at iteration = 1680000. Res64: B7569BD0EF65629D, residue shift count = 0
M143809313: using FFT length 8192K = 8388608 8-byte floats, initial residue shift count = 0
This gives an average 17.143406033515930 bits per digit
Using complex FFT radices 16 16 16 32 32
[2024-10-21 07:47:18] M143809313 S1 bit = 1690000 [97.52% complete] clocks = 00:27:47.450 [166.7451 msec/iter] Res64: 48B4E7D7C7425471. AvgMaxErr = 0.049662011. MaxErr = 0.062500000. Residue shift count = 0.
Restarting M143809313 at iteration = 1690000. Res64: 48B4E7D7C7425471, residue shift count = 0
M143809313: using FFT length 7680K = 7864320 8-byte floats, initial residue shift count = 0
This gives an average 18.286299769083659 bits per digit
Using complex FFT radices 240 16 32 32
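For anyone trying to spot the same pattern in their own .stat file (roundoff warning, switch to 8192K, one checkpoint interval, then a switch back to 7680K), a rough Python sketch of a parser follows. The regular expressions are based only on the line formats shown above, and `parse_stat` is a hypothetical helper, not something Mlucas provides.

```python
import re

# Patterns derived from the .stat lines quoted in this issue.
WARNING = re.compile(r"Roundoff warning on iteration (\d+), maxerr = (\d+\.\d+)")
FFT_LEN = re.compile(r"using FFT length (\d+)K")
# ". MaxErr" skips the preceding "AvgMaxErr" field on the same line.
CHECKPOINT = re.compile(r"S1 bit = (\d+).*?\. MaxErr = (\d+\.\d+)")


def parse_stat(lines):
    """Return a list of (kind, ...) tuples for the lines we recognize."""
    events = []
    for line in lines:
        if m := WARNING.search(line):
            events.append(("warning", int(m.group(1)), float(m.group(2))))
        elif m := FFT_LEN.search(line):
            events.append(("fft", int(m.group(1))))
        elif m := CHECKPOINT.search(line):
            events.append(("checkpoint", int(m.group(1)), float(m.group(2))))
    return events


# A few lines copied from the log above, shortened where the fields are unused.
SAMPLE = [
    "[2024-10-11 13:46:20] M143809313 S1 bit = 670000 [38.66% complete] "
    "Res64: A89903C987BC9131. AvgMaxErr = 0.253887622. MaxErr = 0.375000000. "
    "Residue shift count = 0.",
    "M143809313 Roundoff warning on iteration 671694, maxerr = 0.437500000000",
    "Switching to next-larger available FFT length 8192K and restarting "
    "from last checkpoint file.",
    "M143809313: using FFT length 8192K = 8388608 8-byte floats, "
    "initial residue shift count = 0",
    "M143809313: using FFT length 7680K = 7864320 8-byte floats, "
    "initial residue shift count = 0",
]

if __name__ == "__main__":
    for event in parse_stat(SAMPLE):
        print(event)
```

In both runs above, the assertion fails right after the "using FFT length 7680K" line that follows a checkpoint written at 8192K, so the sequence `fft 8192` followed by `fft 7680` is the one to look for.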
Error with native Windows builds compiled using MSYS2:
The issue seems to affect many FFT lengths, particularly the power-of-2 lengths such as 16K, 32K, 64K, 128K, 256K, 512K, 1M, 2M, 4M, 8M, 16M and 32M. Interestingly, when Mlucas is run under GDB or compiled with ASan and UBSan enabled, the error usually goes away, which makes it difficult to debug. It is probably another Windows compatibility issue, similar to the datatype issues I fixed in https://github.com/tdulcet/Mlucas/commit/011549e027777577e5fd119145e852a612b8d4ba. It currently prevents users from using the native Windows builds.
Edit: We got a report that it works as expected on 32-bit ARM.
Edit 2: Ken was able to further debug the issue and noted:
See this post and the attachment for an exhaustive list of all FFT lengths that failed.