Open efvb opened 5 months ago
Thanks for the bug report!
As with #10, it would be helpful if you could try disabling LTO, so we could rule that out.
If that does not help, could you add this printf call:
printf("c_uint64_ptr[j] = %llu\n\nj = %u\n", c_uint64_ptr[j], j);
above that assert statement on line 2157: https://github.com/primesearch/Mlucas/blob/18398583da19f270eed22a036c27d9b6beb9973d/src/Mlucas.c#L2154-L2157
This might give us more information about why the assert statement is failing.
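For readers unfamiliar with what a "carryout" means here, the following is a minimal sketch, not the Mlucas source: it multiplies a multiword integer by a small scalar and returns whatever spills past the top word. The names `mul_scalar_inplace`, `words_for_bits` and `print_carry_diagnostic` are hypothetical, and `unsigned __int128` is a GCC/Clang extension. The word-count helper reflects an assumption about `j`: the value j = 178076 reported later in this thread equals ceil(11396863 / 64), which would be consistent with `j` being the number of 64-bit words in the residue.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not Mlucas code): number of 64-bit words needed
   to hold a value of the given bit length, i.e. ceil(bits / 64). */
uint32_t words_for_bits(uint32_t bits)
{
    return (bits + 63u) / 64u;
}

/* Hypothetical helper (not Mlucas code): multiply the len-word,
   little-endian integer x[] by scalar in place and return the word
   carried out of the top. A nonzero return means the product no
   longer fits in len words. */
uint64_t mul_scalar_inplace(uint64_t *x, uint32_t len, uint64_t scalar)
{
    uint64_t carry = 0;
    for (uint32_t i = 0; i < len; i++) {
        /* 64x64 -> 128-bit product plus incoming carry cannot overflow */
        unsigned __int128 t = (unsigned __int128)x[i] * scalar + carry;
        x[i] = (uint64_t)t;           /* low 64 bits stay in place */
        carry = (uint64_t)(t >> 64);  /* high 64 bits move up */
    }
    return carry;
}

/* Print the carry word and word count using the fixed-width format
   macros from <inttypes.h>, a portable alternative to %llu and %u. */
void print_carry_diagnostic(uint64_t carry, uint32_t j)
{
    printf("carry = %" PRIu64 "\nj = %" PRIu32 "\n", carry, j);
}
```

A caller would run `carry = mul_scalar_inplace(x, j, 3)` and then call `print_carry_diagnostic(carry, j)` before asserting that `carry` is zero, which mirrors the placement of the printf requested above.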
You might also want to try forcing a larger FFT length with the -fft option, as this exponent is very close to the limit for the current FFT length and could be hitting a roundoff error (ROE). Would you mind sharing the .stat file for this exponent, which would show the ROE values?
Hi, stat file attached. I will recompile and retry over the weekend.
Now compiled without -LTO and with the printf line added, same issue but more info.
Mlucas 21.0.1
https://www.mersenneforum.org/mayer/README.html
INFO: testing qfloat routines...
System total RAM = 23236, free RAM = 2102
INFO: 2102 MB of free system RAM detected.
CPU Family = ARM Embedded ABI, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 11.4.1 20230605 (Red Hat 11.4.1-2.1.0.1).
INFO: Build uses ARMv8 advanced-SIMD instruction set.
INFO: Using prefetch.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: Using FMADD-based 100-bit modmul routines for factoring.
INFO: MLUCAS_PATH is set to ""
INFO: using 53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
INFO: testing IMUL routines...
INFO: Testing 64-bit 2^p (mod q) functions with 100000 random (p, q odd) pairs...
INFO: System has 4 available processor cores.
INFO: testing FFT radix tables...
Set affinity for the following 4 cores: 0.1.2.3.
User did not set LowMem in mlucas.ini ... allowing all test types.
User did not set CheckInterval in mlucas.ini ... using default.
NTHREADS = 4
Setting ITERS_BETWEEN_CHECKPOINTS = 10000.
looking for worktodo.txt file...
worktodo.txt file found...reading next assignment...
worktodo.txt entry: PRP=16A20FCE71DF730C49E6B299E558CC11,1,2,11396863,-1,99,0,3,5,"61547191186328636909209"
INFO: Maximum recommended exponent for FFT length (576 Kdbl) = 11450805; p[ = 11396863]/pmax_rec = 0.9952892395.
Initial DWT-multipliers chain length = [hiacc] in carry step.
INFO: restart file p11396863 found...reading...
Computing 2990000-squaring residue R (mod known prime q = 61547191186328636909209)
A: R == 44566622423702911301179 (mod q)
B: R == 44566622423702911301179 (mod q)
mers_mod_square: Init threadpool of 4 threads
Using 4 threads in carry step
c_uint64_ptr[j] = 1
j = 178076
ERROR: at line 2159 of file ../src/Mlucas.c
Assertion failed: d[]*PRP_BASE result has unexpected carryout!
which -fft value should I try?
Thank you for the additional information. From the provided .stat file, I see the max ROE before this failure was 0.3125, which should be OK.
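As background on why 0.3125 is considered safe: in FFT-based multiplication every output element should be an integer, and the roundoff error is the distance from the computed floating-point value to the nearest integer. Once that distance approaches 0.5, rounding can pick the wrong integer. The sketch below illustrates the measurement only; `max_roundoff_error` is a hypothetical name, the code is not taken from Mlucas, and the thread does not state which threshold Mlucas itself treats as fatal.

```c
#include <stddef.h>

/* Hypothetical helper (not Mlucas code): largest distance of any
   element of a[] from its nearest integer. Assumes every value fits
   in a long long so the truncating cast is well defined. */
double max_roundoff_error(const double *a, size_t n)
{
    double max_err = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* fractional part, made non-negative: a value in [0, 1) */
        double frac = a[i] - (double)(long long)a[i];
        if (frac < 0.0)
            frac = -frac;
        /* distance to the nearest integer: a value in [0, 0.5] */
        double err = (frac > 0.5) ? 1.0 - frac : frac;
        if (err > max_err)
            max_err = err;
    }
    return max_err;
}
```

For the inputs 3.0, 4.3125 and -2.125 the per-element errors are 0, 0.3125 and 0.125, so the maximum is 0.3125, still well short of the 0.5 point where rounding becomes ambiguous.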
Now compiled without -O3
Sorry, I meant building it without Link Time Optimization (LTO) enabled, so without the -flto=auto flag. It may be easier to just run the makemake.sh script, which should automatically use the correct (known stable) compiler options for the system.
which -fft value should I try?
The next largest one listed in your mlucas.cfg file, most likely 640K, so try with -fft 640.
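To make the effect of a larger FFT length concrete, here is a small sketch of the "bits per digit" figure that appears in the .stat output quoted in this thread (7168K = 7340032 8-byte floats, 17.435851914542063 bits per digit for p = 127979711). It assumes 1K = 1024 doubles, which matches that line; `bits_per_digit` is a hypothetical name and not an Mlucas function.

```c
#include <stdint.h>

/* Hypothetical helper (not Mlucas code): average number of bits of a
   p-bit residue stored in each FFT element, for an FFT length given
   in K doubles (assuming 1K = 1024). */
double bits_per_digit(uint32_t p, uint32_t fft_kdbl)
{
    return (double)p / ((double)fft_kdbl * 1024.0);
}
```

For p = 11396863 this gives roughly 19.32 bits per digit at 576K and roughly 17.39 at 640K. Fewer bits per element leaves more headroom against roundoff error, which is why stepping up the FFT length is a common first test.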
CC: @xanthe-cat
@tdulcet, I really meant to write "without LTO" instead of "without -O3". I used this line:
CFLAGS = -fdiagnostics-color -Wall -g -O3 -mcpu=neoverse-n1 -DUSE_ARM_V8_SIMD # -flto=auto
I used -fft 640 and the same error happened.
I will try a run from the beginning without LTO to reproduce the issue.
Same issue from a fresh run without LTO.
...
INFO: Maximum recommended exponent for FFT length (576 Kdbl) = 11450805; p[ = 11396863]/pmax_rec = 0.9952892395.
Initial DWT-multipliers chain length = [hiacc] in carry step.
INFO: restart file p11396863 found...reading...
Computing 5990000-squaring residue R (mod known prime q = 61547191186328636909209)
A: R == 26966841112747424620161 (mod q)
B: R == 26966841112747424620161 (mod q)
mers_mod_square: Init threadpool of 4 threads
Using 4 threads in carry step
c_uint64_ptr[j] = 1
j = 178076
ERROR: at line 2159 of file ../src/Mlucas.c
Assertion failed: d[]*PRP_BASE result has unexpected carryout!
Thanks for running it again and for the information.
Another user reported a similar issue today when performing a regular PRP test. Here was their provided output:
Mlucas 20.1.1
http://www.mersenneforum.org/mayer/README.html
INFO: testing qfloat routines...
INFO: 32768 MB of available system RAM detected.
CPU Family = ARM Embedded ABI, OS = OS X, 64-bit Version, compiled with Gnu-C-compatible [llvm/clang], Version 15.0.0 (clang-1500.3.9.4).
INFO: Build uses ARMv8 advanced-SIMD instruction set.
INFO: Using prefetch.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: Using FMADD-based 100-bit modmul routines for factoring.
INFO: MLUCAS_PATH is set to ""
INFO: using 53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
INFO: testing IMUL routines...
INFO: Testing 64-bit 2^p (mod q) functions with 100000 random (p, q odd) pairs...
INFO: System has 10 available processor cores.
INFO: testing FFT radix tables...
User did not set LowMem in mlucas.ini ... allowing all test types.
User did not set CheckInterval in mlucas.ini ... using default.
No CPU set or threadcount specified ... running single-threaded.
Set affinity for the following 1 cores: 0.
Setting ITERS_BETWEEN_CHECKPOINTS = 10000.
looking for worktodo.txt file...
worktodo.txt file found...reading next assignment...
worktodo.txt entry: PRP=3425336B523EA2F0C86FD2E9670FE12F,1,2,127979711,-1,77,0
INFO: Maximum recommended exponent for FFT length (7168 Kdbl) = 134847983; p[ = 127979711]/pmax_rec = 0.9490665574.
Initial DWT-multipliers chain length = [long] in carry step.
INFO: restart file p127979711 found...reading...
mers_mod_square: Init threadpool of 1 threads
Using 1 threads in carry step
ERROR: at line 2157 of file ../src/Mlucas.c
Assertion failed: d[]*PRP_BASE result has unexpected carryout!
and the end of their .stat file for this exponent:
[2024-07-01 21:16:23] M127979711 Iter# = 4000000 [ 3.13% complete] clocks = 00:08:09.197 [ 48.9198 msec/iter] Res64: EE265C0AAA7538E9. AvgMaxErr = 0.103215234. MaxErr = 0.156250000. Residue shift count = 56244167.
Restarting M127979711 at iteration = 3990000. Res64: F877F2E964401A12, residue shift count = 37917547
M127979711: using FFT length 7168K = 7340032 8-byte floats, initial residue shift count = 37917547
This gives an average 17.435851914542063 bits per digit
The test will be done in form of a 3-PRP test.
Using complex FFT radices 224 32 32 16
[2024-07-03 08:07:25] M127979711 Iter# = 4000000 [ 3.13% complete] clocks = 00:08:14.202 [ 49.4202 msec/iter] Res64: EE265C0AAA7538E9. AvgMaxErr = 0.103210938. MaxErr = 0.156250000. Residue shift count = 56244167.
Restarting M127979711 at iteration = 3990000. Res64: F877F2E964401A12, residue shift count = 37917547
M127979711: using FFT length 7168K = 7340032 8-byte floats, initial residue shift count = 37917547
This gives an average 17.435851914542063 bits per digit
The test will be done in form of a 3-PRP test.
Using complex FFT radices 224 32 32 16
[2024-07-03 11:25:23] M127979711 Iter# = 4000000 [ 3.13% complete] clocks = 00:08:17.074 [ 49.7075 msec/iter] Res64: EE265C0AAA7538E9. AvgMaxErr = 0.103210938. MaxErr = 0.156250000. Residue shift count = 56244167.
Restarting M127979711 at iteration = 3990000. Res64: F877F2E964401A12, residue shift count = 37917547
M127979711: using FFT length 7168K = 7340032 8-byte floats, initial residue shift count = 37917547
This gives an average 17.435851914542063 bits per digit
The test will be done in form of a 3-PRP test.
Using complex FFT radices 224 32 32 16
[2024-07-03 12:20:21] M127979711 Iter# = 4000000 [ 3.13% complete] clocks = 00:08:12.683 [ 49.2684 msec/iter] Res64: EE265C0AAA7538E9. AvgMaxErr = 0.103210938. MaxErr = 0.156250000. Residue shift count = 56244167.
Forcing a larger FFT length also did not help:
% ./Mlucas -fft 8192
Mlucas 20.1.1
http://www.mersenneforum.org/mayer/README.html
INFO: testing qfloat routines...
INFO: 32768 MB of available system RAM detected.
CPU Family = ARM Embedded ABI, OS = OS X, 64-bit Version, compiled with Gnu-C-compatible [llvm/clang], Version 15.0.0 (clang-1500.3.9.4).
INFO: Build uses ARMv8 advanced-SIMD instruction set.
INFO: Using prefetch.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: Using FMADD-based 100-bit modmul routines for factoring.
INFO: MLUCAS_PATH is set to ""
INFO: using 53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
INFO: testing IMUL routines...
INFO: Testing 64-bit 2^p (mod q) functions with 100000 random (p, q odd) pairs...
INFO: System has 10 available processor cores.
INFO: testing FFT radix tables...
User did not set LowMem in mlucas.ini ... allowing all test types.
User did not set CheckInterval in mlucas.ini ... using default.
No CPU set or threadcount specified ... running single-threaded.
Set affinity for the following 1 cores: 0.
Setting ITERS_BETWEEN_CHECKPOINTS = 10000.
looking for worktodo.txt file...
worktodo.txt file found...reading next assignment...
worktodo.txt entry: PRP=3425336B523EA2F0C86FD2E9670FE12F,1,2,127979711,-1,77,0
INFO: Maximum recommended exponent for FFT length (7168 Kdbl) = 134847983; p[ = 127979711]/pmax_rec = 0.9490665574.
Initial DWT-multipliers chain length = [long] in carry step.
INFO: restart file p127979711 found...reading...
mers_mod_square: Init threadpool of 1 threads
Using 1 threads in carry step
ERROR: at line 2157 of file ../src/Mlucas.c
Assertion failed: d[]*PRP_BASE result has unexpected carryout!
Both of the affected users are using ARM systems with ASIMD.
I am using code from main, May 5th, compiled with GCC 11 using a tuned Makefile with -O3 -flto=auto -DUSE_ARM_V8_SIMD -mcpu=neoverse-n1. This is an Ampere Arm VM.
I also have the file to restart, if needed.
INFO: testing qfloat routines...
System total RAM = 23236, free RAM = 11262
INFO: 11262 MB of free system RAM detected.
CPU Family = ARM Embedded ABI, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 11.4.1 20230605 (Red Hat 11.4.1-2.1.0.1).
INFO: Build uses ARMv8 advanced-SIMD instruction set.
INFO: Using prefetch.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: Using FMADD-based 100-bit modmul routines for factoring.
INFO: MLUCAS_PATH is set to ""
INFO: using 53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
INFO: testing IMUL routines...
INFO: Testing 64-bit 2^p (mod q) functions with 100000 random (p, q odd) pairs...
INFO: System has 4 available processor cores.
INFO: testing FFT radix tables...
Set affinity for the following 4 cores: 0.1.2.3.
User did not set LowMem in mlucas.ini ... allowing all test types.
User did not set CheckInterval in mlucas.ini ... using default.
NTHREADS = 4
Setting ITERS_BETWEEN_CHECKPOINTS = 10000.
looking for worktodo.txt file...
worktodo.txt file found...reading next assignment...
worktodo.txt entry: PRP=16A20FCE71DF730C49E6B299E558CC11,1,2,11396863,-1,99,0,3,5,"61547191186328636909209"
INFO: Maximum recommended exponent for FFT length (576 Kdbl) = 11450805; p[ = 11396863]/pmax_rec = 0.9952892395.
Initial DWT-multipliers chain length = [hiacc] in carry step.
INFO: restart file p11396863 found...reading...
Computing 2990000-squaring residue R (mod known prime q = 61547191186328636909209)
A: R == 44566622423702911301179 (mod q)
B: R == 44566622423702911301179 (mod q)
mers_mod_square: Init threadpool of 4 threads
Using 4 threads in carry step
ERROR: at line 2157 of file ../src/Mlucas.c
Assertion failed: d[]*PRP_BASE result has unexpected carryout!
[1]+ Exit 1 ./Mlucas -cpu 0:3