primesearch / Mlucas

Ⓜ️ Ernst Mayer's Mlucas and Mfactor programs for GIMPS
https://mersenneforum.org/mayer/README.html
GNU General Public License v3.0
8 stars 2 forks source link

Fermat-testing.md #1

Closed xanthe-cat closed 5 months ago

xanthe-cat commented 8 months ago

Introduction to Fermat number testing with Mlucas

Mlucas is a powerful Mersenne primality testing and factoring package which supports Lucas–Lehmer testing, Fermat probable primality testing, and Pollard p–1 factorisation; justly, most applications of the software have concentrated on examination of the Mersenne numbers, however Mlucas is also capable of testing the primality of Fermat numbers, testing the compositeness of Fermat cofactors, and running p–1 factorisation on Fermat numbers. This post and others hope to fill in several lacunae in the Mlucas documentation with respect to Fermat number testing.

Mlucas build notes

The Fermat modular squaring code is architecture-specific, ideally requiring Mlucas to be built with one of the SIMD modes (AVX, AVX2, SSE2, etc). This should be handled automatically by the makemake.sh script for most hardware, including x86, however if Fermat modular squaring is not natively supported, there may be workarounds available.

For example, Apple Silicon is ASIMD, meaning a native Mlucas build cannot test Fermat numbers. The answer is to build an SSE2 Mlucas on an Intel-based Mac, which should be capable of running under Rosetta emulation on Apple Silicon, at the penalty of a significant performance hit for larger exponents.

(When compiling on Intel with the intention of building a compatible version for Apple Silicon, the Makefile requires static linking of the GMP library, by the following alteration to the linker flags: LDLIBS = -Bstatic ${LD_ARGS[@]} -Bdynamic Static linking of libraries is usually not recommended, and this may not be a solution in all possible cases.)

Fermat self-testing and populating fermat.cfg

Assuming you have a working build, Mlucas allows any Fermat number Fm = 22m + 1 with exponent m in the range [14,63] to be entered as an argument for self-testing, by using the -f flag like so:

./Mlucas -f <exponent> -iters <number> [-fft <fft_length>]

However, the software only provides fast Fourier transforms (FFTs) up to 512M in length, which further limits the largest Fermat number that may be tested, up to F33.

The -iters flag is mandatory and should be followed by a number less than a million; the recommended values for -iters to be set to are 100, 1000, or 10000, since the self-testing code has correct residues for all Fermat numbers [14,33] saved, and any error in the test result will be immediately discovered.

Mlucas does not support Fermat testing for all possible FFT lengths. Generally, FFTs for testing Mersenne numbers are available for any length k·2n K, where k = [8,15], n ≥ 0. In the case of Fermat numbers however, only the subset k = {8, 14, 15} supports Fermat testing, along with k = 63 when n ≥ 4.

When self-testing Mlucas for Mersenne numbers, a command such as ./Mlucas -s all configures Mlucas for the majority of FFT lengths and saves the results in mlucas.cfg. A similar process must be carried out for the Fermat numbers, using the ./Mlucas -f command above, which saves results in the file fermat.cfg.

A script config-fermat.sh has been written to automate the generation of the fermat.cfg file using the set of possible FFT lengths. After the license, the next few lines declare some variables which may be customised as desired (for example, 10000 iterations will require significant runtime at the largest possible FFT sizes):

# Mlucas
MLUCAS=./Mlucas

# Number of iterations
# use 100, 1000, or 10000 to match pre-computed values
ITERS=100

# Minimum Fermat number (14 or greater)
MIN=14

# Maximum Fermat number (33 or less)
MAX=29

The script should be invoked from the build directory, typically obj, with the following command, which may include as parameters any cpu-specific options, such as -core or -cpu:

../config-fermat.sh -cpu 0:3

Once self-testing has occurred, production work on Fermat numbers may be carried out using a worktodo file (Mlucas 20.x: worktodo.ini; Mlucas 21: worktodo.txt) with entries in one of the several formats below.

Only assignments for ECM factoring of Fermat numbers are distributed by Primenet, so work assignments cannot be obtained for Fermat numbers using the primenet.py script. However p–1 results may be submitted to mersenne.org as Manual results.

Primality testing: Pépin’s test

For the Pépin test, the worktodo format is simply:

Fermat,Test=<exponent>

Fermat numbers in the range [14,22] may select a non-optimal FFT length by default in production mode. If this is the case, the FFT may be overridden using the command:

./Mlucas -fft FFT_LENGTH -shift 0

The optimal FFT lengths vary from 4K up to 512M as shown in the table below.

Currently, all Fermat numbers up to F30 have received a Pépin test, and moreover F31 and F32 are known to be composite. However, the character of their cofactors is unknown; and as for F33, this is yet to be tested for primality.

The Pépin test uses Gerbicz error checking (GEC) to assure the reliability of the computation, however residue shifting is not implemented for Fermat modular squaring with GEC. Thus when invoking Mlucas the flag -shift 0 must be added to the Mlucas command line:

./Mlucas -shift 0

If non-zero residue shifting is used, the Pépin test will not be able to progress beyond the millionth iteration (the default interval for GEC) as error checking will be unable to confirm whether or not the computation is correct.

Cofactor compositeness testing: Suyama’s test

Suyama’s test is a Fermat probable primality test on the cofactor of a Fermat number. As a prerequisite, you must already have run a Pépin test, which should have saved a final residue as a file named e.g. f23 for a Pépin test of F23. Do not try to run a Suyama test as a single work type incorporating the preceding Pépin test.

The worktodo format for the Suyama test is:

PRP=N/A,1,2,<2^m>,+1,99,0,3,5,"known_factor_1[,known_factor_2,...]"

Note that this work format uses the numeric value of 2m of a Fermat number Fm, rather than just the exponent m. At least one known prime factor must be supplied; composite factors are disallowed.

Prior to testing, Mlucas tries a “sanity check” on each known factor, which has the effect of testing whether the Pépin residue R has been correctly calculated. This calculation can be performed at any point up to the final iteration. More information is available here.

Pollard p–1 factoring

Work entries for p–1 factoring have two variations, however only the Pminus1 format is supported for Fermat numbers:

Pminus1=N/A,1,2,<2^m>,+1,B1,B2[,trial_factoring_bits][,B2_start][,"known_factors"]

The same note as regards the exponent for the Suyama test applies here also; supplying known factors is optional. The -shift 0 flag is again a necessary addition to the Mlucas command line.

The testable Fermat numbers Fm = 22m + 1

The following table lists the default FFT lengths selected by Mlucas for the Fermat numbers in the range [14,33] and the known factors, required for some of the test types above.

m 2m Default FFT Optimal FFTs known_factor(s)
14 16384 1K 4K "116928085873074369829035993834596371340386703423373313"
15 32768 2K 4K "1214251009,2327042503868417,168768817029516972383024127016961"
16 65536 3K* 4K "825753601,188981757975021318420037633"
17 131072 6K*, 7K 7K, 8K "31065037602817,7751061099802522589358967058392886922693580423169"
18 262144 13K* 14K, 15K, 16K "13631489,81274690703860512587777"
19 524288 26K* 28K, 30K, 32K "70525124609,646730219521,37590055514133754286524446080499713"
20 1048576 52K* 56K, 60K, 64K
21 2097152 104K* 112K, 120K, 128K "4485296422913"
22 4194304 208K* 224K, 240K, 256K "64658705994591851009055774868504577"
23 8388608 448K 448K, 480K, 512K "167772161"
24 16777216 896K 896K, 960K, 1008K, 1M
25 33554432 1792K 1792K, 1920K, 2M "25991531462657,204393464266227713,2170072644496392193"
26 67108864 3584K 3584K, 3840K, 4032K, 4M "76861124116481"
27 134217728 7M, 7.5M 7M, 7.5M, 7.875M, 8M "151413703311361,231292694251438081"
28 268435456 15M 14M, 15M, 15.75M, 16M "1766730974551267606529"
29 536870912 30M 30M, 31.5M, 32M "2405286912458753"
30 1073741824 60M 60M, 63M, 64M "640126220763137,1095981164658689"
31 2147483648 120M 120M, 126M, 128M "46931635677864055013377"
32 4294967296 256M 240M, 252M, 256M "25409026523137"
33 8589934592 512M 504M, 512M

* These default FFTs selected by Mlucas have radices that cannot be used for Fermat modular squaring; thus F16 to F22 cannot be tested with Mlucas without overriding the default FFT selected, e.g.:

./Mlucas -fft 224K -shift 0

As mentioned under “Mlucas build notes” above, building Mlucas is highly architecture-specific, and some build types will not support all of the optimal FFT lengths. Finally, 4K appears to be the smallest usable FFT for the three smallest testable Fermat numbers.

xanthe-cat commented 8 months ago

This documentation is close to being ready, but I suspect more of the content will need to be refined as ongoing testing is still turning up a number of oddities.

As a result of this testing I can foresee a slew of issues to raise, and I am wondering whether you would like them to be raised as separate issues:

  1. in production mode the FFT selection code for Fermat numbers 16 and [18,22] chooses unworkable lengths because of the limited radices supporting Fermats, which may also affect Mersennes of similarly small size (I can see reasons pro and contra); — issue #3 raised; the workaround seems to be successful for these Fermats. ETA: the documentation describes the workaround needed.
  2. in production mode this default FFT then cannot be over-ridden in these cases (why?); solved; mostly, except for F14 which uses a ridiculous FFT of 7K on AVX2 (this is also the case for F15 and F16 but 7K is apparently workable). On SSE2 Mlucas can use 4K for these tiny exponents. ETA: the problem of the 7K FFT length is due to the fermat.cfg having a poor timing statistic for 4K relative to 7K. One gross workaround would be to simply change the data in fermat.cfg so that it is monotonically increasing, as described in the Mlucas README. Another issue, e.g. of the tiny 4K radix producing the CY routines with radix < 16 do not support shifted residues! error, is similarly worked around by editing the fermat.cfg file to use the preferred radix set (16 8 16).
  3. again in production mode, the Pépin test for the cases of 17 and 23 doesn’t save the final residue at 2m–1 iterations to the fm and qm save files, so the most recent checkpoint is unusable for further testing; — issue #4 raised; none of the completed Pépin tests for [15,20] and 23 saved. ETA: this is now fixed.
  4. the various radix files have incorrect/misleading ‘7,8,9,15 and their multiples!’ errors about radices supporting Fermat modular squaring (the correct values are evidently 7,8,15,63);
  5. Suyama test appears as though it cannot be carried out correctly, as a result of 3. — this will probably be trivial if issue #4 is solved; though it may be a separate bug relating to restarting an interrupted PRP test, so I’m waiting to see the outcome of solving one problem before raising this as well. ETA: there was one other bug preventing the Suyama test from correctly being carried out (the PRP base issue #5) which is now resolved.

There’s also a variety of errors that crop up during the running of config-fermat.sh which may be actual code errors versus numeric incompatibility of some FFT sizes with the number being tested.

tdulcet commented 8 months ago

In general, if you are able to, please do create a separate issue for each bug you find. That would be a huge help. Hopefully we will get volunteers to help debug and fix them all.

However, I believe I was able to address issues 2-3 and 5 (?) today on the forum. Issue 4 should be a trivial fix of just updating the message in those assert statements (assuming your findings are correct). That just leaves issue 1 as the remaining bug to be fixed. More testing would be needed to see if it affects Mersenne numbers. It may help to review the code and comments in get_fft_radices.c to determine how exactly the radices are currently selected.

xanthe-cat commented 8 months ago

I think this documentation is finally at the point where it can be added to the repository as docs/Fermat-testing.md, since subsequent updates should be very few in number.

Having dealt with most of the glaring bug-fixes, a few still remain and others possibly lurk undiscovered. Of the known issues: 1K and 2K FFTs generate a peculiar error in br.c about bit reversal and the size of the radix set being incorrect (when it doesn’t seem to be wrong, by the standards of other radix sets, at least to me).

ERROR: at line 257 of file ../src/br.c
Assertion failed: Exiting.
ERROR: product of radices [2] != vector length [1] in BIT_REVERSE_INT

Radix 8 causes problems with residue-shifted tests (and so occasionally do 7, 14, and 15), however Fermat testing shouldn’t be using non-zero residue shifts, as this effectively disables Gerbicz error checking.

CY routines with radix < 16 do not support shifted residues!

Amending the spuriously misleading ‘7,8,9,15 and their multiples!’ each of the 18 times it occurs in src is low priority.

ERROR: at line 256 of file ../src/radix12_ditN_cy_dif1.c
Assertion failed: radix12_ditN_cy_dif1: Fermat-mod only available for radices 7,8,9,15 and their multiples!

Other warnings are more likely to be actual problems of the computation being unworkable at the selected FFT.

fermat_mod_square: Warning: 26072 of 61440 out-of-range output digits detected.
100 iterations of F20 with FFT length 61440 = 60 K, final residue shift count = 465037

For instance despite all best intentions sometimes a computation will exceed the safe round-off error (ROE).

F24 Roundoff warning on iteration        6, maxerr =   0.499902486801

The “sanity check” of mod-squaring modulo prime factors, prior to running the cofactor test, only outputs to stdout, when it perhaps ought to be written to f_m.stat, since inequality is a definitive check that the Pépin residue is incorrect.

Computing 16383-squaring residue R (mod known prime q = 116928085873074369829035993834596371340386703423373313)
    A: R == 52412083880604209639887539420692164580669760318107863 (mod q)
    B: R == 52412083880604209639887539420692164580669760318107863 (mod q)
tdulcet commented 8 months ago

Having dealt with most of the glaring bug-fixes, a few still remain and others possibly lurk undiscovered.

Thanks again for testing everything and documenting all these remaining bugs. Free free to open issues for any of them you feel should be prioritized. It likely would require someone knowledgeable about Ernst's FFT and assembly code to fix them.

only outputs to stdout, when it perhaps ought to be written to f_m.stat, since inequality is a definitive check that the Pépin residue is incorrect.

There are several cases where it might be helpful to also write information to the *.stat file. As a workaround, one could use the tee command to save all the output to a file: ./Mlucas ... 2>&1 | tee -a Mlucas.out. Alternatively, if one does not want to display the output, they could use the nohup command as described in Ernst's README: nohup ./Mlucas ... >Mlucas.out &.

tdulcet commented 8 months ago

Let me know if you would like to create a PR with this documentation or if you want me to just commit it directly to the repository. We could also add a link to the README for it. Alternatively, you could add it to the Wiki, which would allow you to directly edit it at any time, but that may not be as discoverable for users.

BTW, in case you did not already notice, I added your requested feature to my GIMPS status script a couple of days ago.

xanthe-cat commented 7 months ago

Hi Teal, as you may have seen there are a few more issues in the thread over at the forum (thanks for raising #13!) and I will add some of these shortly. I have a few further tweaks to make the .md above, and then once these are done I think you should be fine committing it to the repository. The last edit was two or three weeks ago so I will not touch it until I’m satisfied with all of the changes.

tdulcet commented 7 months ago

Yes, thanks in advance for doing that. As time permits, I have been trying to create issues for some of the critical show stopping bugs users report, but I know there are a lot more important bugs (as you have noted) that still need issues and further debugging.

OK, sounds like a plan. Please feel free to take as much time as you need on your documentation.

xanthe-cat commented 7 months ago

Updated draft is now in the branch looking at the modular division bug #23, to be added to the main branch whenever that is resolved.