Open leto opened 11 years ago
The zerothbase code says that we can get away with a hash + a single Miller-Rabin test for 32 bit numbers.
Similarly, a hash plus 4 Miller-Rabin tests suffice for 64 bit numbers.
When this is implemented, is_prime() should use the first method for 32 bit numbers, the second for numbers that are larger than 32 bit but fit in 64 bits and BPSW (or AKS, when that fully works) for larger numbers.
I remember reading this thread. It's still worth determining how fast the M-R and Lucas tests are. The best results I've seen are:
32-bit: 3 M-R bases (2,7,61) or (2,1005905886,1340600841) 64-bit 7 M-R bases
The last is a question vs. BPSW. So far it looks like people have been trying to improve the 1 and 2 bases ranges, but at some point I imagine someone will try improving the 5 base case, and perhaps find 6 bases that cover the 64-bit range.
The other thing to consider is usually these are tiered tests, so 1 base for n < 272161, 2 bases for n < 466758181, 3 bases for n < 105936894253, etc. Also, n M-R tests don't take n times as long for random inputs -- only for primes.
I suppose another question is which platform: XS, GMP, or pure Perl? I just set up a benchmark to test Math::Prime::Util for various combinations of MR and BPSW with all three. Some numbers for three selected sizes, taking 5000 random odd numbers and asking if they're prime. Perl 5.17.8 on a x86_64 E7500 Ubuntu machine. Some caveats with the benchmarking are there are multiple paths and I've tried to take the shortest ones, but any extra sub layers do have an effect. Also as these are random odd integers, not random primes, it will favor anything that has early exits (e.g. multiple M-R tests).
I just added zerothbase's test. It uses my existing M-R code, but always does 4 M-R tests (I also added tests for divisibility by 3/5/7 so it matches the existing pre-tests). This of course is the wrong solution for 32-bit where it should do a single test.
9 digits (near top of 32-bit range): 457/s XS M-R 396/s XS zerothbase 104/s GMP 1 M-R test 86/s GMP 3 M-R tests 78/s GMP is_prob_prime (trial div to 400 then BPSW) 65/s GMP 7 M-R tests 57/s GMP BPSW 15/s PP 1 M-R test 13/s PP 3 M-R tests 10/s PP 7 M-R tests 9/s Math::Primality 1 M-R test 7/s Math::Primality 3 M-R tests 6/s Math::Primality BPSW 0.3/s pure perl BPSW
Clearly the PP Lucas test I've implemented is very slow (it uses Math::BigInt for portability, and should be using the GMP backend). Indeed, Math::Primality's BPSW is far faster than my Perl code, but it's a lot slower than pure GMP or C.
15 digits (middle of 64-bit range): 153/s XS M-R 168/s XS zerothbase 74/s GMP 1 M-R test 66/s GMP 3 M-R tests 72/s GMP is_prob_prime (trial div to 400 then BPSW) 55/s GMP 7 M-R tests 47/s GMP BPSW 9/s Math::Primality 1 M-R test 7/s Math::Primality 1 M-R test 4/s Math::Primality BPSW 0.16/s PP 1 M-R test 0.14/s PP 3 M-R tests 0.11/s PP 7 M-R tests 0.10/s pure perl BPSW
The trial + BPSW looks a little better, and the Pure Perl Miller-Rabin tests slowed down a lot. That's because I optimized the < half-word case, and now they have to do mulmods and powmods the hard way.
19 digits (near top of 64-bit range): 114/s XS M-R 140/s XS zerothbase 50/s GMP 1 M-R test 45/s GMP 3 M-R tests 64/s GMP is_prob_prime (trial div to 400 then BPSW) 38/s GMP 7 M-R tests 35/s GMP BPSW 3.6/s Math::Primality 1 M-R test 3.2/s Math::Primality 3 M-R test 2.5/s Math::Primality BPSW 0.098/s PP 1 M-R test 0.089/s PP 3 M-R tests 0.075/s PP 7 M-R tests 0.074/s pure perl BPSW
Following up, since I've made a lot of changes in my primality testing code. Using non-GMP C code, but with x86_64 asm mulmod, I find both 32-bit and 64-bit numbers to get times like (these are times for next_prime(2**62), at 2^31 just divide each time by ~3.5):
1.8 uS 1 M-R base 3.6 uS 2 M-R base 5.3 uS 3 M-R base 4.8 uS BPSW
So BPSW is slightly faster than 3 M-R tests. I have a special case SPRP2 function, but that probably saves only a few nanoseconds. The big savings is from using the "almost extra strong" Lucas test, inspired by Pari. It's an extra strong test without the U=0 case, so the Lucas sequence is a lot faster to calculate (it's just the V portion, so 2 mulmods per bit vs. 3-7 for the old-school Lucas test). Tested against Feitsma's database with no false results, which is really all we need to make sure it works correctly for 64-bit input.
The data here: http://sti15.com/nt/pseudoprimes.html shows that it has fewer pseudoprimes than the strong test. I did some tests and it retains the same residue class properties vs. SPSP2 (I should probably do more formal testing and publish something).
With all C+GMP code, things aren't as clear, as GMP's powm is very fast, making M-R tests pretty cheap. GMP also offers the ability to cheaply get the U value for the Lucas test, so I use the full extra strong test. I get times of about 1 extra-strong BPSW = 6 M-R tests, and 1 strong BPSW = 8 M-R tests.
If I were using GMP for 64-bit numbers and cared about the time, I'd still look into the almost-extra-strong BPSW as it does offer a little savings at this size. Since my BPSW is in the 5-6 M-R base performance range, the hash idea may still pay off. In my case it doesn't because it is faster yet to just use straight C code for 64-bit, and once past 64-bit we don't have deterministic M-R bases.
Thanks for this very valuable knowledge! It will be useful when implementing it in :camel:. "Almost Extra Strong" is a hilarious name.
I would be interested in helping you do "formal testing" and getting something in a publishable state, as well. What kind of help do you need with that?
I thought about various adjectives like "U-less", "U-free", "no-U", "V-only", "partial", etc. I like almost, especially because of the cognitive dissonance in "almost extra", and the results do look like almost the extra strong test.
Most of the information is on that web page. I've got a Linode working on the LSPs and my old i7 server doing some AESLSPs, so in a week or so I'll have more data. See, for example, "Pseudoprime Statistics to 10^19" by Andersen and Dubner. The main difference is that I'd like to look at the various Lucas pseudoprimes both by themselves and in combination with the base-2 pseudoprimes. What I'm thinking I should have:
There are various implementations given and described here:
http://www.mersenneforum.org/showthread.php?p=182647
Some of these implementations are faster than BPSW for numbers that fit in 32 or 64 bits.