Implement a hashed basetable

leto commented 11 years ago

There are various implementations given and described here:

http://www.mersenneforum.org/showthread.php?p=182647

Some of these implementations are faster than BPSW for numbers that fit in 32 or 64 bits.

leto commented 11 years ago

The zerothbase code says that we can get away with a hash + a single Miller-Rabin test for 32 bit numbers.

Similarly, a hash plus 4 Miller-Rabin tests suffice for 64 bit numbers.

When this is implemented, is_prime() should use the first method for 32 bit numbers, the second for numbers that are larger than 32 bit but fit in 64 bits and BPSW (or AKS, when that fully works) for larger numbers.

danaj commented 11 years ago

I remember reading this thread. It's still worth determining how fast the M-R and Lucas tests are. The best results I've seen are:

32-bit: 3 M-R bases (2,7,61) or (2,1005905886,1340600841) 64-bit 7 M-R bases

The last is a question vs. BPSW. So far it looks like people have been trying to improve the 1 and 2 bases ranges, but at some point I imagine someone will try improving the 5 base case, and perhaps find 6 bases that cover the 64-bit range.

The other thing to consider is usually these are tiered tests, so 1 base for n < 272161, 2 bases for n < 466758181, 3 bases for n < 105936894253, etc. Also, n M-R tests don't take n times as long for random inputs -- only for primes.

I suppose another question is which platform: XS, GMP, or pure Perl? I just set up a benchmark to test Math::Prime::Util for various combinations of MR and BPSW with all three. Some numbers for three selected sizes, taking 5000 random odd numbers and asking if they're prime. Perl 5.17.8 on a x86_64 E7500 Ubuntu machine. Some caveats with the benchmarking are there are multiple paths and I've tried to take the shortest ones, but any extra sub layers do have an effect. Also as these are random odd integers, not random primes, it will favor anything that has early exits (e.g. multiple M-R tests).

I just added zerothbase's test. It uses my existing M-R code, but always does 4 M-R tests (I also added tests for divisibility by 3/5/7 so it matches the existing pre-tests). This of course is the wrong solution for 32-bit where it should do a single test.

9 digits (near top of 32-bit range): 457/s XS M-R 396/s XS zerothbase 104/s GMP 1 M-R test 86/s GMP 3 M-R tests 78/s GMP is_prob_prime (trial div to 400 then BPSW) 65/s GMP 7 M-R tests 57/s GMP BPSW 15/s PP 1 M-R test 13/s PP 3 M-R tests 10/s PP 7 M-R tests 9/s Math::Primality 1 M-R test 7/s Math::Primality 3 M-R tests 6/s Math::Primality BPSW 0.3/s pure perl BPSW

Clearly the PP Lucas test I've implemented is very slow (it uses Math::BigInt for portability, and should be using the GMP backend). Indeed, Math::Primality's BPSW is far faster than my Perl code, but it's a lot slower than pure GMP or C.

15 digits (middle of 64-bit range): 153/s XS M-R 168/s XS zerothbase 74/s GMP 1 M-R test 66/s GMP 3 M-R tests 72/s GMP is_prob_prime (trial div to 400 then BPSW) 55/s GMP 7 M-R tests 47/s GMP BPSW 9/s Math::Primality 1 M-R test 7/s Math::Primality 1 M-R test 4/s Math::Primality BPSW 0.16/s PP 1 M-R test 0.14/s PP 3 M-R tests 0.11/s PP 7 M-R tests 0.10/s pure perl BPSW

The trial + BPSW looks a little better, and the Pure Perl Miller-Rabin tests slowed down a lot. That's because I optimized the < half-word case, and now they have to do mulmods and powmods the hard way.

19 digits (near top of 64-bit range): 114/s XS M-R 140/s XS zerothbase 50/s GMP 1 M-R test 45/s GMP 3 M-R tests 64/s GMP is_prob_prime (trial div to 400 then BPSW) 38/s GMP 7 M-R tests 35/s GMP BPSW 3.6/s Math::Primality 1 M-R test 3.2/s Math::Primality 3 M-R test 2.5/s Math::Primality BPSW 0.098/s PP 1 M-R test 0.089/s PP 3 M-R tests 0.075/s PP 7 M-R tests 0.074/s pure perl BPSW

This is on a 64-bit machine, which makes all modulo operations on 32-bit numbers a lot easier. The performance on a 32-bit machine would look a lot different for my XS and PP numbers, while the GMP and Math::Primality numbers would likely show little difference.
Using GMP is a lot slower than the native C code, though a huge caveat that this was on an x86_64 machine so it has a fast asm mulmod method.
In straight GMP, the savings in 32-bit could be there in GMP. A few trial divisions (i.e. a GCD), a hash, and a single M-R test.
It looks like there is some speedup for 64-bit, especially for numbers >= 3,071,837,692,357,849 where the current best known alternative is 7 M-R tests (or BPSW). A couple GCDs in front would probably make it a lot easier to find a 3-base solution and/or a smaller table.
I think it's a great idea to test, but if you're really concerned about performance it looks like you need to be going to C or GMP.
Verifying correctness would seem a lot harder.

danaj commented 11 years ago

Following up, since I've made a lot of changes in my primality testing code. Using non-GMP C code, but with x86_64 asm mulmod, I find both 32-bit and 64-bit numbers to get times like (these are times for next_prime(2**62), at 2^31 just divide each time by ~3.5):

1.8 uS 1 M-R base 3.6 uS 2 M-R base 5.3 uS 3 M-R base 4.8 uS BPSW

So BPSW is slightly faster than 3 M-R tests. I have a special case SPRP2 function, but that probably saves only a few nanoseconds. The big savings is from using the "almost extra strong" Lucas test, inspired by Pari. It's an extra strong test without the U=0 case, so the Lucas sequence is a lot faster to calculate (it's just the V portion, so 2 mulmods per bit vs. 3-7 for the old-school Lucas test). Tested against Feitsma's database with no false results, which is really all we need to make sure it works correctly for 64-bit input.

The data here: http://sti15.com/nt/pseudoprimes.html shows that it has fewer pseudoprimes than the strong test. I did some tests and it retains the same residue class properties vs. SPSP2 (I should probably do more formal testing and publish something).

With all C+GMP code, things aren't as clear, as GMP's powm is very fast, making M-R tests pretty cheap. GMP also offers the ability to cheaply get the U value for the Lucas test, so I use the full extra strong test. I get times of about 1 extra-strong BPSW = 6 M-R tests, and 1 strong BPSW = 8 M-R tests.

If I were using GMP for 64-bit numbers and cared about the time, I'd still look into the almost-extra-strong BPSW as it does offer a little savings at this size. Since my BPSW is in the 5-6 M-R base performance range, the hash idea may still pay off. In my case it doesn't because it is faster yet to just use straight C code for 64-bit, and once past 64-bit we don't have deterministic M-R bases.

leto commented 11 years ago

Thanks for this very valuable knowledge! It will be useful when implementing it in :camel:. "Almost Extra Strong" is a hilarious name.

I would be interested in helping you do "formal testing" and getting something in a publishable state, as well. What kind of help do you need with that?

danaj commented 11 years ago

I thought about various adjectives like "U-less", "U-free", "no-U", "V-only", "partial", etc. I like almost, especially because of the cognitive dissonance in "almost extra", and the results do look like almost the extra strong test.

Most of the information is on that web page. I've got a Linode working on the LSPs and my old i7 server doing some AESLSPs, so in a week or so I'll have more data. See, for example, "Pseudoprime Statistics to 10^19" by Andersen and Dubner. The main difference is that I'd like to look at the various Lucas pseudoprimes both by themselves and in combination with the base-2 pseudoprimes. What I'm thinking I should have:

[ ] Some BPSW history. I made a post on mersenneforum.org about some of the things I found, especially all the confusion around Chen and Greene's changing terminology (they were interested in the $620 challenge, convinced Pomerance to loosen the requirements to win it, and then out of the blue call this a Baillie-PSW pseudoprime test, when the earlier literature is clear that it is no such thing).
[ ] Description of some of the primality tests that call themselves BPSW. The Wikipedia talk page has quite a list
[ ] Residue class info. This is an important one in justifying the AES and ES tests that I haven't seen done. It should be relatively straightforward, as Baillie and Wagstaff already have a number of pages devoted to it in their paper, so I'd be extending their results from 10^8 to 10^13 or farther.
[ ] Performance info for the various tests.
[ ] Explicit probabilities. This is murky, requires more thinking, is prone to fundamental error (as most things are with probabilities), and of questionable value. But should be easy to get results :) The idea is that we can have a table for each bit range (e.g. all 40-bit numbers) showing the probability of a non-prime passing the given test. It would be worthwhile to look at combined probabilities, both blind and using residue classes.

leto / math--primality

Implement a hashed basetable #8