sympy / sympy

A computer algebra system written in pure Python
https://sympy.org/

isprime can be faster #6908

Closed asmeurer closed 4 years ago

asmeurer commented 11 years ago
See http://www.johndcook.com/blog/2013/01/17/narcissus-prime-in-python/ . The code isprime(int("1808010808"*1560 + "1")) should return within a couple of minutes.

Original issue for #6908: http://code.google.com/p/sympy/issues/detail?id=3809 Original author: https://code.google.com/u/asmeurer@gmail.com/

vramana commented 11 years ago
Would implementing the AKS primality test be of any help with this issue?

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c1 Original author: https://code.google.com/u/105451478468557210330/

prasoon2211 commented 11 years ago
Well, we are using the Rabin-Miller strong pseudoprime test ( http://mathworld.wolfram.com/Rabin-MillerStrongPseudoprimeTest.html ) right now, which is probabilistic in nature.

The AKS primality test is a deterministic algorithm. If we want to go with a deterministic algorithm anyway, we should in fact use Lenstra and Pomerance ( http://www.math.dartmouth.edu/~carlp/aks041411.pdf ), which is an improvement over the AKS algorithm.

Mathematica seems to use another probabilistic test as well ( http://mathworld.wolfram.com/LucasPseudoprime.html ), so perhaps we should implement that too.

In fact, I believe we should combine these algorithms and use each one where it is appropriate. There should be a crossover point beyond which the deterministic version beats the probabilistic one, so we can choose one or the other depending on which side of that point the input falls.

Perhaps someone with more knowledge of number theory can give more insight?
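
For reference, here is a minimal sketch of the strong pseudoprime (Miller-Rabin) round being discussed. It is only an illustration of the algorithm, not SymPy's actual implementation, and the function names are invented:

import random

def _mr_round(n, base):
    # One strong pseudoprime round: write n - 1 = t * 2**s with t odd,
    # then check base**t and the s - 1 successive squarings mod n.
    s, t = 0, n - 1
    while t % 2 == 0:
        s += 1
        t //= 2
    b = pow(base, t, n)          # the expensive modular exponentiation
    if b in (1, n - 1):
        return True
    for _ in range(s - 1):
        b = pow(b, 2, n)
        if b == n - 1:
            return True
    return False

def mr_probabilistic(n, rounds=20):
    # Probabilistic test with random bases; a composite survives all
    # rounds with probability at most 4**-rounds.
    if n < 2:
        return False
    if n < 4:
        return True
    if n % 2 == 0:
        return False
    return all(_mr_round(n, random.randrange(2, n - 1)) for _ in range(rounds))

Almost all of the cost is in pow(base, t, n), which is the point made further down this thread.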

> The code isprime(int("1808010808"*1560 + "1")) should return within a couple of minutes.

It seems it does on PyPy (under 18 minutes, according to one of the comments).

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c2 Original author: https://code.google.com/u/102594231986105975028/

prasoon2211 commented 11 years ago
Scratch the last line.

The user ran an "Iterative method for Fermat's Test", which finished in under 18 minutes on PyPy, so perhaps we should implement a deterministic test after all.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c3 Original author: https://code.google.com/u/102594231986105975028/

asmeurer commented 11 years ago
It's also possible that we are doing something inefficiently in Miller-Rabin.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c4 Original author: https://code.google.com/u/asmeurer@gmail.com/

prasoon2211 commented 11 years ago
Certainly. In any case, I'll try to look for these inefficiencies today and see what I find.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c5 Original author: https://code.google.com/u/102594231986105975028/

prasoon2211 commented 11 years ago
So I looked at the code. The only inefficiency I found was that two parameters were being calculated repeatedly in the _test function (lines 62 and 63) when they only need to be calculated once per n. But this doesn't seem to be the major reason for the slowness. From the looks of it, the slowness comes from the loop on lines 69-72.

The link mentioned in the bug report suggests using multiple threads for different bases. Should we do that? I haven't seen threading used elsewhere in SymPy, but maybe it is. Also, someone should probably check whether the test for the Narcissus prime really does take as long as the post says.
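
On the threading idea: for CPU-bound pure-Python bignum work the GIL means threads will not give a speedup, but separate processes can check different bases concurrently. A rough, hypothetical sketch (the round function only loosely mirrors the one in primetest.py, and the base list is arbitrary):

from concurrent.futures import ProcessPoolExecutor
from functools import partial

def _mr_round(n, base):
    # One Miller-Rabin round for odd n > 2; True means n passed for this base.
    s, t = 0, n - 1
    while t % 2 == 0:
        s += 1
        t //= 2
    b = pow(base, t, n)
    if b in (1, n - 1):
        return True
    for _ in range(s - 1):
        b = pow(b, 2, n)
        if b == n - 1:
            return True
    return False

def parallel_mr(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)):
    # Each base runs in its own worker process; the results are ANDed.
    with ProcessPoolExecutor() as pool:
        return all(pool.map(partial(_mr_round, n), bases))

Run it under an `if __name__ == "__main__":` guard; process start-up and pickling the big integer add overhead, so this only pays off when each round is expensive.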

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c6 Original author: https://code.google.com/u/102594231986105975028/

asmeurer commented 11 years ago
I ran it on my computer and it took several hours to finish.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c7 Original author: https://code.google.com/u/asmeurer@gmail.com/

asmeurer commented 11 years ago
If you have time, maybe run it with line_profiler.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c8 Original author: https://code.google.com/u/asmeurer@gmail.com/

asmeurer commented 11 years ago
I wonder if pow(b, 2, n) would be faster.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c9 Original author: https://code.google.com/u/asmeurer@gmail.com/

asmeurer commented 11 years ago
We should add some kind of verbose flag to isprime that gives the status, similar to factorint.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c10 Original author: https://code.google.com/u/asmeurer@gmail.com/

prasoon2211 commented 11 years ago
I'm travelling right now and am away from my workstation. I'll see what can be done once I'm free.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c11 Original author: https://code.google.com/u/102594231986105975028/

prasoon2211 commented 11 years ago
My internet is messed up right now and I don't have the python-dev package, so I cannot build line_profiler at the moment. But using %time in IPython, I don't think pow(b, 2, n) makes any difference. I tested the prime 2**1279 - 1 and got exactly the same timings using pow(b, 2, n) as using (b**2) % n.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c12 Original author: https://code.google.com/u/102594231986105975028/

smichr commented 11 years ago
The mr routine raises the bases to a large power mod n. The larger the power, the longer this takes. Since the number in question here has about 16000 digits, we shouldn't expect a return within minutes unless that pow(base, t, n) can be made faster:

>>> from time import time
>>> p=100
>>> while 1:
...   n=(10**p)
...   t=time();j=pow(2,n,n);print time()-t,p
...   p*=2
...
0.00300002098083 100
0.0210001468658 200
0.156000137329 400
0.84299993515 800
5.82999992371 1600
65.0769999027 3200

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c13 Original author: https://code.google.com/u/117933771799683895267/

smichr commented 11 years ago
btw, the difference between n**2 % n and pow(n,2,n) can be pretty dramatic:

>>> p=1
>>> while 1:
...   n=int((10**p))
...   t=time();j=n**2 % n;print time()-t,p
...   p*=2
...
0.0 1
0.0 2
0.0 4
0.0 8
0.0 16
0.0 32
0.0 64
0.0 128
0.0 256
0.0 512
0.0 1024
0.0019998550415 2048
0.0360000133514 4096
0.0379998683929 8192
0.104000091553 16384
0.398000001907 32768
1.39400005341 65536
4.4509999752 131072
17.4820001125 262144

>>> p=1
>>> while 1:
...   n=int((10**p))
...   t=time();j=pow(n,2,n);print time()-t,p
...   p*=2
...
0.0 1
0.0 2
0.0 4
0.0 8
0.0 16
0.0 32
0.0 64
0.0 128
0.0 256
0.0 512
0.0 1024
0.0 2048
0.0 4096
0.0 8192
0.0 16384
0.000999927520752 32768
0.000999927520752 65536
0.000999927520752 131072
0.0019998550415 262144

That routine should probably be changed to use pow.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c14 Original author: https://code.google.com/u/117933771799683895267/

asmeurer commented 11 years ago
It's not mod n. n**2 % n is always 0.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c15 Original author: https://code.google.com/u/asmeurer@gmail.com/

smichr commented 11 years ago
It's mod n, but it's not n**2, it's b**2:

b = (b**2) % n  --should be--> b = pow(b, 2, n)

...but it's not going to help much when computing b = pow(base, t, n) involves such a large t.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c16 Original author: https://code.google.com/u/117933771799683895267/

asmeurer commented 11 years ago
I think modular exponentiation helps the most with large powers.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c17 Original author: https://code.google.com/u/asmeurer@gmail.com/

smichr commented 11 years ago
See the tests above... squaring a 262k-digit base takes about .002 seconds with pow, or 10,000x longer with **.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c18 Original author: https://code.google.com/u/117933771799683895267/

asmeurer commented 11 years ago
Oh of course. It will indeed be much faster with pow because it computes the mod before squaring. 

But is that the source of the slowdown here?

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c19 Original author: https://code.google.com/u/asmeurer@gmail.com/

prasoon2211 commented 11 years ago
Well, I suppose we can just test with large primes (Mersenne primes, maybe) for that. As I wrote before, on my workstation it took the same time either way.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c20 Original author: https://code.google.com/u/102594231986105975028/

prasoon2211 commented 11 years ago
Btw, I ran a python script that uses the MR algorithm (found here : http://en.literateprograms.org/index.php?title=Special:DownloadCode/Miller-Rabin_primality_test_(Python)&oldid=17104) .

I checked 2**1279 - 1 and it took half the time SymPy takes, so improvements can definitely be made.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c21 Original author: https://code.google.com/u/102594231986105975028/

asmeurer commented 11 years ago
Do primes represent the worst-case scenario for this algorithm? Should we also test against composites?

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c22 Original author: https://code.google.com/u/asmeurer@gmail.com/

prasoon2211 commented 11 years ago
Well, AFAIK, for the cases handled by the _mr_safe function (that is, for n < 10000000000000000), we can test against certain fixed prime bases and know _for sure_ whether n is prime or not. So using primes as witnesses there is justified.

Beyond that, we are using a list of 46 different primes as witnesses. The probability that a composite n passes all of them is at most (1/4)**46, which is good enough. As to your question: no, I do not think the witnesses for n > 10000000000000000 need to be prime. From what I've read about the algorithm, any number should be usable as a witness to primality, though I don't know why only primes were used.

I think it's quite clear ( http://en.wikipedia.org/wiki/Miller%E2%80%93Rabin_primality_test#Algorithm_and_running_time ) that we can use composites.

Also, I do not know whether avoiding primes would make much of a difference. We are doing pow(base, t, n) (line 58), where `base` is the witness. Unless pow works better for composite bases than for prime bases, I think it is immaterial whether the witness is prime or composite.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c23 Original author: https://code.google.com/u/102594231986105975028/

prasoon2211 commented 11 years ago
And just to be clear, we don't need to test against composites *also*. Primes themselves are good enough.

Any number between 2 and n-2 can serve as a witness to n's primality; it doesn't matter whether it is prime or composite. I don't know why primetest.py uses primes exclusively.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c24 Original author: https://code.google.com/u/102594231986105975028/

asmeurer commented 11 years ago
You misunderstood my question. I'm not talking about the witnesses. I'm talking about the inputs to isprime. You keep talking about testing it against some large Mersenne prime. But shouldn't we also test it against large composites?

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c25 Original author: https://code.google.com/u/asmeurer@gmail.com/

asmeurer commented 11 years ago
By the way, the b**2 % n thing actually doesn't matter. b is already reduced mod n from the previous line, so using pow shouldn't make any difference, especially since the exponent is only 2, so there are no intermediate powers to reduce.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c26 Original author: https://code.google.com/u/asmeurer@gmail.com/

asmeurer commented 11 years ago
Anyone care to compare timings for http://en.literateprograms.org/index.php?title=Special:DownloadCode/Miller-Rabin_primality_test_(Python)&oldid=17104 against SymPy on the narcissus prime? I'm also curious how PyPy performs. I suspect it can be faster, though that could depend a lot on how efficient its big-number implementation is.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c27 Original author: https://code.google.com/u/asmeurer@gmail.com/

smichr commented 11 years ago
> the b**2 % n thing actually doesn't matter. b is already reduced mod n from the previous line

Yeah, but n is huge in the case of the narcissus prime...so it can matter.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c28 Original author: https://code.google.com/u/117933771799683895267/

smichr commented 11 years ago
The code referred to online that is 2X faster uses only 20 bases (vs 46) to determine primality, so I think that explains why it is faster.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c29 Original author: https://code.google.com/u/117933771799683895267/

smichr commented 11 years ago
On the basis of comment #13 I think this issue should be closed as invalid. Raising a base to a 3200-digit power mod a number that size already takes 65 seconds, so there is no way the isprime test will return quickly on the much longer narcissus prime unless only a single base is tested.

Nonetheless, some changes to make things a little more efficient are in a new PR.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c30 Original author: https://code.google.com/u/117933771799683895267/

smichr commented 11 years ago
https://github.com/sympy/sympy/pull/2119

**Labels:** NeedsReview smichr  

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c31 Original author: https://code.google.com/u/117933771799683895267/

prasoon2211 commented 11 years ago
@asmeurer: Sorry, I misunderstood your question. I have run the test on the narcissus prime; it's been running for a couple of hours now, but I am only using a netbook, so it is bound to be slower.

@smichr: Hmm, that looks like the reason for the faster results. Should we try it on PyPy?

I can only see two inefficiencies so far: the one I mentioned earlier, and pow(b, 2, n) vs. b**2 % n. The first one isn't promising at all; we should test the second.

If all else fails, I suppose we should switch to some combination of algorithms, as we discussed before.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c32 Original author: https://code.google.com/u/102594231986105975028/

prasoon2211 commented 11 years ago
Oh I see that both of those inefficiencies have been fixed in the PR by Chris.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c33 Original author: https://code.google.com/u/102594231986105975028/

smichr commented 11 years ago
That PR has been merged, but I'll leave the issue open. Although a different algorithm might be faster, I see no way to make Miller-Rabin faster other than testing fewer bases, so there is no way (with MR) that the prime given in the OP will return in a few minutes.

**Labels:** -NeedsReview -smichr  

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c34 Original author: https://code.google.com/u/117933771799683895267/

smichr commented 11 years ago
See http://www.trnicely.net/misc/bpsw.html for discussion of a test that combines a single Miller-Rabin test with a strong Lucas test.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c36 Original author: https://code.google.com/u/117933771799683895267/

danaj commented 11 years ago
I've been working on primality tests and proofs for a Perl module, so this is something I've been looking at a lot recently.  Here are my long-winded thoughts, with the caveat that I'm not a sympy developer.

Comment 1 & 2 mention AKS.  AKS is theoretically important, but is still basically useless in any practical sense.  It is insanely slow, even with Lenstra and Bernstein's improvements to the 'r' selection.  Brent 2010 estimates 840 years for a 308 digit number (on a 1GHz machine, but even with 100x faster hardware that's ridiculous).  For the deterministic variants I think you'll find that time isn't out of line.

Comment 3 mentions the "pypy ran a deterministic test in 18 minutes" comment. He ran a single Fermat test, which is much less accurate than a single M-R test. Try 561 through his function, or 1105, 1729, 2465, etc. So after 18 minutes we've not accomplished very much. This is definitely not the direction to go.

For small numbers (<2^64), you can use deterministic M-R tests, such as the ones from https://miller-rabin.appspot.com/ .  That would be faster than the existing code and remove the table of exceptions.  BPSW would probably be faster for the upper range.  BPSW (standard, strong, extra strong, or the oddball variants some packages implement) has no counterexamples < 2^64, and this is typically the same speed or faster than running 4 M-R tests.  With my native (C without GMP) implementations I cut over to BPSW if the number is over 32 bit in size -- your mileage may vary (this is purely a benchmark-oriented performance decision at this size).
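
To make the small-number case concrete, here is a hedged sketch of a deterministic check for n < 2**64 built on SymPy's existing mr helper; the seven bases are the ones commonly quoted from the page linked above, so verify them against that page before relying on this:

from sympy.ntheory.primetest import mr   # SymPy's existing Miller-Rabin helper

# Bases quoted on https://miller-rabin.appspot.com/ as sufficient for n < 2**64;
# treat the exact values as an assumption to double-check.
_WITNESSES_64 = [2, 325, 9375, 28178, 450775, 9780504, 1795265022]

def isprime_u64(n):
    # Deterministic for n < 2**64: tiny trial division, then fixed-base
    # Miller-Rabin; no _mr_safe exception table needed for this range.
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    bases = [a % n for a in _WITNESSES_64 if a % n]
    return mr(n, bases)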

For large values, I think you really should use a variant of BPSW (see Nicely's page or http://en.wikipedia.org/w/index.php?title=Baillie%E2%80%93PSW_primality_test ).  Here is a number from Arnault 1993 and another from Arnault 1995 that fool sympy's current implementation:

8038374574536394912570796143419421081388376882875581458374889175222974273765333652186502336163960045457915042023603208766569966760987284043965408232928738791850869166857328267761771029389697739470167082304286871099974399765441448453411558724506334092790222752962294149842306881685404326457534018329786111298960644845216191652872597534901

2887148238050771212671429597130393991977609459279722700926516024197432303799152733116328983144639225941977803110929349655578418949441740933805615113979999421542416933972905423711002751042080134966731755152859226962916775325475044445856101949404200039904432116776619949629539250452698719329070373564032273701278453899126120309244841494728976885406024976768122077071687938121709811322297802059565867

Running lots of M-R tests to many fixed bases like sympy's current implementation or libtommath does (and hence Perl 6) is problematic.  Arnault 1995 gives an example (the second above) that will pass MR tests with all bases up to 306.  That paper gives instructions on how to construct such numbers.  Using random bases mostly solves this problem.  However, we're still left with it being terribly inefficient -- running 46 M-R tests is 10-15 times slower than a BPSW test.  If you're feeling paranoid about letting a pseudoprime through, you can add a couple more random-base M-R tests (only Mathematica and Math::Prime::Util seem to go to this length that I'm aware of).

Why BPSW?  Nicely's page should cover lots of the reasons.  Almost all math packages switched to using it, mostly in the 1990s.  Maple, Mathematica (with an added base 3 M-R test), Pari, SAGE, Maxima, FLINT, MPIR, PRIMO are some examples that use a BPSW variant for their probable prime test.  In 33 years nobody has published a concrete counterexample (though we know they exist).  For some reason the tests act anti-correlated.
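
For concreteness, a strong BPSW check has roughly this shape; the sketch below leans on gmpy2's is_strong_prp and is_strong_selfridge_prp (the functions timed below in this comment) rather than being a finished pure-Python implementation:

from gmpy2 import mpz, is_strong_prp, is_strong_selfridge_prp

def strong_bpsw(n):
    # Strong BPSW: cheap trial division, one strong Fermat (M-R) test to
    # base 2, then a strong Lucas-Selfridge test.
    n = mpz(n)
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    return bool(is_strong_prp(n, 2)) and bool(is_strong_selfridge_prp(n))

An implementation that is feeling paranoid can add one or two extra random-base M-R rounds on top, as described above.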

Python's gmpy2 includes David Cleaver's C+GMP code for primality functions including Lucas tests and a strong BPSW test, but is_prime still uses mpz_probab_prime_p.  I spoke to some of the Perl 6 people last month, and I need to write up some code for NQP as well as start a discussion on the API.

Here are examples and times on my computer.  The Perl tests use Math::Prime::Util + Math::Prime::Util::GMP.  This is C+GMP under the hood, just like the gmpy and gmpy2 modules.  Python 2.7.  Based on the time required for one run of mr with the latest git version (that uses pow(base, t, n)), it looks like this could be done in a reasonable length of time.

# sympy from git, mr(int("1808010808"*1560 + "1"),[2])
12.2s

# sympy from git, isprime(int("1808010808"*1560 + "1"))
9 minutes 16.4s

Hmm, this is a few minutes.  Of course I have the latest code so it's presumably much more efficient than earlier, but I'm concerned about the comment "[...]there is no way (with MR) that the prime given in the OP will return in a few minutes."  Let's look at some other systems.

# Pari/gp 2.6.0 ispseudoprime(....) uses a BPSW variant
60.12s

# gmpy 1.15's is_prime (mpz_probab_prime_p(n,25), or M-R with 25 fixed-random bases)
# Installed via Fedora
8 minutes 29.3s

# gmpy 2.0's is_prime (mpz_probab_prime_p(n,25), or M-R with 25 fixed-random bases)
# Built from sources along with latest MPFR, MPC, and GMP.
5 minutes 12.8s

# gmpy 2.0 is_strong_prp(int("1808010808"*1560 + "1"),2)
12.03s

# gmpy 2.0 is_selfridge_prp(int("1808010808"*1560 + "1"))
64.95s

# gmpy 2.0 is_strong_selfridge_prp(int("1808010808"*1560 + "1"))
64.85s

I wasn't able to get gmpy2's is_strong_bpsw_prp function to work, but it would be about 77s based on the two tests it has to do.

My Perl module, with GMP implementations that are a little faster than the ones in gmpy2:

# Just the SPRP-2 test:
time perl -Mbigint -MMath::Prime::Util=:all -E 'my $n = 0+("1808010808" x 1560 . "1"); say is_strong_pseudoprime($n,2);'
12.06s

# Strong Lucas-Selfridge test:
time perl -Mbigint -MMath::Prime::Util=:all -E 'my $n = 0+("1808010808" x 1560 . "1"); say is_strong_lucas_pseudoprime($n);'
47.17s

# Extra Strong Lucas test -- typically a bit faster:
time perl -Mbigint -MMath::Prime::Util=:all -E 'my $n = 0+("1808010808" x 1560 . "1"); say is_extra_strong_lucas_pseudoprime($n);'
31.84s

# is_prob_prime includes some quick tests for weeding out composites, then SPRP-2 and strong Lucas-Selfridge:
time perl -Mbigint -MMath::Prime::Util=:all -E 'my $n = 0+("1808010808" x 1560 . "1"); say is_prob_prime($n);'
59.32s

# is_prime is is_prob_prime plus 2-5 extra M-R tests (2 for this size):
time perl -Mbigint -MMath::Prime::Util=:all -E 'my $n = 0+("1808010808" x 1560 . "1"); say is_prime($n);'
82.97s

So it's possible to do a BPSW variant (SPRP-2 + Grantham Extra Strong test) and have it done in under 44 seconds, assuming modular math including exponentiation is fast.  Add ~12s per extra M-R test if you want.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c37 Original author: https://code.google.com/u/108653038694775415778/

asmeurer commented 11 years ago
Thanks for the write up. I didn't realize that gmpy2 has this functionality. We already support gmpy and gmpy2 as optional dependencies to make the polys faster. We should additionally wrap gmpy2's primality tester if it is installed. 

Where is the source for your Perl module? How is it licensed (though by the time we translate it to Python, it probably wouldn't matter)? Is it pure Perl or does it wrap C?

Are you interested in implementing any of these things in SymPy?

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c38 Original author: https://code.google.com/u/asmeurer@gmail.com/

danaj commented 11 years ago
I can check the gmpy2 functions -- the underlying code should be solid since it was written for a PRIMO validator that has been used a lot in the last two years, but it wouldn't hurt to run a few more tests on the code that's present.  The issue I had with is_strong_bpsw_prp looks like some sort of wrapping issue where the second internal call is getting the wrong number of arguments.

Perl modules (note the links to the GitHub pages for the development versions): https://metacpan.org/release/Math-Prime-Util https://metacpan.org/release/Math-Prime-Util-GMP The license is the same as Perl's (a choice of GPL or the Artistic License), and a LICENSE file is included.  The GMP module uses a derivative of SIMPQS, which is GPL v2+, though that is only used for certain factoring tasks.

There are versions in pure Perl, in C, and, for many things (including the primality tests), in C+GMP in the Math::Prime::Util::GMP module.  The timings above all went to C+GMP for the real work.  The pure Perl code really has two parts:
  1) a lot of wrapping and tasks where performance isn't critical
  2) backup for all other functionality, or for big number use when MPU::GMP isn't installed
The performance of pure Perl for big number operations leaves a lot to be desired, especially with the default bigint backend, but at least it works.

If you'd like, I can write something to start, for instance:
  - use deterministic bases for numbers < 2^64, which should simplify the code and make it a little faster
  - put a try / except around the gmpy2 import and, when HAS_GMPY == 2, use the tests it provides to do a BPSW test (see the sketch below)
plus tests for both.
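
A hedged sketch of that second bullet; the names here are illustrative, mr is SymPy's existing Miller-Rabin helper, and the pure-Python Lucas half mentioned in the next paragraph would still need to be written:

from sympy.ntheory.primetest import mr        # existing Miller-Rabin rounds

try:
    from gmpy2 import is_strong_prp, is_strong_selfridge_prp
    HAS_GMPY = 2
except ImportError:
    HAS_GMPY = 0

def _strong_bpsw_prp(n):
    # Prefer gmpy2's C implementations when they are available.
    if HAS_GMPY == 2:
        return bool(is_strong_prp(n, 2)) and bool(is_strong_selfridge_prp(n))
    # Placeholder fallback: only the base-2 M-R half; the pure-Python strong
    # Lucas test is the missing piece.
    return mr(n, [2])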

Implementing the strong Lucas test in pure Python should be possible -- it isn't that much code.  I'll give it a shot and we'll see what the performance looks like.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c39 Original author: https://code.google.com/u/108653038694775415778/

asmeurer commented 11 years ago
Yeah, that would be great. I think the number theory module in SymPy could definitely use some love from someone who knows what they're doing.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c40 Original author: https://code.google.com/u/asmeurer@gmail.com/

danaj commented 11 years ago
I have a fork with a complete rewrite.  I need to do more testing before doing a pull request.  I'm not sure what the normal review process is -- do a pull request and go from there, or point to a branch to get initial comments.  This includes things like whether to make some functions private or public (e.g. modular Lucas sequence).

I've been testing on a machine that doesn't have gmpy or gmpy2, so pow() and friends are quite slow.  I'm sure there is more room for optimization.  It does make performance more visible.

The standard, strong, and extra strong Lucas tests are implemented and I've verified that the initial pseudoprimes match the OEIS sequence and other code.

I put some new tests in, including the two Arnault composites.  I saw one of the Arnault composites in the test suite already, but asserted as prime with no comments.  I'm not sure if this was a mistake or someone putting in an expected defect without commenting it as such.  I did not add specific tests for the new functions.

This has deterministic tests for up to 2^64, and some more trivial factor checking.  No more _pseudos table or testing for _mr_safe.  However one of the tests I need to do is verify that we properly determine all the PSP-2's from Feitsma's database.
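
That verification can be a very small script; the file name and format here are assumptions (one base-2 strong pseudoprime per line), and isprime is SymPy's function:

from sympy import isprime

def check_feitsma(path="spsp2-below-2-to-64.txt"):
    # Every number in Feitsma's list is composite, so isprime must reject all of them.
    failures = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line:
                n = int(line)
                if isprime(n):
                    failures.append(n)
    return failures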

total = 0
for i in xrange(5000000):
    if isprime(i):
        total += i

master:  25.2s
branch:  14.9s

isprime( 10**2000 + 4561 ):
  44.3s   old isprime using 46 bases
   5.3s   strong BPSW plus an extra M-R test with random base
   4.3s   extra strong BPSW plus an extra M-R test with random base
   4.1s   strong BPSW
   3.2s   extra strong BPSW

for i in [2,4,6,8,10]:
  isprime( 10**2000 + 4561 + i )

master:  5.0s
branch:  0.2s

For the Narcissus prime int("1808010808"*1560 + "1"), times are:
   6m 2s   mr(n, 2)
   19m12s  is_strong_lucas_prp(n)
   13m 9s  is_extra_strong_lucas_prp(n)
again noting that gmpy / gmpy2 is not installed, so pow() is quite slow.  If we ran the 46 bases from the old code, we would expect a total isprime time of about 4.5 hours.  If we do a strong BPSW test it is about 25 minutes.  Using the "extra strong" test would be about 18 minutes.  Adding another M-R test would add about 6 minutes.

I'll do more tests including with GMPY2 tonight.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c41 Original author: https://code.google.com/u/108653038694775415778/

casevh commented 11 years ago
I thought I had fixed is_bpsw_prp() and is_strong_bpsw_prp() in gmpy2 2.0.0. :(

I committed a fix to the svn repository ( r813 ). Can you give that a try?

There is a version of the BPSW test included with the MPIR library but not with GMP. I'll compare its performance and report back. If it is significantly faster, I'll try to extract the code so it is available for all builds of gmpy2.

If you have C code that you would like me to include with gmpy2, I can do that, too.

casevh

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c42 Original author: https://code.google.com/u/casevh/

danaj commented 11 years ago
Case,

Yes, the dev version works (and does the strong test, yay).  Trying to verify against Feitsma's database, I found that this segfaults:  is_strong_selfridge_prp("2047").  I can obviously wrap it in int, but a segfault seems harsh.  Regardless:

  is_strong_selfridge_prp indicates composite for all 31.9M spsp2s below 2^64 (as expected and desired).

I looked at MPIR 2.6.0 last month (FLINT uses basically identical code), and it does some really oddball things.   I'm personally dubious about it actually being BPSW.  It does a mix of PSP and SPSP.  The Lucas test it uses has more pseudoprimes than the Lucas-Selfridge tests.  For example, the counts of pseudoprimes under 1e9 are:  943 extra strong Lucas, 1415 strong Lucas-Selfridge, 5485 Lucas-Selfridge, 8533 MPIR n_is_pseudoprime_lucas.  This doesn't preclude it from being a good test -- what is important is that it be disjoint from the other test -- but it doesn't seem promising given that there doesn't seem to be any reason to do it that way.

All that said, I just ran the Fibonacci/Lucas test against Feitsma's database, and it has no counterexamples with standard or strong base 2 pseudoprimes under 2^64.  Since MPIR 2.6.0 only uses it below 2^64 this is good enough.  For use above 2^64, with 6x the number of pseudoprimes as the strong Lucas-Selfridge test, there should be a compelling reason to use the non-standard method, which I just don't see.

David Cleaver's code could be optimized some, but it looks OK, it's easy to read, and I believe it is used by factordb's Primo validator, so it gets a lot of use.  I'm not sure it really needs to be replaced unless you need to.

You can look at my code here: https://github.com/danaj/Math-Prime-Util-GMP/blob/master/gmp_main.c which has all three Lucas tests.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c43 Original author: https://code.google.com/u/108653038694775415778/

casevh commented 11 years ago
Dana,

I took a quick look before I had to leave for work and I think I spotted the cause of the segmentation fault. I'll try to fix it this evening.

Case

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c44 Original author: https://code.google.com/u/casevh/

asmeurer commented 11 years ago
Do a pull request as soon as you have code. That's the easiest way to even see what you have done, not to mention comment on it.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c45 Original author: https://code.google.com/u/asmeurer@gmail.com/

asmeurer commented 11 years ago
Apparently I can make a pull request for you. https://github.com/sympy/sympy/pull/2209 .

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c46 Original author: https://code.google.com/u/asmeurer@gmail.com/

casevh commented 11 years ago
Dana,

I've committed another set of fixes. Can you give r814 a try?

Case

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c47 Original author: https://code.google.com/u/casevh/

danaj commented 11 years ago
Case, I tried r814 and it seems to work fine for me, both on the command line and with sympy calling it.  Using quotes now raises a TypeError, as do most of the oddball error cases.

I was going through the prp calls, and am getting what I think are incorrect results from is_extra_strong_lucas_prp.  There are a lot more pseudoprimes being output than there should be.  It looks like the code and comments have misread Grantham 2000 (the "Lucas pseudoprime" page on Wikipedia is also good).

  Grantham/MPU/Nicely:  ( U_s=0 AND V_s = +/- 2 ) OR (V_{s*2^t} = 0 for t in 0..r-1)
  gmpy_mpz_prp.c:         U_s=0 OR  V_s = +/-2    OR  V_{s*2^t} = 0 for t in 0..r-1

It's in mpz_prp.c, so I'll drop David a note.  This should have no impact for GMPY2 outside of this one call.

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c48 Original author: https://code.google.com/u/108653038694775415778/

asmeurer commented 11 years ago
**Labels:** NumberTheory  

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c49 Original author: https://code.google.com/u/asmeurer@gmail.com/

casevh commented 11 years ago
Dana,

I've updated is_extra_strong_lucas(). Can you test r816 ?

Thanks, Case

Original comment: http://code.google.com/p/sympy/issues/detail?id=3809#c50 Original author: https://code.google.com/u/casevh/

Amitjha1412 commented 9 years ago

Is anybody still working on this bug?