Optimized π(x) ( prime_count ) algorithm

leto commented 12 years ago

Implement the algorithm from http://www.ams.org/mcom/1996-65-213/S0025-5718-96-00674-6/S0025-5718-96-00674-6.pdf

danaj commented 11 years ago

A first step may be Lehmer's algorithm (1959). "Prime Numbers and Computer Methods for Factorization" by Hans Riesel has a very simple implementation that works. I made a more complicated one here (this is a snapshot of lehmer.c from Math::Prime::Util):

https://gist.github.com/4385784

It's hard to overstate just how much faster this is than sieving when you get into the 1e12 or larger range. Weeks or months of sieving + counting can be finished in minutes, albeit with a non-trivial amount of memory. The graph below is log scale, and the sieve times are using primesieve running on 12 cores, which is the fastest code I'm aware of (it's faster than yafu, primegen, or my serial sieve). Push it back 1 exponent for serial code, and another 1 exponent for non-optimized-assembler code.

prime_counts

I think this method works well up to ~1018, and it's not hard to implement a basic version. It's an interesting challenge to optimize and reduce memory. Math::Prime::Util also has a pure Perl version. For prime_count(109) the pure Perl version is about 100x faster than sieving in Perl, and over 25,000 times faster than Math::Primality's current prime_count.

What surprises me is that I'm not finding open source projects using these methods at all. Python's tools all use sieving. Sage ended up using a hybrid table + sieving method last time I looked. Most of the discussion I've seen ends up choosing that (a giant table of prime counts at given points, then a sieve+count from the nearest table entry to the target). Thomas Nicely does have some code using Lehmer's method on his page, which is the only other working code I've seen (it's quite different than mine).

I'm very interested in hearing about any progress using any of the improvements: either Lagerias/Miller/Odlyzko 1985 or the Deleglise/Rivat modifications of 1996 in the link. I think a primary benefit of their method would be the memory reduction, as computing 1018 takes ~20GB of memory in my implementation, while their changes should make it take much less. Apparently Oliveira e Silva used their method to computer prime_count(1023), which is unrealistic for my code.

From reading the referenced paper, it looks to me like the changes are all in the phi computation, not in the rest of the Lehmer code. That helps a lot, as one can use a simple Riesel textbook code for the Lehmer function (or start with Legendre which is basically one line), and add optimizations to it as desired. The phi calculation is one thing I spent a lot of time optimizing since it is a real limiting factor.

I've also done all my code using native integers. Since non-ranged prime counts using sieving is practical only to 48-bits or so, this has been OK so far. Classic Lehmer's method with some optimizations to phi works to 60 or so bits, so getting close to the limit. Much past that and one may need to start using GMP which would have performance effects.

The modern methods all seem based on the integral methods starting with Lagarias/Odlyzko 1987. See http://arxiv.org/abs/1203.5712 for example. That would be a big change.

leto commented 11 years ago

Thanks for the pretty graphs and references!

25,000 times faster, ouch! But Math::Primality has always been optimized for "big" numbers, so that is not quite fair :) I agree though, the current prime_count in Math::Primality is very close to the "simplest possible thing that could work". That happens to be very similar to the "slowest thing that could work".

danaj commented 11 years ago

I agree it's not fair at all for is_prime, next_prime, etc. and yet for prime_count it sort of is. Here is a change that, on my machine, runs about 4x faster:

  my $primes = 0;
  do { $primes++ if $n >= $_ } for (2,3,5,7,11,13,17,19,23,29);
  for (my $i = GMP->new(31); Rmpz_cmp($i, $n) <= 0; Rmpz_add_ui($i, $i, 2)) {
    next unless 1 == Rmpz_gcd_ui($Math::GMPz::NULL, $i, 3234846615);
    $primes++ if is_prime($i);
  }
  return $primes;

with a caveat that this needs Math::GMPz version 0.34 for the gcd. A couple tweaks would make it work with older versions. It also doesn't hurt to change is_prime's 2nd line to

  $n = GMP->new("$n") unless ref($n) eq 'Math::GMPz';

to keep us from constantly making new objects.

A more fair comparison might be to Math::Prime::Util::GMP's prime_count, which takes a range and uses a similar trial method. It loops calling next_prime from low-1 until the result is larger than high. That next_prime implementation calls is_prob_prime while using a mod-30 wheel (cuts out almost half of the calls vs. checking odds). With the changes above, they're within 10x performance according to the little benchmark I just wrote. Both of them are a lot slower than a sieve or Lehmer's method. However, the ranged version can be used like:

   perl -MMath::Prime::Util=:all -E 'use bigint; say prime_count(10**1000, 10**1000+10000);'
   6

which is not something really feasible with other methods.

leto commented 11 years ago

How was the number 3234846615 chosen?

danaj commented 11 years ago

$ factor.pl 3234846615 3234846615: 3 5 7 11 13 17 19 23 29

Just the first few primes. I think if is_prime has the gcd then it's not as necessary here. The primes++ loop in front takes care of the small numbers so we don't have to worry about checking, for example, gcd(19, 3234846615) == 19. We skip forward by 2, but do a cheap test for numbers with small divisors.

leto / math--primality

Optimized π(x) ( prime_count ) algorithm #2