Add a benchmark test against Math::Prime::Util

It will be useful to have a benchmark test, (probably in xt/) so we can test the performance change of code against Math::Prime::Util :

http://search.cpan.org/~danaj/Math-Prime-Util-0.11/lib/Math/Prime/Util.pm

I should do a fork and add them, but here's an example for prime_count:

#!/usr/bin/env perl
use strict;
use warnings;
use Math::Prime::Util;
use Math::Prime::Util::GMP;
use Math::Primality;
use Benchmark qw/:all/;
my $count = shift || -2;

#my($n, $exp) = (100000,9592);
my($n, $exp) = (1000000,78498);
#my($n, $exp) = (10000000,664579);
cmpthese($count,{
  'MP'           =>sub { die unless $exp == Math::Primality::prime_count($n); },
  'MPU default'  =>sub { die unless $exp == Math::Prime::Util::prime_count($n); },
  'MPU XS Sieve' =>sub { die unless $exp == Math::Prime::Util::_XS_prime_count($n); },
  'MPU XS Lehmer'=>sub { die unless $exp == Math::Prime::Util::_XS_lehmer_pi($n); },
  'MPU PP Sieve' =>sub { die unless $exp == Math::Prime::Util::PP::_sieve_prime_count($n); },
  'MPU PP Lehmer'=>sub { die unless $exp == Math::Prime::Util::PP::_lehmer_pi($n); },
  'MPU GMP Trial'=>sub { die unless $exp == Math::Prime::Util::GMP::prime_count(2,$n); },
});

I decided to add the individual methods (ignoring Meissel and Legendre which are never called by the main system), which are done using private methods. One could also examine different input sizes. Here are the results on my machine, after the change to Math::Primality's prime_count discussed on the prime_count issue.

perl bench-mp-primecount.pl -10
            (warning: too few iterations for a reliable count)
                 Rate       MP MPU GMP Trial MPU PP Sieve MPU PP Lehmer MPU default MPU XS Sieve MPU XS Lehmer
MP            0.163/s       --          -84%         -99%         -100%       -100%        -100%         -100%
MPU GMP Trial  1.02/s     525%            --         -94%         -100%       -100%        -100%         -100%
MPU PP Sieve   15.7/s    9545%         1443%           --          -96%       -100%        -100%         -100%
MPU PP Lehmer   402/s  246321%        39332%        2455%            --        -92%         -92%          -97%
MPU default    5057/s 3099945%       495962%       32043%         1158%          --          -2%          -65%
MPU XS Sieve   5156/s 3160765%       505695%       32673%         1183%          2%           --          -65%
MPU XS Lehmer 14642/s 8975178%      1436104%       92959%         3542%        190%         184%            --

The cutoff for XS to use Lehmer is set at 4M, so for this input size it chooses sieving by default. The cutoff for the Perl code is 50,000. It's a little confusing when ranges are included because then we're looking at two Lehmer counts vs. one segmented sieve. Plus there is a memory consideration, as the Lehmer count uses more memory (but usually that's a concern only when using inputs so large that the sieve method would take a very, very long time to complete).

Here's a benchmark for is_prime, perhaps the most important one. This is getting random primes, so is the worst case for performance. An apples-to-apples comparison is Math::Primality::is_prime vs. Math::Prime::Util::is_prob_prime as those will both do BPSW for large inputs. Here is a benchmark with just those two:

#!/usr/bin/env perl
use strict;
use warnings;
use Math::Prime::Util;
use Math::Prime::Util::GMP;
use Math::Primality;
use Benchmark qw/:all/;
my $count = shift || -2;
srand(29);  # So we have repeatable results

test_at_digits($_, 1000) for (5, 15, 25, 50, 200);

sub test_at_digits {
  my($digits, $numbers) = @_;
  die "Digits must be > 0" unless $digits > 0;

  my @nums = map { Math::Prime::Util::random_ndigit_prime($digits) } 1 .. $numbers;
  print "is_prime for $numbers random $digits-digit primes\n";

  cmpthese($count,{
    'MP'   => sub { Math::Primality::is_prime($_) for @nums; },
    'MPU'  => sub { Math::Prime::Util::is_prob_prime($_) for @nums; },
  });
}

For sizes larger than the native ints (e.g. 20 digits for 64-bit), the MPU code will call the GMP code if it's found, and the PP code otherwise.

MPU has three functions for primality (four if you count AKS):

is_prob_prime. Runs deterministic M-R or BPSW.
is_prime. As above, but for bit ranges 65-200 it runs a short primality test (BLS75 in the current code).
is_provable_prime. Runs a long primality proof (BLS75 in the current code). For small sizes there is a little difference in what is_prime and is_prob_prime actually end up doing, which is something that should be addressed. Given the above, it turns out is_prime will be quite a bit slower on average than is_prob_prime when given numbers in the 65-200 range (depending on the input -- it's the "hard" primes that take the time). About 90% of 25-digit primes return a 2 (definitely prime) instead of a 1 (probably prime) with the current code, and 10% of 50-digit primes.

Here are the results on one of my machines:

perl bench-mp-isprime-simple.pl -5
is_prime for 1000 random 5-digit numbers
      Rate    MP   MPU
MP  20.9/s    --  -95%
MPU  410/s 1864%    --
is_prime for 1000 random 15-digit numbers
      Rate    MP   MPU
MP  2.92/s    --  -98%
MPU  117/s 3912%    --
is_prime for 1000 random 25-digit numbers
      Rate   MP  MPU
MP  1.81/s   -- -72%
MPU 6.36/s 252%   --
is_prime for 1000 random 50-digit numbers
    s/iter   MP  MPU
MP    1.16   -- -76%
MPU  0.275 322%   --
is_prime for 1000 random 200-digit numbers
    s/iter   MP  MPU
MP    6.26   -- -59%
MPU   2.56 145%   --

Native ints are very fast in the XS code, as expected. That was one of the original performance goals of MPU. They'd be even faster if the '-nobigint' flag was set, as that routes is_prime and is_prob_prime straight to the XS code. For bigints we're not too far off.

This is using the GMP code. To test the PP code, the simplest way is to run with "MPU_NO_GMP=1 perl bench-mp-isprime-simple.pl" which will make MPU never run the GMP code. It can also be disabled/enabled via prime_set_config() from code. My initial tests are showing the PP code is quite slow -- about 30x slower than Math::Primality.

Here's a next_prime benchmark. The time consuming section is, of course, the is_prime call. However there are some optimizations that can help a little, mainly using a mod wheel rather than +2 for the loop. Using a simple GCD check for is_prime is an easier way to get most of the mod-wheel benefit (see text patch below).

#!/usr/bin/env perl
use strict;
use warnings;
use Math::Prime::Util;
use Math::Prime::Util::GMP;
use Math::Primality;
use Benchmark qw/:all/;
my $count = shift || -2;
srand(29);  # So we have repeatable results

test_at_digits($_, 1000) for (5, 15, 25, 50, 200);

sub test_at_digits {
  my($digits, $numbers) = @_;
  die "Digits must be > 0" unless $digits > 0;

  my $start = Math::Prime::Util::random_ndigit_prime($digits) - 3;
  my $end = $start;
  $end = Math::Prime::Util::GMP::next_prime($end) for 1 .. $numbers;

  print "next_prime x $numbers starting at $start\n";

  cmpthese($count,{
    'MP'       => sub { my $n = $start; $n = Math::Primality::next_prime($n) for 1..$numbers; die "MP ended with $n instead of $end" unless $n == $end; },
    'MPU'      => sub { my $n = $start; $n = Math::Prime::Util::next_prime($n) for 1..$numbers; die "MPU ended with $n instead of $end" unless $n == $end; },
    'MPU GMP'  => sub { my $n = $start; $n = Math::Prime::Util::GMP::next_prime($n) for 1..$numbers; die "MPU GMP ended with $n instead of $end" unless $n == $end; },
  });
}

Results on my machine:

next_prime x 1000 starting at 54718
          Rate      MP MPU GMP     MPU
MP      5.96/s      --    -98%    -99%
MPU GMP  343/s   5647%      --    -34%
MPU      518/s   8598%     51%      --
next_prime x 1000 starting at 763924481981150
          Rate      MP MPU GMP     MPU
MP      1.27/s      --    -94%    -97%
MPU GMP 22.7/s   1693%      --    -53%
MPU     48.1/s   3699%    112%      --
next_prime x 1000 starting at 5063098476252748493367014
           Rate      MP     MPU MPU GMP
MP      0.716/s      --    -85%    -93%
MPU      4.66/s    551%      --    -54%
MPU GMP  10.1/s   1304%    116%      --
next_prime x 1000 starting at 52698210161700843030674538955040782577098553333498
           Rate      MP     MPU MPU GMP
MP      0.289/s      --    -86%    -90%
MPU      2.12/s    632%      --    -25%
MPU GMP  2.83/s    880%     34%      --
next_prime x 1000 starting at 43711076896251176629882420736706185855760264586666703739341259591240875875823830182149605724144704550069806593516104667043587617311863414123245402779574320554275380829061695411494828618394096273735264
        s/iter      MP     MPU MPU GMP
MP        87.3      --    -81%    -81%
MPU       16.8    420%      --     -3%
MPU GMP   16.3    434%      3%      --

The following change to is_prime is a simple way to improve the speed of Math::Primality on this benchmark. For the 200-digit case it is about 3x faster than the numbers above.

    return 0 if Rmpz_cmp_ui($n, 2) == -1;
    return _is_small_prime($n) if Rmpz_cmp_ui($n, 257) == -1;
    # Check for small divisors, all primes <= 53.
    return 0 if Rmpz_even_p($n);
    return 0 unless 1 == Rmpz_gcd_ui($Math::GMPz::NULL, $n, 4127218095);
    return 0 unless 1 == Rmpz_gcd_ui($Math::GMPz::NULL, $n, 3948078067);

    if ( Rmpz_cmp_ui($n, 9_080_191) == -1 ) {
    ...

Something I found when working on speeding up random primes was that GMP is really fast at big GCDs (but Calc/FastCalc is very slow). I had it precalculate some big primorials then could use those before running the BPSW test. I'm not sure if that would be advantageous here or not. My brief experiment with trying it in both Math::Primality and Math::Prime::Util::GMP ended up with no success.

Edit: Note that the change above is great for speeding up is_prime when given non-primes, but does nothing for primes. Hence we won't see any improvement in the is_prime benchmark I wrote earlier since it is just checking primes. I suppose that means we should have a is_prime_for_composites benchmark.

Here is a strong pseudoprime benchmark.

#!/usr/bin/env perl
use strict;
use warnings;
use Math::Prime::Util;
use Math::Prime::Util::GMP;
use Math::Primality;
use Benchmark qw/:all/;
use List::Util qw/min max/;
my $count = shift || -2;
srand(29);  # So we have repeatable results

test_at_digits($_, 1000) for (5, 15, 25, 50, 200);

sub test_at_digits {
  my($digits, $numbers) = @_;
  die "Digits must be > 0" unless $digits > 0;

  # We get a mix of primes and non-primes.
  my @nums = map { Math::Prime::Util::random_ndigit_prime($digits)+2 } 1 .. $numbers;
  print "is_strong_pseudoprime for $numbers random $digits-digit numbers",
        " (", min(@nums), " - ", max(@nums), ")\n";

  cmpthese($count,{
    'MP'      =>sub {Math::Primality::is_strong_pseudoprime($_,3) for @nums;},
    'MPU'     =>sub {Math::Prime::Util::is_strong_pseudoprime($_,3) for @nums;},
    'MPU PP'  =>sub {Math::Prime::Util::PP::miller_rabin($_,3) for @nums;},
    'MPU GMP' =>sub {Math::Prime::Util::GMP::is_strong_pseudoprime($_,3) for @nums;},
  });
}

I decided to mix things up, so I add 2 to the random prime, which means we'll get a mix of mostly non-primes and some primes. It's only a single base being tested. By the way, random_ndigit_prime returns a native int if the number of digits is within the native int rage, and returns a Math::BigInt object otherwise.

And results:

is_strong_pseudoprime for 1000 random 5-digit numbers (10063 - 99993)
          Rate      MP  MPU PP MPU GMP     MPU
MP      45.2/s      --    -54%    -82%    -93%
MPU PP  99.2/s    119%      --    -61%    -85%
MPU GMP  256/s    467%    158%      --    -60%
MPU      565/s   1323%    549%    151%      --
is_strong_pseudoprime for 1000 random 15-digit numbers (101298912114595 - 999453353593759)
           Rate  MPU PP      MP MPU GMP     MPU
MPU PP  0.769/s      --    -98%   -100%   -100%
MP       42.6/s   5443%      --    -78%    -87%
MPU GMP   191/s  24745%    348%      --    -41%
MPU       305/s  42107%    661%     70%      --
is_strong_pseudoprime for 1000 random 25-digit numbers (1023600370930352897691105 - 9995238436435100598356005)
          Rate  MPU PP     MPU      MP MPU GMP
MPU PP  2.35/s      --    -80%    -92%    -97%
MPU     12.0/s    411%      --    -59%    -84%
MP      29.2/s   1145%    144%      --    -60%
MPU GMP 73.0/s   3007%    508%    150%      --
is_strong_pseudoprime for 1000 random 50-digit numbers (10187000487707708106502152256879883402722669237225 - 99856022767059793446794929540077869781429352292959)
          Rate  MPU PP     MPU      MP MPU GMP
MPU PP  2.29/s      --    -78%    -90%    -94%
MPU     10.6/s    365%      --    -53%    -74%
MP      22.7/s    891%    113%      --    -45%
MPU GMP 41.5/s   1712%    290%     83%      --
is_strong_pseudoprime for 1000 random 200-digit numbers (10011672718346771898228974819049965890420803166722296784710682861665401616928204604471919316787422973814237139976434276054382671992256028522103978156073302125640092256173018102100748492068416545793555 - 99918933776454016938856054302994034478796098902518548932205942989254875358823814247210476916963466935878225870132437175598108020181607598710974950435523064877666217505422473963923642564513010751469313)
          Rate  MPU PP     MPU      MP MPU GMP
MPU PP  1.35/s      --    -45%    -52%    -55%
MPU     2.44/s     81%      --    -13%    -18%
MP      2.81/s    109%     15%      --     -6%
MPU GMP 2.99/s    122%     23%      6%      --

I got a few things out of this:

MPU is excelling on small integers. This is also an x86_64 machine so we have nice fast mulmods. This was its original target audience, and not Math::Primality's target audience.
The MPU pure Perl code is fine for small numbers, but chokes once it's past half-word inputs (so 32-bit on this machine). Fortunately it isn't so bad on very large numbers (assuming we have Math::BigInt::GMP). It looks like I'd be better off by just switching to BigInts, but I bet the other BigInt backends wouldn't look so good.
Math::Primality is pretty good once well into BigInts. The two codes are nearly equal in speed for 200+ digits (I tested with 500 digits and they're basically tied).
The large difference between MPU and MPUGMP shows that Math::Prime::Util is spending far more time validating the input and dinking around with calling the right routine, than it is actually performing the M-R test. I got out NYTProf and made some changes that help. It still spends more time playing with the input than it does doing the math, but it's ~ 4:1 instead of 20:1.

The only thing I could think of to change for Math::Primality here is the usual:

  $base   = GMP->new("$base") if ref($base) ne 'Math::GMPz';
  $n      = GMP->new("$n") if ref($n) ne 'Math::GMPz';

which is the same thing we did in is_prime. It doesn't help on this test, but it could save a little bit for is_prime, next_prime, etc.

Can you explain or give a reference for this code?:

    return 0 unless 1 == Rmpz_gcd_ui($Math::GMPz::NULL, $n, 4127218095);
    return 0 unless 1 == Rmpz_gcd_ui($Math::GMPz::NULL, $n, 3948078067);

4127218095: 3 5 7 11 13 17 19 23 37 3948078067: 29 31 41 43 47 53

We've already done 2 using even_p, so we could do:

  do { return 0 if Rmpz_divisible_p($n, $_) } for (3,5,7,11,13,17,19,23,29,31,37,41,43,47,53);

for some trial division. Pick however far you want to go (we sent all inputs < 257 to _is_small_prime, so as long as we stay under 257, we shouldn't have to check $n == the prime).

Alternately do a GCD with the product of those primes and check that the result is 1. GMP is quite fast at doing the GCD, and this is a single call into GMP instead of 15.

  # 16294579238595022365: 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53
  return 0 unless 1 == Rmpz_gcd_ui($Math::GMPz::NULL, $n, 16294579238595022365);

However, that only works on 64-bit machines. So I split it into two 32-bit values. I agree a comment would be helpful.

You could do:

  return 0 unless 1 == Rmpz_gcd_ui($Math::GMPz::NULL, $n, 3*5*7*11*13*17*19*23*29);
  return 0 unless 1 == Rmpz_gcd_ui($Math::GMPz::NULL, $n, 31*37*41*43*47);

and lose very little while making it clearer. Include 53 and rearrange the terms if you wish.

We now have benchmarks in the bench/ directory! Woot!

leto / math--primality

Add a benchmark test against Math::Prime::Util #6