perlancar / perl-Bencher-Scenario-MathPrimeModules

0 stars 0 forks source link

Accuracy of results #2

Closed danaj closed 7 years ago

danaj commented 7 years ago

Running bencher -m MathPrimeModules --exclude-dataset-names 100 we get results for generating the 78498 primes less than 1M. The documentation shows this takes 13.2 milliseconds for Acme::PERLANCAR::Prime::primes, and is 20ms on my Mac. In fact giving quite large values still shows it running reasonably fast compared to the XS sieves, which seems unlikely given the code.

I believe this is an artifact of the Acme module caching the results. Adding _empty_cache() results in very different results. The time for 1M goes from 20ms to 2300ms.

Arguably other modules are doing similar things. The current implementation for Math::Prime::XS does not. If a new Math::Prime::FastSieve object is made every invocation, then it does not. Math::Prime::Util does for primes() up to about 2 million. Using Math::Prime::Util::erat_primes(2,<num>) will do a new monolithic sieve each time. Using Math::Prime::Util::segment_primes(2,<num>) will do a new segmented sieve each time (it does cache the aux primes, but the initial cache already has enough to sieve to 14 billion or so, so it wouldn't matter for these benchmarks).

$ time perl -MAcme::PERLANCAR::Prime=primes -E 'say for primes(1000000)' | wc -l
   78498

real    0m2.393s
user    0m2.314s
sys 0m0.036s

$ time perl -MMath::Prime::XS=primes -E 'say for primes(1000000)' | wc -l
   78498

real    0m0.065s
user    0m0.056s
sys 0m0.011s

$ time perl -Mntheory=primes -E 'say for @{primes(1000000)}' | wc -l
   78498

real    0m0.057s
user    0m0.048s
sys 0m0.009s

For 10M the times are 44.3s, 0.35s, and 0.29s respectively. Without printing it is 44.1s, 0.11s, and 0.047s respectively. 0.77s for Math::Prime::Util's PP sieve.

The benchmark says 163ms, 80ms, 30ms respectively for 10M, which is significantly different than the time for one invocation.

perlancar commented 7 years ago

Hi Dana, thanks for the comments. I tried changing Math::Prime::Util to use erat_primes, but the result (about twice as fast as Math::Prime::FastSieve) indicates that it's still caching.

perlancar commented 7 years ago

Current result with just 3 participants that are now non-caching:

% bencher -Ilib -m MathPrimeModules 
# Run on: perl v5.24.0, CPU Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz (4 cores), OS GNU/Linux Debian version 8.0, OS kernel: Linux version 3.16.0-4-amd64
# Elapsed time: 19.10s
+--------------------------------+---------+-----------+------------+------------+-----------+---------+
| participant                    | dataset | rate (/s) |       time | vs_slowest |  errors   | samples |
+--------------------------------+---------+-----------+------------+------------+-----------+---------+
| Acme::PERLANCAR::Prime         | 1000000 |      0.37 | 2.7        |          1 |   0.0066  |       6 |
| Math::Prime::XS::primes        | 1000000 |    100    | 0.01       |        300 |   0.00029 |       7 |
| Math::Prime::FastSieve::primes | 1000000 |    100    | 0.007      |        400 |   0.0001  |       7 |
| Acme::PERLANCAR::Prime         | 100     |  14500    | 0.0000687  |      39300 | 4.9e-08   |       6 |
| Math::Prime::XS::primes        | 100     | 100000    | 0.000009   |     300000 | 1.2e-07   |      11 |
| Math::Prime::FastSieve::primes | 100     | 118000    | 0.00000847 |     319000 | 5.3e-09   |       8 |
+--------------------------------+---------+-----------+------------+------------+-----------+---------+
danaj commented 7 years ago

erat_primes doesn't do any caching. It should be faster than Math::Prime::FastSieve, as the sieve is much faster. MPFS uses a well-done 4-line odds-only SoE. MPU has many more optimizations (e.g. mod-30 so skips 3 and 5 in addition to 2, a prefill for 7,11,13, etc.). Part of the reason for keeping that private function was to give a way to force the sieve to be run.

For larger sizes (larger than 1e8), the segment sieve is faster yet, and isn't caching the full list of primes (just the ones to sqrt(n). Edit: The segment sieve will actually try to fill the whole thing from cache if it can, and loading the module fills the cache with primes up to 2M, so for such small sizes it wouldn't be a good test. This doesn't happen with larger inputs, and erat_primes is completely different.

perlancar commented 7 years ago

I see. I've re-enabled MPU and released the new version of the benchmark.