Closed danaj closed 7 years ago
Hi Dana, thanks for the comments. I tried changing Math::Prime::Util to use erat_primes, but the result (about twice as fast as Math::Prime::FastSieve) indicates that it's still caching.
Current result with just 3 participants that are now non-caching:
% bencher -Ilib -m MathPrimeModules
# Run on: perl v5.24.0, CPU Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz (4 cores), OS GNU/Linux Debian version 8.0, OS kernel: Linux version 3.16.0-4-amd64
# Elapsed time: 19.10s
+--------------------------------+---------+-----------+------------+------------+-----------+---------+
| participant                    | dataset | rate (/s) | time       | vs_slowest | errors    | samples |
+--------------------------------+---------+-----------+------------+------------+-----------+---------+
| Acme::PERLANCAR::Prime         | 1000000 |      0.37 |        2.7 |          1 |    0.0066 |       6 |
| Math::Prime::XS::primes        | 1000000 |       100 |       0.01 |        300 |   0.00029 |       7 |
| Math::Prime::FastSieve::primes | 1000000 |       100 |      0.007 |        400 |    0.0001 |       7 |
| Acme::PERLANCAR::Prime         |     100 |     14500 |  0.0000687 |      39300 |   4.9e-08 |       6 |
| Math::Prime::XS::primes        |     100 |    100000 |   0.000009 |     300000 |   1.2e-07 |      11 |
| Math::Prime::FastSieve::primes |     100 |    118000 | 0.00000847 |     319000 |   5.3e-09 |       8 |
+--------------------------------+---------+-----------+------------+------------+-----------+---------+
erat_primes doesn't do any caching. It should be faster than Math::Prime::FastSieve, because MPU's sieve itself is much faster: MPFS uses a well-done 4-line odds-only SoE, while MPU has many more optimizations (e.g. a mod-30 wheel that skips multiples of 3 and 5 in addition to 2, a prefill for 7, 11, 13, etc.). Part of the reason for keeping that function private was to provide a way to force the sieve to be run.
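For illustration only, here is a Python sketch of the odds-only idea (not MPFS's actual C code): one flag per odd number, so both memory and work are halved versus a naive sieve. MPU's mod-30 wheel extends the same idea by also skipping multiples of 3 and 5.

```python
def odds_only_sieve(n):
    """Odds-only Sieve of Eratosthenes: index i represents the odd number 2*i + 1."""
    if n < 2:
        return []
    half = (n - 1) // 2 + 1        # count of odd numbers <= n (1, 3, 5, ...)
    is_prime = [True] * half
    is_prime[0] = False            # 1 is not prime
    i = 1                          # start at the odd number 3
    while (2 * i + 1) ** 2 <= n:
        if is_prime[i]:
            p = 2 * i + 1
            # cross off p*p, p*(p+2), ...; stepping by p in index space
            # skips the even multiples automatically
            for j in range((p * p) // 2, half, p):
                is_prime[j] = False
        i += 1
    return [2] + [2 * k + 1 for k in range(1, half) if is_prime[k]]
```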
For larger sizes (above 1e8), the segmented sieve is faster still, and doesn't cache the full list of primes (just the ones up to sqrt(n)). Edit: the segment sieve will actually try to fill the whole range from cache if it can, and loading the module fills the cache with primes up to 2M, so for such small sizes it wouldn't be a good test. This doesn't happen with larger inputs, and erat_primes is a completely different code path.
I see. I've re-enabled MPU and released the new version of the benchmark.
Running
bencher -m MathPrimeModules --exclude-dataset-names 100
we get results for generating the 78498 primes less than 1M. The documentation shows this takes 13.2 milliseconds for Acme::PERLANCAR::Prime::primes, and it is 20ms on my Mac. In fact, even quite large inputs still show it running reasonably fast compared to the XS sieves, which seems unlikely given the code. I believe this is an artifact of the Acme module caching its results: adding _empty_cache() produces very different results, with the time for 1M going from 20ms to 2300ms.
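The artifact is easy to reproduce in any language. Here is a hypothetical Python sketch (the names cached_primes and empty_cache are mine, loosely analogous to the Acme module's primes and _empty_cache) showing why a benchmark loop that never clears the cache mostly measures a dictionary lookup rather than the sieve:

```python
_cache = {}

def cached_primes(n):
    """Memoized prime generation: only the first call for a given n pays for the sieve."""
    if n not in _cache:
        flags = [True] * (n + 1)
        flags[0:2] = [False, False]
        for p in range(2, int(n ** 0.5) + 1):
            if flags[p]:
                flags[p * p :: p] = [False] * len(flags[p * p :: p])
        _cache[n] = [i for i, ok in enumerate(flags) if ok]
    return _cache[n]

def empty_cache():
    """Analogous to _empty_cache(): forces the next call to sieve from scratch."""
    _cache.clear()
```

Timing cached_primes(1_000_000) in a loop without calling empty_cache between iterations gives numbers like the 20ms figure above; clearing the cache each time gives the true per-call cost.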
Arguably other modules are doing similar things. The current implementation of Math::Prime::XS does not cache. Math::Prime::FastSieve does not either, provided a new sieve object is created on every invocation. Math::Prime::Util does cache for primes() up to about 2 million. Using
Math::Prime::Util::erat_primes(2,<num>)
will do a new monolithic sieve each time. Using
Math::Prime::Util::segment_primes(2,<num>)
will do a new segmented sieve each time (it does cache the auxiliary primes, but the initial cache already has enough to sieve to 14 billion or so, so it wouldn't matter for these benchmarks). For 10M the times are 44.3s, 0.35s, and 0.29s respectively. Without printing they are 44.1s, 0.11s, and 0.047s respectively, with 0.77s for Math::Prime::Util's PP sieve.
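To make the monolithic/segmented distinction concrete, here is a minimal segmented-sieve sketch in Python (an illustration of the general technique, not MPU's C implementation): only the auxiliary primes up to sqrt(n) are kept across segments, and each fixed-size window is sieved and discarded.

```python
import math

def simple_sieve(n):
    """Plain monolithic sieve; used here only for the auxiliary primes up to sqrt(n)."""
    flags = [True] * (n + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = [False] * len(flags[p * p :: p])
    return [i for i, ok in enumerate(flags) if ok]

def segment_primes(lo, hi, segment_size=32768):
    """Yield primes in [lo, hi], sieving one window at a time."""
    aux = simple_sieve(math.isqrt(hi))          # the only state kept across segments
    for start in range(max(lo, 2), hi + 1, segment_size):
        end = min(start + segment_size - 1, hi)
        flags = [True] * (end - start + 1)
        for p in aux:
            # first multiple of p in the window, never below p*p
            first = max(p * p, (start + p - 1) // p * p)
            for m in range(first, end + 1, p):
                flags[m - start] = False
        for i, ok in enumerate(flags):
            if ok:
                yield start + i
```

Memory stays proportional to the segment size plus the auxiliary primes, which is why a segmented sieve can reach ranges where a monolithic bit array would not fit.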
The benchmark reports 163ms, 80ms, and 30ms respectively for 10M, which is significantly different from the time for a single invocation.