Closed ohanar closed 10 years ago
Attachment: trac_11475.patch.gz
based on sage-4.7
Description changed:
---
+++
@@ -81,10 +81,12 @@
625 loops, best of 3: 304 ns per loop
-This doesn't cost very much memory, since everything is stored as 32-bit ints, but prime_range
is currently built on top of pari, which poses a number of problems, but primary concern it is a memory hog when sieving to a large upper bound (important for plotting purposes).
+This doesn't cost very much memory, since everything is stored as 32-bit ints, but prime_range
is currently built on top of pari, which poses a number of problems, but the primary concern is that it is a memory hog when sieving to a large upper bound (important for plotting purposes).
I know that this implementation fails to give correct output on 64-bit systems somewhere between 9.5*10**15
and 9.75*10**15
, for issues that I have been unable to determine. I have limited input to be < 2**49
, which I am fairly confident to be safe, as I have done a fair bit of testing in this range trying to debug this issue (despite it taking hours per call). I would appreciate testing on 32-bit platforms.
A lot of the speed of this algorithm relies on the __cached_count
method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes). This is the fastest type of binary search I know of, but if anyone knows of a way to speed it up, it could have a significant effect.
Finally, I have introduced the legendres_formula
method, which takes two arguments, x
and a
, and returns the number of positive integers less than x
that are not divisible by the first a
primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code.
+
+Apply trac_11475.patch
The link(s) to your paper is (currently) dead. (I wouldn't call it 'main' btw. ;-) )
Haven't yet looked a the rest...
Changed author from ohanar to R. Andrew Ohana
A lot of the speed of this algorithm relies on the __cached_count method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes)
Something like this should give you a 10% speedup for big inputs.
if p < 200:
pos = p>>2
size = 5
else:
size = self.__numPrimes>>1
# Use the expected density of primes for expected inputs to make an
# educated guess
if p>>3 < size:
size = p>>3
pos = size
(beware to make sure there are enough primes cached for small inputs)
Replying to @nexttime:
The link(s) to your paper is (currently) dead. (I wouldn't call it 'main' btw. ;-) )
Sorry about that, I was planning on moving the folder, but I forgot to. I'll fix that tonight.
Replying to @sagetrac-ylchapuy:
Something like this should give you a 10% speedup for big inputs.
Thanks, I'll try it out tonight, and update the description and patch if it proves fruitful.
Description changed:
---
+++
@@ -34,7 +34,7 @@
sage: timeit('prime_pi(100)')
625 loops, best of 3: 331 ns per loop
sage: timeit('prime_pi(1000)')
-625 loops, best of 3: 963 ns per loop
+625 loops, best of 3: 838 ns per loop
Old:
@@ -45,18 +45,18 @@ 625 loops, best of 3: 2.72 µs per loop
-There is also 25-40% speedup for larger input:
+There is also 35-50% speedup for larger input:
New:
sage: timeit('prime_pi(108)') -125 loops, best of 3: 2.69 ms per loop +125 loops, best of 3: 2.5 ms per loop sage: timeit('prime_pi(1010)') -5 loops, best of 3: 127 ms per loop +5 loops, best of 3: 116 ms per loop sage: time prime_pi(10**12) 37607912018 -Time: CPU 6.80 s, Wall: 6.80 s +Time: CPU 6.30 s, Wall: 6.29 s
Old:
@@ -70,23 +70,23 @@
Time: CPU 8.65 s, Wall: 8.64 s
-Primes are now cached, which can give a huge speedup when making smaller calls after larger calls: (run after previous commands) +Primes are now cached, which can give a significant speedup when making smaller calls after larger calls: (run after previous commands)
sage: timeit('prime_pi(10**10)')
-5 loops, best of 3: 64.6 ms per loop
+5 loops, best of 3: 59.6 ms per loop
sage: timeit('prime_pi(10**8)')
-625 loops, best of 3: 721 µs per loop
+625 loops, best of 3: 663 µs per loop
sage: timeit('prime_pi(1000)')
625 loops, best of 3: 304 ns per loop
This doesn't cost very much memory, since everything is stored as 32-bit ints, but prime_range
is currently built on top of pari, which poses a number of problems, but the primary concern is that it is a memory hog when sieving to a large upper bound (important for plotting purposes).
-I know that this implementation fails to give correct output on 64-bit systems somewhere between 9.5*10**15
and 9.75*10**15
, for issues that I have been unable to determine. I have limited input to be < 2**49
, which I am fairly confident to be safe, as I have done a fair bit of testing in this range trying to debug this issue (despite it taking hours per call). I would appreciate testing on 32-bit platforms.
+I know that this implementation starts to fail to give correct output on 64-bit systems somewhere between 9.5*10**15
and 9.75*10**15
, for issues that I have been unable to determine. I have limited input to be < 2**49
, which I am fairly confident is safe, as I have done a fair bit of testing in this range trying to debug this issue (despite it taking hours per call). I would appreciate testing on 32-bit platforms.
-A lot of the speed of this algorithm relies on the __cached_count
method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes). This is the fastest type of binary search I know of, but if anyone knows of a way to speed it up, it could have a significant effect.
+A lot of the speed of this algorithm relies on the __cached_count
method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes). This is the fastest type of binary search I know of, but if anyone knows of a way to speed it up, it could have a significant effect. (Thanks to Yann for his suggestion)
Finally, I have introduced the legendres_formula
method, which takes two arguments, x
and a
, and returns the number of positive integers less than x
that are not divisible by the first a
primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code.
-Apply trac_11475.patch +Apply trac_11475v2.patch
Makes use of Yann's suggested improvement.
Attachment: trac_11475v2.patch.gz
Andrew, do you have some examples of wrong prime counts (beyond 249), and if so, on what kind of systems does one experience them?
While the old implementation was said to fail on (at least some) 32-bit systems for large inputs, the new one in contrast fails on 64-bit systems?
(More or less just curious; perhaps I'll one dayTM try to track this down, depending on the affected system(s) as well.)
Replying to @nexttime:
Andrew, do you have some examples of wrong prime counts (beyond 249), and if so, on what kind of systems does one experience them?
I did most of my testing on mod.math and sage.math, which is ubuntu linux (64-bit). I don't have failing results any longer, but I can tell you that I discovered the issue with 10**16
, and I also know that it fails for 9.75*10**15
. Give me 7-8 hours and I'll get you the output for those inputs.
While the old implementation was said to fail on (at least some) 32-bit systems for large inputs, the new one in contrast fails on 64-bit systems?
I don't know how the new one behaves on 32-bit systems, since I have done literally no testing. I hope it does better than the old one, since I used stdint this time around (and the only real difference between my 64-bit vs 32-bit implementation is the use of pointers in the later).
(More or less just curious; perhaps I'll one dayTM try to track this down, depending on the affected system(s) as well.)
Apply only trac_11475v2.patch
Description changed:
---
+++
@@ -89,4 +89,6 @@
Finally, I have introduced the `legendres_formula` method, which takes two arguments, `x` and `a`, and returns the number of positive integers less than `x` that are not divisible by the first `a` primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code.
-Apply trac_11475v2.patch
+---
+
+Apply only [attachment: trac_11475v2.patch](https://github.com/sagemath/sage-prod/files/10653115/trac_11475v2.patch.gz).
Description changed:
---
+++
@@ -87,7 +87,7 @@
A lot of the speed of this algorithm relies on the `__cached_count` method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes). This is the fastest type of binary search I know of, but if anyone knows of a way to speed it up, it could have a significant effect. (Thanks to Yann for his suggestion)
-Finally, I have introduced the `legendres_formula` method, which takes two arguments, `x` and `a`, and returns the number of positive integers less than `x` that are not divisible by the first `a` primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code.
+Finally, I have introduced the `legendres_formula` method, which takes two arguments, `x` and `a`, and returns the number of positive integers not larger than `x` that are not divisible by -- i.e. co-prime to -- each of the first `a` primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code.
---
Attachment: trac_11475v3.patch.gz
Fixes doctests, and changed __cached_primes again to improve speed
Apply only trac_11475v3.patch
Description changed:
---
+++
@@ -34,7 +34,7 @@
sage: timeit('prime_pi(100)')
625 loops, best of 3: 331 ns per loop
sage: timeit('prime_pi(1000)')
-625 loops, best of 3: 838 ns per loop
+625 loops, best of 3: 782 ns per loop
Old:
@@ -45,18 +45,18 @@ 625 loops, best of 3: 2.72 µs per loop
-There is also 35-50% speedup for larger input:
+There is also a 45-60% speedup for larger input:
New:
sage: timeit('prime_pi(108)') -125 loops, best of 3: 2.5 ms per loop +125 loops, best of 3: 2.31 ms per loop sage: timeit('prime_pi(1010)') -5 loops, best of 3: 116 ms per loop +5 loops, best of 3: 108 ms per loop sage: time prime_pi(10**12) 37607912018 -Time: CPU 6.30 s, Wall: 6.29 s +Time: CPU 5.74 s, Wall: 5.75 s
Old:
@@ -74,11 +74,11 @@
sage: timeit('prime_pi(1010)') -5 loops, best of 3: 59.6 ms per loop +5 loops, best of 3: 54.6 ms per loop sage: timeit('prime_pi(108)') -625 loops, best of 3: 663 µs per loop +625 loops, best of 3: 604 µs per loop sage: timeit('prime_pi(1000)') -625 loops, best of 3: 304 ns per loop +625 loops, best of 3: 342 ns per loop
This doesn't cost very much memory, since everything is stored as 32-bit ints, but `prime_range` is currently built on top of pari, which poses a number of problems, but the primary concern is that it is a memory hog when sieving to a large upper bound (important for plotting purposes).
@@ -91,4 +91,4 @@
---
-Apply only [attachment: trac_11475v2.patch](https://github.com/sagemath/sage-prod/files/10653115/trac_11475v2.patch.gz).
+Apply only [attachment: trac_11475v3.patch](https://github.com/sagemath/sage-prod/files/10653116/trac_11475v3.patch.gz).
Replying to @nexttime:
Andrew, do you have some examples of wrong prime counts (beyond 249), and if so, on what kind of systems does one experience them?
sage: time prime_pi(9.75*10**15)
272450165623660
Time: CPU 25614.37 s, Wall: 25614.47 s
The correct value is 272450167482953
(http://www.ieeta.pt/~tos/primes.html). I also computed prime_pi(10**16)
but in a moment of confusion, I killed the process before I grabbed the value. If you want it I can start running it again with my current version of the code (~3-5% faster than the latest patch).
Replying to @ohanar:
Replying to @nexttime:
Andrew, do you have some examples of wrong prime counts (beyond 249), and if so, on what kind of systems does one experience them?
sage: time prime_pi(9.75*10**15)
272450165623660
Time: CPU 25614.37 s, Wall: 25614.47 s
The correct value is
272450167482953
(http://www.ieeta.pt/~tos/primes.html). I also computedprime_pi(10**16)
but in a moment of confusion, I killed the process before I grabbed the value. If you want it I can start running it again with my current version of the code (~3-5% faster than the latest patch).
Never mind. I thought you had collected a bunch of false results during development. If (apparently) all platforms are affected, it should be easier to track this down - rather than flaws due to e.g. Apple "specifics"... ;-)
Btw, I have a large collection of correct prime counts here, so if eventually the sysloads drop, I'll by myself check some results against your new implementation.
Anyway, anybody feel free to compute more values above our current limit and post them here, whether correct or not. :-)
Attachment: trac_11475v4.patch.gz
Some major speed improvements using ~4kB of tables
Description changed:
---
+++
@@ -33,30 +33,30 @@
sage: timeit('prime_pi(100)') 625 loops, best of 3: 331 ns per loop -sage: timeit('prime_pi(1000)') -625 loops, best of 3: 782 ns per loop +sage: timeit('prime_pi(10000)') +625 loops, best of 3: 971 ns per loop
Old:
sage: timeit('prime_pi(100)') 625 loops, best of 3: 1.71 µs per loop -sage: timeit('prime_pi(1000)') -625 loops, best of 3: 2.72 µs per loop +sage: timeit('prime_pi(10000)') +625 loops, best of 3: 5.56 µs per loop
-There is also a 45-60% speedup for larger input:
+There is also a ~3x speedup for larger input:
New:
sage: timeit('prime_pi(108)') -125 loops, best of 3: 2.31 ms per loop +125 loops, best of 3: 759 µs per loop sage: timeit('prime_pi(1010)') -5 loops, best of 3: 108 ms per loop +5 loops, best of 3: 47.4 ms per loop sage: time prime_pi(10**12) 37607912018 -Time: CPU 5.74 s, Wall: 5.75 s +Time: CPU 2.86 s, Wall: 2.86 s
Old:
@@ -74,11 +74,11 @@
sage: timeit('prime_pi(1010)') -5 loops, best of 3: 54.6 ms per loop +5 loops, best of 3: 27.4 ms per loop sage: timeit('prime_pi(108)') -625 loops, best of 3: 604 µs per loop -sage: timeit('prime_pi(1000)') -625 loops, best of 3: 342 ns per loop +625 loops, best of 3: 325 µs per loop +sage: timeit('prime_pi(10000)') +625 loops, best of 3: 343 ns per loop
This doesn't cost very much memory, since everything is stored as 32-bit ints, but `prime_range` is currently built on top of pari, which poses a number of problems, but the primary concern is that it is a memory hog when sieving to a large upper bound (important for plotting purposes).
@@ -91,4 +91,4 @@
---
-Apply only [attachment: trac_11475v3.patch](https://github.com/sagemath/sage-prod/files/10653116/trac_11475v3.patch.gz).
+Apply only [attachment: trac_11475v4.patch](https://github.com/sagemath/sage-prod/files/10653117/trac_11475v4.patch.gz).
An update:
v4: added a table of prime_pi
values that can be stored in a byte -- using this with the __cached_primes
method gave a giant speedup. Since the sum at the beginning of __phi
(and __phi32
) differs from 16*(x/77)
by a small factor that only depends on x%2310
, so adding a table with these factors results in another large speedup. These tables take a total of ~4kB.
v5: the tables introduced a significant initialization time for prime_pi
, and William asked me to fix this, so prime_pi
(and legendres_formula) only initialize these tables on an initial call -- so initialization time is now ~650ns:
sage: timeit('sage.functions.prime_pi.PrimePi()')
625 loops, best of 3: 627 ns per loop
Both Yann and I have tested 9.75*10^15
again and got the correct result, I also did 10^16
. We both ran it on 64-bit GNU/Linux systems. So the issue I was encountering may have been due to boundary issues with the __cached_primes
method (which have been rewritten over the course of the patches). I kept the limit in since I haven't yet tested this thoroughly.
Attachment: trac_11475v5.patch.gz
Reduced initialization time
Description changed:
---
+++
@@ -91,4 +91,4 @@
---
-Apply only [attachment: trac_11475v4.patch](https://github.com/sagemath/sage-prod/files/10653117/trac_11475v4.patch.gz).
+Apply only [attachment: trac_11475v5.patch](https://github.com/sagemath/sage-prod/files/10653118/trac_11475v5.patch.gz).
I've so far only tested a little (on 64-bit Linux; dead old 32-bit box still building Sage), but already found one wrong result with patch v3:
502000000000000 15296923904889 # 11475v3
502000000000000 15296922045596 # correct result, (at least) double-checked
Note that 5021012 is below* 249 (562.949.953.421.312).
As my machines are still busy, you could perhaps check if the newer version(s) give the correct result.
Works on sage.math with 11475v5:
sage: time prime_pi(502000000000000)
15296922045596
Time: CPU 723.40 s, Wall: 723.43 s
Replying to @ohanar:
Works on sage.math with 11475v5:
sage: time prime_pi(502000000000000)
15296922045596
Time: CPU 723.40 s, Wall: 723.43 s
Yep, here too; with both v4 and v5 (64-bit Linux).
580000000000000 17596211452162
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
581000000000000 0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
582000000000000 0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
583000000000000 0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
584000000000000 0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
585000000000000 0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
586000000000000 0
Hmm, really?
Replying to @nexttime:
580000000000000 17596211452162 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 581000000000000 0 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 582000000000000 0 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 583000000000000 0 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 584000000000000 0 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 585000000000000 0 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 586000000000000 0
Hmm, really?
Oops, thankfully easily fixed. Shouldn't occur in the next patch (whenever that is).
Ok, here are my first 32-bit tests (with patch v5):
sage -t "devel/sage-11475v5/sage/functions/prime_pi.pyx"
**********************************************************************
File "/home/leif/Sage/sage-4.7/devel/sage-11475v5/sage/functions/prime_pi.pyx", line 126:
sage: prime_pi(10**10)
Expected:
455052511
Got:
498132213
**********************************************************************
1 items had failures:
1 of 14 in __main__.example_1
***Test Failed*** 1 failures.
For whitespace errors, see the file /home/leif/.sage//tmp/.doctest_prime_pi.py
[6.1 s]
----------------------------------------------------------------------
The following tests failed:
sage -t "devel/sage-11475v5/sage/functions/prime_pi.pyx"
Total time for all tests: 6.1 seconds
real 0m6.293s
user 0m5.328s
sys 0m0.760s
With long tests included, some unhandled overflows occur:
sage -t -long "devel/sage-11475v5/sage/functions/prime_pi.pyx"
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
**********************************************************************
File "/home/leif/Sage/sage-4.7/devel/sage-11475v5/sage/functions/prime_pi.pyx", line 126:
sage: prime_pi(10**10)
Expected:
455052511
Got:
498132213
**********************************************************************
File "/home/leif/Sage/sage-4.7/devel/sage-11475v5/sage/functions/prime_pi.pyx", line 231:
sage: for n in (42,41..32): prime_pi(2**n) # long time (21s on mod.math, 2011)
Expected:
156661034233
80316571436
41203088796
21151907950
10866266172
5586502348
2874398515
1480206279
762939111
393615806
203280221
Got:
2072485352
555689055
4249350341
1916569322
829105263
2484680881
3345496393
1649125207
813719441
403824770
203621732
**********************************************************************
2 items had failures:
1 of 14 in __main__.example_1
1 of 14 in __main__.example_4
***Test Failed*** 2 failures.
For whitespace errors, see the file /home/leif/.sage//tmp/.doctest_prime_pi.py
[52.1 s]
----------------------------------------------------------------------
The following tests failed:
sage -t -long "devel/sage-11475v5/sage/functions/prime_pi.pyx"
Total time for all tests: 52.2 seconds
real 0m52.318s
user 0m49.923s
sys 0m0.956s
This is on Ubuntu 7.10 :) (P4 Northwood, gcc 4.2.1) with Sage 4.7.
Unfortunate, but kind of expected. It'll probably take a day or two for me to set up a 32-bit vm to try to debug the issue (it isn't blatantly obvious to me what is going wrong).
Replying to @ohanar:
(it isn't blatantly obvious to me what is going wrong).
Yep, especially the "long int doesn't fit into int" reminds me of 16-bit systems... 8/
fixes issue on 32-bit systems
Description changed:
---
+++
@@ -91,4 +91,4 @@
---
-Apply only [attachment: trac_11475v5.patch](https://github.com/sagemath/sage-prod/files/10653118/trac_11475v5.patch.gz).
+Apply only [attachment: trac_11475v6.patch](https://github.com/sagemath/sage-prod/files/10653119/trac_11475v6.patch.gz).
Attachment: trac_11475v6.patch.gz
Hopefully that was the only issue with 32-bit systems. Basically gmp wants a 32-bit int for mpz_set_ui on 32-bit systems, so I gave it what it wanted (this was also causing the issue with the __sieve
function, since I use the mpz_sqrt
function to calculate the argument).
I am currently doing some more thorough testing with large values on a 64-bit machine, so that we can be sure there isn't any more breakage on the stable platform.
Replying to @ohanar:
Hopefully that was the only issue with 32-bit systems.
v6 passes long tests on the 32-bit Ubuntu 7.10 machine (in about a minute). More to come.
Apply only trac_11475v6.patch
I have verified that prime_pi
gives correct output for i*10**12
with 0 <= i <= 1000
on x86_64 :). I am probably going to remove the bound flag and instead put a warning about larger values not being well tested if/when they are called.
Unfortunately, I have also come across an issue with solaris 10 sparc (t2.math). The root of the cause is that mpz_export
has the bit ordering arguments flipped. I am not aware of another way to get unsigned ints out of gmp integers, nor am I aware of a way to check the platform from cython. Also, I don't know if other platforms have this same issue (solaris86 or mac powerpc in particular), so if someone could test that, please let me know.
Attachment: trac_11475v8.patch.gz
100% doctest coverage + slightly faster init + solaris/sparc fix
Apply only trac_11475v8.patch
Replying to @ohanar:
I have verified that
prime_pi
gives correct output fori*10**12
with0 <= i <= 1000
on x86_64 :).
Besides a couple of other values on individual machines, I've successfully computed pi(2k) for k=0...53 and pi(10k) for k=0...16 on each of three different machines (two of them 32-bit; all with patch version 6).
Some computations are still in progress, but so far I could also verify the results of prime_pi(2**54)
and prime_pi(2**55)
(on at least one machine). I've also played a little with "legendres_formula()
".
I am probably going to remove the bound flag and instead put a warning about larger values not being well tested if/when they are called.
Yes, I also think we can or should drop any "artificial" limit now, though I must admit I haven't really looked at the code yet...
I wonder if legendres_formula()
shouldn't be renamed to legendre_phi()
- better names appreciated, but the latter ( \phi(x,a) ) seems to be quite established in literature, and legendre_phi
would be analogous to e.g. euler_phi
in Sage. (I know Wolfram and others call "it" - i.e., IMHO the method computing the respective function rather than the function itself - Legendre's Formula as well, but it's not that unambiguous.)
I'd also mention in its docstring it's called the partial sieve function.
Replying to @nexttime:
Replying to @ohanar:
I am probably going to remove the bound flag and instead put a warning about larger values not being well tested if/when they are called.
Yes, I also think we can or should drop any "artificial" limit now, though I must admit I haven't really looked at the code yet...
Ah, I see, you already removed the restriction in patch v8 (and also added warnings w.r.t. seemingly "endless" computations, which I also wanted to suggest).
There's a small typo in __call__()
's docstring (the 2**32
should read 2**64
):
This implementation uses unsigned 64-bit ints and thus does not
support `x >= 2**32`::
sage: prime_pi(2^64)
Traceback (most recent call last):
...
NotImplementedError: computation of prime_pi for x >= 2^64 is not implemented
Also, there's a mixture of both kinds of exponentiation operators (^
and **
) throughout the [doc]strings; I'm not sure if we shouldn't use just one notation.
Replying to @nexttime:
Replying to @ohanar:
I have verified that
prime_pi
gives correct output fori*10**12
with0 <= i <= 1000
on x86_64 :).Besides a couple of other values on individual machines,
Which machines have you had issues with? Everything I have tested with v8 has worked (this includes x86 and x86_64 linux, 64bit Snow Leopard, and Sparc Solaris 10).
I wonder if
legendres_formula()
shouldn't be renamed tolegendre_phi()
- better names appreciated, but the latter ( \phi(x,a) ) seems to be quite established in literature, andlegendre_phi
would be analogous to e.g.euler_phi
in Sage.
Makes sense to me, if no one has an objection to the idea, I'll make the change.
I'd also mention in its docstring it's called the partial sieve function.
Sounds good to me.
Replying to @nexttime:
Also, there's a mixture of both kinds of exponentiation operators (
^
and**
) throughout the [doc]strings; I'm not sure if we shouldn't use just one notation.
Will do -- probably ^
since it is more consumer friendly.
Replying to @ohanar:
Replying to @nexttime:
Replying to @ohanar:
I have verified that
prime_pi
gives correct output fori*10**12
with0 <= i <= 1000
on x86_64 :).Besides a couple of other values on individual machines,
Which machines have you had issues with?
Oh, I didn't get any wrong results; I just meant I've also tested prime_pi(x)
with some "random" values (besides the mentioned powers of 2 and ten), but not systematically on all machines (all Linux boxes; Pentium4 Northwood, P4 Prescott and Core2).
Computation of prime_pi(2**56)
just finished with the correct result on one of the 32-bit machines :-) (took 2 days+).
[Updated description w.r.t. (now obsolete) limitation / potentially wrong results for inputs exceeding 249 with earlier versions.]
Description changed:
---
+++
@@ -81,14 +81,14 @@
625 loops, best of 3: 343 ns per loop
-This doesn't cost very much memory, since everything is stored as 32-bit ints, but prime_range
is currently built on top of pari, which poses a number of problems, but the primary concern is that it is a memory hog when sieving to a large upper bound (important for plotting purposes).
+This doesn't cost very much memory, since everything is stored as 32-bit ints, but prime_range()
is currently built on top of PARI, which poses a number of problems, but the primary concern is that it is a memory hog when sieving to a large upper bound (important for plotting purposes).
-I know that this implementation starts to fail to give correct output on 64-bit systems somewhere between 9.5*10**15
and 9.75*10**15
, for issues that I have been unable to determine. I have limited input to be < 2**49
, which I am fairly confident is safe, as I have done a fair bit of testing in this range trying to debug this issue (despite it taking hours per call). I would appreciate testing on 32-bit platforms.
+An earlier version of this implementation started to fail to give correct output on 64-bit systems somewhere between 9.5*10**15
and 9.75*10**15
, for issues that I had previously been unable to determine, so I had initially limited input to be < 2**49
, which I am fairly confident is safe, as I have done a fair bit of testing in this range trying to debug this issue (despite it taking hours per call). I would appreciate testing on 32-bit platforms.
A lot of the speed of this algorithm relies on the __cached_count
method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes). This is the fastest type of binary search I know of, but if anyone knows of a way to speed it up, it could have a significant effect. (Thanks to Yann for his suggestion)
-Finally, I have introduced the legendres_formula
method, which takes two arguments, x
and a
, and returns the number of positive integers not larger than x
that are not divisible by -- i.e. co-prime to -- each of the first a
primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code.
+Finally, I have introduced the legendres_formula
method, which takes two arguments, x
and a
, and returns the number of positive integers not larger than x
that are not divisible by any -- i.e. co-prime to each -- of the first a
primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code.
-Apply only attachment: trac_11475v6.patch. +Apply only attachment: trac_11475v8.patch.
Attachment: trac_11475v9.patch.gz
See comment #40
So through the updates, I haven't been keeping legendres_formula
up to date. I am posting a patch that should fix the bugs that were present for the past few versions (primarily, calling it with a relatively small x
in comparison to a
would give an extremely wrong answer due to overflow issues). I have also changed the name, so it now is called either legendre_phi
or partial_sieve_function
.
The patch also makes all documentation use ^
, there are no longer any instances of **
.
The patch also touches up on __phi
and __phi32
-- cleaning up code, adding a few comments, and makes a simple optimization to reduce boolean checks for small x
values (~10% speedup for large input).
I'm hoping this will be the final patch for this ticket, so that I don't go into the double digits for version numbers :-).
Apply only trac_11475v9.patch
I have rewritten the
prime_pi
method from scratch, it is much cleaner and less hacky this time (as I have learned more about coding), however it is still based on the same algorithm as before (no LMO yet, although I intend to take a stab at it later this year). This was developed in parallel with a method to count primes in residue classes (ticket forthcoming).The overhead for small input
>= 2
is rather large in the currentprime_pi
, this has been dramatically improved:New:
Old:
There is also a ~5x speedup (conservatively) for larger input:
New:
Old:
The user can also give a second argument to use additional memory for a potential speedup. All primes less than the second argument are used in computing pi(x). For example:
An earlier version of this implementation started to fail to give correct output on 64-bit systems somewhere between
9.5*10**15
and9.75*10**15
, for issues that I had previously been unable to determine, so I had initially limited input to be< 2**49
, which I am fairly confident is safe, as I have done a fair bit of testing in this range trying to debug this issue (despite it taking hours per call). I would appreciate testing on 32-bit platforms.A lot of the speed of this algorithm relies on the
__cached_count
method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes). This is the fastest type of binary search I know of, but if anyone knows of a way to speed it up, it could have a significant effect.The other big contributor to the speed is the
__smallPi
table, which stores the values of pi(x) forx < 2**16
. This is used in the same manner as__cached_count
, but is of course much faster.Finally, I have introduced the
legendres_formula
method, which takes two arguments,x
anda
, and returns the number of positive integers not larger thanx
that are not divisible by any -- i.e. co-prime to each -- of the firsta
primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code.Apply attachment: trac_11475mk2.patch and attachment: trac_11475-reviewer.patch to the Sage library.
Depends on #15138
CC: @williamstein @nexttime @sagetrac-kevin-stueve @sagetrac-victor @kcrisman
Component: number theory
Keywords: primes, prime counting, prime_pi sd40.5
Author: R. Andrew Ohana
Branch/Commit:
e3e33e6
Reviewer: Yann Laigle-Chapuy, Leif Leonhardy, William Stein, Karl-Dieter Crisman
Issue created by migration from https://trac.sagemath.org/ticket/11475