sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.44k stars 480 forks source link

improve prime_pi (speedup + small fixes) #11475

Closed ohanar closed 10 years ago

ohanar commented 13 years ago

I have rewritten the prime_pi method from scratch, it is much cleaner and less hacky this time (as I have learned more about coding), however it is still based on the same algorithm as before (no LMO yet, although I intend to take a stab at it later this year). This was developed in parallel with a method to count primes in residue classes (ticket forthcoming).

The overhead for small input >= 2 is rather large in the current prime_pi, this has been dramatically improved:

New:

sage: timeit('prime_pi(100)')
625 loops, best of 3: 475 ns per loop
sage: timeit('prime_pi(10000)')
625 loops, best of 3: 483 ns per loop

Old:

sage: timeit('prime_pi(100)')
625 loops, best of 3: 1.71 µs per loop
sage: timeit('prime_pi(10000)')
625 loops, best of 3: 5.56 µs per loop

There is also a ~5x speedup (conservatively) for larger input:

New:

sage: timeit('prime_pi(10**8)')
625 loops, best of 3: 530 µs per loop
sage: timeit('prime_pi(10**10)')
5 loops, best of 3: 27.1 ms per loop
sage: time prime_pi(10**12)
37607912018
Time: CPU 1.53 s, Wall: 1.53 s

Old:

sage: timeit('prime_pi(10**8)')
125 loops, best of 3: 3.66 ms per loop
sage: timeit('prime_pi(10**10)')
5 loops, best of 3: 161 ms per loop
sage: time prime_pi(10**12)
37607912018
Time: CPU 8.65 s, Wall: 8.64 s

The user can also give a second argument to use additional memory for a potential speedup. All primes less than the second argument are used in computing pi(x). For example:

sage: time prime_pi(10**12, 10**8)
37607912018
Time: CPU 0.92 s, Wall: 0.92 s
sage: time prime_pi(10**12, 10**8)    # pari's prime table is now cached
37607912018
Time: CPU 0.58 s, Wall: 0.58 s

An earlier version of this implementation started to fail to give correct output on 64-bit systems somewhere between 9.5*10**15 and 9.75*10**15, for issues that I had previously been unable to determine, so I had initially limited input to be < 2**49, which I am fairly confident is safe, as I have done a fair bit of testing in this range trying to debug this issue (despite it taking hours per call). I would appreciate testing on 32-bit platforms.

A lot of the speed of this algorithm relies on the __cached_count method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes). This is the fastest type of binary search I know of, but if anyone knows of a way to speed it up, it could have a significant effect.

The other big contributor to the speed is the __smallPi table, which stores the values of pi(x) for x < 2**16. This is used in the same manner as __cached_count, but is of course much faster.

Finally, I have introduced the legendres_formula method, which takes two arguments, x and a, and returns the number of positive integers not larger than x that are not divisible by any -- i.e. co-prime to each -- of the first a primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code.


Apply attachment: trac_11475mk2.patch and attachment: trac_11475-reviewer.patch to the Sage library.

Depends on #15138

CC: @williamstein @nexttime @sagetrac-kevin-stueve @sagetrac-victor @kcrisman

Component: number theory

Keywords: primes, prime counting, prime_pi sd40.5

Author: R. Andrew Ohana

Branch/Commit: e3e33e6

Reviewer: Yann Laigle-Chapuy, Leif Leonhardy, William Stein, Karl-Dieter Crisman

Issue created by migration from https://trac.sagemath.org/ticket/11475

ohanar commented 13 years ago

Attachment: trac_11475.patch.gz

based on sage-4.7

ohanar commented 13 years ago

Description changed:

--- 
+++ 
@@ -81,10 +81,12 @@
 625 loops, best of 3: 304 ns per loop

-This doesn't cost very much memory, since everything is stored as 32-bit ints, but prime_range is currently built on top of pari, which poses a number of problems, but primary concern it is a memory hog when sieving to a large upper bound (important for plotting purposes). +This doesn't cost very much memory, since everything is stored as 32-bit ints, but prime_range is currently built on top of pari, which poses a number of problems, but the primary concern is that it is a memory hog when sieving to a large upper bound (important for plotting purposes).

I know that this implementation fails to give correct output on 64-bit systems somewhere between 9.5*10**15 and 9.75*10**15, for issues that I have been unable to determine. I have limited input to be < 2**49, which I am fairly confident to be safe, as I have done a fair bit of testing in this range trying to debug this issue (despite it taking hours per call). I would appreciate testing on 32-bit platforms.

A lot of the speed of this algorithm relies on the __cached_count method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes). This is the fastest type of binary search I know of, but if anyone knows of a way to speed it up, it could have a significant effect.

Finally, I have introduced the legendres_formula method, which takes two arguments, x and a, and returns the number of positive integers less than x that are not divisible by the first a primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code. + +Apply trac_11475.patch

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago
comment:2

The link(s) to your paper is (currently) dead. (I wouldn't call it 'main' btw. ;-) )

Haven't yet looked a the rest...

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago

Changed author from ohanar to R. Andrew Ohana

2c00b582-cfe9-43b9-8124-fec470060704 commented 13 years ago
comment:4

A lot of the speed of this algorithm relies on the __cached_count method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes)

Something like this should give you a 10% speedup for big inputs.

        if p < 200:
            pos = p>>2
            size = 5
        else:
            size = self.__numPrimes>>1
            # Use the expected density of primes for expected inputs to make an
            # educated guess
            if p>>3 < size:
                size = p>>3
            pos = size

(beware to make sure there are enough primes cached for small inputs)

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago
comment:5

Replying to @nexttime:

(I wouldn't call it 'main' btw. ;-) )

Oops, s/call/name/. :)

ohanar commented 13 years ago
comment:6

Replying to @nexttime:

The link(s) to your paper is (currently) dead. (I wouldn't call it 'main' btw. ;-) )

Sorry about that, I was planning on moving the folder, but I forgot to. I'll fix that tonight.

Replying to @sagetrac-ylchapuy:

Something like this should give you a 10% speedup for big inputs.

Thanks, I'll try it out tonight, and update the description and patch if it proves fruitful.

ohanar commented 13 years ago

Description changed:

--- 
+++ 
@@ -34,7 +34,7 @@
 sage: timeit('prime_pi(100)')
 625 loops, best of 3: 331 ns per loop
 sage: timeit('prime_pi(1000)')
-625 loops, best of 3: 963 ns per loop
+625 loops, best of 3: 838 ns per loop

Old:

@@ -45,18 +45,18 @@ 625 loops, best of 3: 2.72 µs per loop


-There is also 25-40% speedup for larger input:
+There is also 35-50% speedup for larger input:

 New:

sage: timeit('prime_pi(108)') -125 loops, best of 3: 2.69 ms per loop +125 loops, best of 3: 2.5 ms per loop sage: timeit('prime_pi(1010)') -5 loops, best of 3: 127 ms per loop +5 loops, best of 3: 116 ms per loop sage: time prime_pi(10**12) 37607912018 -Time: CPU 6.80 s, Wall: 6.80 s +Time: CPU 6.30 s, Wall: 6.29 s

 Old:

@@ -70,23 +70,23 @@
 Time: CPU 8.65 s, Wall: 8.64 s

-Primes are now cached, which can give a huge speedup when making smaller calls after larger calls: (run after previous commands) +Primes are now cached, which can give a significant speedup when making smaller calls after larger calls: (run after previous commands)

 sage: timeit('prime_pi(10**10)')
-5 loops, best of 3: 64.6 ms per loop
+5 loops, best of 3: 59.6 ms per loop
 sage: timeit('prime_pi(10**8)')
-625 loops, best of 3: 721 µs per loop
+625 loops, best of 3: 663 µs per loop
 sage: timeit('prime_pi(1000)')
 625 loops, best of 3: 304 ns per loop

This doesn't cost very much memory, since everything is stored as 32-bit ints, but prime_range is currently built on top of pari, which poses a number of problems, but the primary concern is that it is a memory hog when sieving to a large upper bound (important for plotting purposes).

-I know that this implementation fails to give correct output on 64-bit systems somewhere between 9.5*10**15 and 9.75*10**15, for issues that I have been unable to determine. I have limited input to be < 2**49, which I am fairly confident to be safe, as I have done a fair bit of testing in this range trying to debug this issue (despite it taking hours per call). I would appreciate testing on 32-bit platforms. +I know that this implementation starts to fail to give correct output on 64-bit systems somewhere between 9.5*10**15 and 9.75*10**15, for issues that I have been unable to determine. I have limited input to be < 2**49, which I am fairly confident is safe, as I have done a fair bit of testing in this range trying to debug this issue (despite it taking hours per call). I would appreciate testing on 32-bit platforms.

-A lot of the speed of this algorithm relies on the __cached_count method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes). This is the fastest type of binary search I know of, but if anyone knows of a way to speed it up, it could have a significant effect. +A lot of the speed of this algorithm relies on the __cached_count method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes). This is the fastest type of binary search I know of, but if anyone knows of a way to speed it up, it could have a significant effect. (Thanks to Yann for his suggestion)

Finally, I have introduced the legendres_formula method, which takes two arguments, x and a, and returns the number of positive integers less than x that are not divisible by the first a primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code.

-Apply trac_11475.patch +Apply trac_11475v2.patch

ohanar commented 13 years ago

Makes use of Yann's suggested improvement.

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago
comment:8

Attachment: trac_11475v2.patch.gz

Andrew, do you have some examples of wrong prime counts (beyond 249), and if so, on what kind of systems does one experience them?

While the old implementation was said to fail on (at least some) 32-bit systems for large inputs, the new one in contrast fails on 64-bit systems?

(More or less just curious; perhaps I'll one dayTM try to track this down, depending on the affected system(s) as well.)

ohanar commented 13 years ago
comment:9

Replying to @nexttime:

Andrew, do you have some examples of wrong prime counts (beyond 249), and if so, on what kind of systems does one experience them?

I did most of my testing on mod.math and sage.math, which is ubuntu linux (64-bit). I don't have failing results any longer, but I can tell you that I discovered the issue with 10**16, and I also know that it fails for 9.75*10**15. Give me 7-8 hours and I'll get you the output for those inputs.

While the old implementation was said to fail on (at least some) 32-bit systems for large inputs, the new one in contrast fails on 64-bit systems?

I don't know how the new one behaves on 32-bit systems, since I have done literally no testing. I hope it does better than the old one, since I used stdint this time around (and the only real difference between my 64-bit vs 32-bit implementation is the use of pointers in the later).

(More or less just curious; perhaps I'll one dayTM try to track this down, depending on the affected system(s) as well.)

robertwb commented 13 years ago
comment:10

Apply only trac_11475v2.patch

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago
comment:11

Replying to @robertwb:

Apply only trac_11475v2.patch

:D

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago

Description changed:

--- 
+++ 
@@ -89,4 +89,6 @@

 Finally, I have introduced the `legendres_formula` method, which takes two arguments, `x` and `a`, and returns the number of positive integers less than `x` that are not divisible by the first `a` primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code.

-Apply trac_11475v2.patch
+---
+
+Apply only [attachment: trac_11475v2.patch](https://github.com/sagemath/sage-prod/files/10653115/trac_11475v2.patch.gz).
83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago

Description changed:

--- 
+++ 
@@ -87,7 +87,7 @@

 A lot of the speed of this algorithm relies on the `__cached_count` method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes). This is the fastest type of binary search I know of, but if anyone knows of a way to speed it up, it could have a significant effect. (Thanks to Yann for his suggestion)

-Finally, I have introduced the `legendres_formula` method, which takes two arguments, `x` and `a`, and returns the number of positive integers less than `x` that are not divisible by the first `a` primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code.
+Finally, I have introduced the `legendres_formula` method, which takes two arguments, `x` and `a`, and returns the number of positive integers not larger than `x` that are not divisible by -- i.e. co-prime to -- each of the first `a` primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code.

 ---
ohanar commented 13 years ago

Attachment: trac_11475v3.patch.gz

Fixes doctests, and changed __cached_primes again to improve speed

robertwb commented 13 years ago
comment:13

Apply only trac_11475v3.patch

ohanar commented 13 years ago

Description changed:

--- 
+++ 
@@ -34,7 +34,7 @@
 sage: timeit('prime_pi(100)')
 625 loops, best of 3: 331 ns per loop
 sage: timeit('prime_pi(1000)')
-625 loops, best of 3: 838 ns per loop
+625 loops, best of 3: 782 ns per loop

Old:

@@ -45,18 +45,18 @@ 625 loops, best of 3: 2.72 µs per loop


-There is also 35-50% speedup for larger input:
+There is also a 45-60% speedup for larger input:

 New:

sage: timeit('prime_pi(108)') -125 loops, best of 3: 2.5 ms per loop +125 loops, best of 3: 2.31 ms per loop sage: timeit('prime_pi(1010)') -5 loops, best of 3: 116 ms per loop +5 loops, best of 3: 108 ms per loop sage: time prime_pi(10**12) 37607912018 -Time: CPU 6.30 s, Wall: 6.29 s +Time: CPU 5.74 s, Wall: 5.75 s

 Old:

@@ -74,11 +74,11 @@

sage: timeit('prime_pi(1010)') -5 loops, best of 3: 59.6 ms per loop +5 loops, best of 3: 54.6 ms per loop sage: timeit('prime_pi(108)') -625 loops, best of 3: 663 µs per loop +625 loops, best of 3: 604 µs per loop sage: timeit('prime_pi(1000)') -625 loops, best of 3: 304 ns per loop +625 loops, best of 3: 342 ns per loop


 This doesn't cost very much memory, since everything is stored as 32-bit ints, but `prime_range` is currently built on top of pari, which poses a number of problems, but the primary concern is that it is a memory hog when sieving to a large upper bound (important for plotting purposes).
@@ -91,4 +91,4 @@

 ---

-Apply only [attachment: trac_11475v2.patch](https://github.com/sagemath/sage-prod/files/10653115/trac_11475v2.patch.gz).
+Apply only [attachment: trac_11475v3.patch](https://github.com/sagemath/sage-prod/files/10653116/trac_11475v3.patch.gz).
ohanar commented 13 years ago
comment:16

Replying to @nexttime:

Andrew, do you have some examples of wrong prime counts (beyond 249), and if so, on what kind of systems does one experience them?

sage: time prime_pi(9.75*10**15)
272450165623660
Time: CPU 25614.37 s, Wall: 25614.47 s

The correct value is 272450167482953 (http://www.ieeta.pt/~tos/primes.html). I also computed prime_pi(10**16) but in a moment of confusion, I killed the process before I grabbed the value. If you want it I can start running it again with my current version of the code (~3-5% faster than the latest patch).

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago
comment:17

Replying to @ohanar:

Replying to @nexttime:

Andrew, do you have some examples of wrong prime counts (beyond 249), and if so, on what kind of systems does one experience them?

sage: time prime_pi(9.75*10**15)
272450165623660
Time: CPU 25614.37 s, Wall: 25614.47 s

The correct value is 272450167482953 (http://www.ieeta.pt/~tos/primes.html). I also computed prime_pi(10**16) but in a moment of confusion, I killed the process before I grabbed the value. If you want it I can start running it again with my current version of the code (~3-5% faster than the latest patch).

Never mind. I thought you had collected a bunch of false results during development. If (apparently) all platforms are affected, it should be easier to track this down - rather than flaws due to e.g. Apple "specifics"... ;-)

Btw, I have a large collection of correct prime counts here, so if eventually the sysloads drop, I'll by myself check some results against your new implementation.

Anyway, anybody feel free to compute more values above our current limit and post them here, whether correct or not. :-)

ohanar commented 13 years ago

Attachment: trac_11475v4.patch.gz

Some major speed improvements using ~4kB of tables

ohanar commented 13 years ago

Description changed:

--- 
+++ 
@@ -33,30 +33,30 @@

sage: timeit('prime_pi(100)') 625 loops, best of 3: 331 ns per loop -sage: timeit('prime_pi(1000)') -625 loops, best of 3: 782 ns per loop +sage: timeit('prime_pi(10000)') +625 loops, best of 3: 971 ns per loop

 Old:

sage: timeit('prime_pi(100)') 625 loops, best of 3: 1.71 µs per loop -sage: timeit('prime_pi(1000)') -625 loops, best of 3: 2.72 µs per loop +sage: timeit('prime_pi(10000)') +625 loops, best of 3: 5.56 µs per loop


-There is also a 45-60% speedup for larger input:
+There is also a ~3x speedup for larger input:

 New:

sage: timeit('prime_pi(108)') -125 loops, best of 3: 2.31 ms per loop +125 loops, best of 3: 759 µs per loop sage: timeit('prime_pi(1010)') -5 loops, best of 3: 108 ms per loop +5 loops, best of 3: 47.4 ms per loop sage: time prime_pi(10**12) 37607912018 -Time: CPU 5.74 s, Wall: 5.75 s +Time: CPU 2.86 s, Wall: 2.86 s

 Old:

@@ -74,11 +74,11 @@

sage: timeit('prime_pi(1010)') -5 loops, best of 3: 54.6 ms per loop +5 loops, best of 3: 27.4 ms per loop sage: timeit('prime_pi(108)') -625 loops, best of 3: 604 µs per loop -sage: timeit('prime_pi(1000)') -625 loops, best of 3: 342 ns per loop +625 loops, best of 3: 325 µs per loop +sage: timeit('prime_pi(10000)') +625 loops, best of 3: 343 ns per loop


 This doesn't cost very much memory, since everything is stored as 32-bit ints, but `prime_range` is currently built on top of pari, which poses a number of problems, but the primary concern is that it is a memory hog when sieving to a large upper bound (important for plotting purposes).
@@ -91,4 +91,4 @@

 ---

-Apply only [attachment: trac_11475v3.patch](https://github.com/sagemath/sage-prod/files/10653116/trac_11475v3.patch.gz).
+Apply only [attachment: trac_11475v4.patch](https://github.com/sagemath/sage-prod/files/10653117/trac_11475v4.patch.gz).
ohanar commented 13 years ago
comment:19

An update:

v4: added a table of prime_pi values that can be stored in a byte -- using this with the __cached_primes method gave a giant speedup. Since the sum at the beginning of __phi (and __phi32) differs from 16*(x/77) by a small factor that only depends on x%2310, so adding a table with these factors results in another large speedup. These tables take a total of ~4kB.

v5: the tables introduced a significant initialization time for prime_pi, and William asked me to fix this, so prime_pi (and legendres_formula) only initialize these tables on an initial call -- so initialization time is now ~650ns:

sage: timeit('sage.functions.prime_pi.PrimePi()')
625 loops, best of 3: 627 ns per loop

Both Yann and I have tested 9.75*10^15 again and got the correct result, I also did 10^16. We both ran it on 64-bit GNU/Linux systems. So the issue I was encountering may have been due to boundary issues with the __cached_primes method (which have been rewritten over the course of the patches). I kept the limit in since I haven't yet tested this thoroughly.

ohanar commented 13 years ago

Attachment: trac_11475v5.patch.gz

Reduced initialization time

ohanar commented 13 years ago

Description changed:

--- 
+++ 
@@ -91,4 +91,4 @@

 ---

-Apply only [attachment: trac_11475v4.patch](https://github.com/sagemath/sage-prod/files/10653117/trac_11475v4.patch.gz).
+Apply only [attachment: trac_11475v5.patch](https://github.com/sagemath/sage-prod/files/10653118/trac_11475v5.patch.gz).
83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago
comment:21

I've so far only tested a little (on 64-bit Linux; dead old 32-bit box still building Sage), but already found one wrong result with patch v3:

  502000000000000  15296923904889   # 11475v3
  502000000000000  15296922045596   # correct result, (at least) double-checked

Note that 5021012 is below* 249 (562.949.953.421.312).

As my machines are still busy, you could perhaps check if the newer version(s) give the correct result.

ohanar commented 13 years ago
comment:22

Works on sage.math with 11475v5:

sage: time prime_pi(502000000000000)
15296922045596
Time: CPU 723.40 s, Wall: 723.43 s
83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago
comment:23

Replying to @ohanar:

Works on sage.math with 11475v5:

sage: time prime_pi(502000000000000)
15296922045596
Time: CPU 723.40 s, Wall: 723.43 s

Yep, here too; with both v4 and v5 (64-bit Linux).

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago
comment:24
  580000000000000  17596211452162
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
  581000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
  582000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
  583000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
  584000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
  585000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
  586000000000000               0

Hmm, really?

ohanar commented 13 years ago
comment:25

Replying to @nexttime:

  580000000000000  17596211452162
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
  581000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
  582000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
  583000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
  584000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
  585000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
  586000000000000               0

Hmm, really?

Oops, thankfully easily fixed. Shouldn't occur in the next patch (whenever that is).

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago
comment:26

Ok, here are my first 32-bit tests (with patch v5):

sage -t  "devel/sage-11475v5/sage/functions/prime_pi.pyx"   
**********************************************************************
File "/home/leif/Sage/sage-4.7/devel/sage-11475v5/sage/functions/prime_pi.pyx", line 126:
    sage: prime_pi(10**10)
Expected:
    455052511
Got:
    498132213
**********************************************************************
1 items had failures:
   1 of  14 in __main__.example_1
***Test Failed*** 1 failures.
For whitespace errors, see the file /home/leif/.sage//tmp/.doctest_prime_pi.py
         [6.1 s]

----------------------------------------------------------------------
The following tests failed:

        sage -t  "devel/sage-11475v5/sage/functions/prime_pi.pyx"
Total time for all tests: 6.1 seconds

real    0m6.293s
user    0m5.328s
sys     0m0.760s

With long tests included, some unhandled overflows occur:

sage -t -long "devel/sage-11475v5/sage/functions/prime_pi.pyx"
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
**********************************************************************
File "/home/leif/Sage/sage-4.7/devel/sage-11475v5/sage/functions/prime_pi.pyx", line 126:
    sage: prime_pi(10**10)
Expected:
    455052511
Got:
    498132213
**********************************************************************
File "/home/leif/Sage/sage-4.7/devel/sage-11475v5/sage/functions/prime_pi.pyx", line 231:
    sage: for n in (42,41..32): prime_pi(2**n) # long time (21s on mod.math, 2011)
Expected:
    156661034233
    80316571436
    41203088796
    21151907950
    10866266172
    5586502348
    2874398515
    1480206279
    762939111
    393615806
    203280221
Got:
    2072485352
    555689055
    4249350341
    1916569322
    829105263
    2484680881
    3345496393
    1649125207
    813719441
    403824770
    203621732
**********************************************************************
2 items had failures:
   1 of  14 in __main__.example_1
   1 of  14 in __main__.example_4
***Test Failed*** 2 failures.
For whitespace errors, see the file /home/leif/.sage//tmp/.doctest_prime_pi.py
         [52.1 s]

----------------------------------------------------------------------
The following tests failed:

        sage -t -long "devel/sage-11475v5/sage/functions/prime_pi.pyx"
Total time for all tests: 52.2 seconds

real    0m52.318s
user    0m49.923s
sys     0m0.956s

This is on Ubuntu 7.10 :) (P4 Northwood, gcc 4.2.1) with Sage 4.7.

ohanar commented 13 years ago
comment:27

Unfortunate, but kind of expected. It'll probably take a day or two for me to set up a 32-bit vm to try to debug the issue (it isn't blatantly obvious to me what is going wrong).

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago
comment:28

Replying to @ohanar:

(it isn't blatantly obvious to me what is going wrong).

Yep, especially the "long int doesn't fit into int" reminds me of 16-bit systems... 8/

ohanar commented 13 years ago

fixes issue on 32-bit systems

ohanar commented 13 years ago

Description changed:

--- 
+++ 
@@ -91,4 +91,4 @@

 ---

-Apply only [attachment: trac_11475v5.patch](https://github.com/sagemath/sage-prod/files/10653118/trac_11475v5.patch.gz).
+Apply only [attachment: trac_11475v6.patch](https://github.com/sagemath/sage-prod/files/10653119/trac_11475v6.patch.gz).
ohanar commented 13 years ago
comment:29

Attachment: trac_11475v6.patch.gz

ohanar commented 13 years ago
comment:30

Hopefully that was the only issue with 32-bit systems. Basically gmp wants a 32-bit int for mpz_set_ui on 32-bit systems, so I gave it what it wanted (this was also causing the issue with the __sieve function, since I use the mpz_sqrt function to calculate the argument).

I am currently doing some more thorough testing with large values on a 64-bit machine, so that we can be sure there isn't any more breakage on the stable platform.

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago
comment:31

Replying to @ohanar:

Hopefully that was the only issue with 32-bit systems.

v6 passes long tests on the 32-bit Ubuntu 7.10 machine (in about a minute). More to come.

ohanar commented 13 years ago
comment:32

Apply only trac_11475v6.patch

ohanar commented 13 years ago
comment:33

I have verified that prime_pi gives correct output for i*10**12 with 0 <= i <= 1000 on x86_64 :). I am probably going to remove the bound flag and instead put a warning about larger values not being well tested if/when they are called.

Unfortunately, I have also come across an issue with solaris 10 sparc (t2.math). The root of the cause is that mpz_export has the bit ordering arguments flipped. I am not aware of another way to get unsigned ints out of gmp integers, nor am I aware of a way to check the platform from cython. Also, I don't know if other platforms have this same issue (solaris86 or mac powerpc in particular), so if someone could test that, please let me know.

ohanar commented 13 years ago

Attachment: trac_11475v8.patch.gz

100% doctest coverage + slightly faster init + solaris/sparc fix

ohanar commented 13 years ago
comment:34

Apply only trac_11475v8.patch

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago
comment:35

Replying to @ohanar:

I have verified that prime_pi gives correct output for i*10**12 with 0 <= i <= 1000 on x86_64 :).

Besides a couple of other values on individual machines, I've successfully computed pi(2k) for k=0...53 and pi(10k) for k=0...16 on each of three different machines (two of them 32-bit; all with patch version 6).

Some computations are still in progress, but so far I could also verify the results of prime_pi(2**54) and prime_pi(2**55) (on at least one machine). I've also played a little with "legendres_formula()".

I am probably going to remove the bound flag and instead put a warning about larger values not being well tested if/when they are called.

Yes, I also think we can or should drop any "artificial" limit now, though I must admit I haven't really looked at the code yet...


I wonder if legendres_formula() shouldn't be renamed to legendre_phi() - better names appreciated, but the latter ( \phi(x,a) ) seems to be quite established in literature, and legendre_phi would be analogous to e.g. euler_phi in Sage. (I know Wolfram and others call "it" - i.e., IMHO the method computing the respective function rather than the function itself - Legendre's Formula as well, but it's not that unambiguous.)

I'd also mention in its docstring it's called the partial sieve function.

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago
comment:36

Replying to @nexttime:

Replying to @ohanar:

I am probably going to remove the bound flag and instead put a warning about larger values not being well tested if/when they are called.

Yes, I also think we can or should drop any "artificial" limit now, though I must admit I haven't really looked at the code yet...

Ah, I see, you already removed the restriction in patch v8 (and also added warnings w.r.t. seemingly "endless" computations, which I also wanted to suggest).

There's a small typo in __call__()'s docstring (the 2**32 should read 2**64):

        This implementation uses unsigned 64-bit ints and thus does not
        support `x >= 2**32`::

            sage: prime_pi(2^64)
            Traceback (most recent call last):
            ...
            NotImplementedError: computation of prime_pi for x >= 2^64 is not implemented

Also, there's a mixture of both kinds of exponentiation operators (^ and **) throughout the [doc]strings; I'm not sure if we shouldn't use just one notation.

ohanar commented 13 years ago
comment:37

Replying to @nexttime:

Replying to @ohanar:

I have verified that prime_pi gives correct output for i*10**12 with 0 <= i <= 1000 on x86_64 :).

Besides a couple of other values on individual machines,

Which machines have you had issues with? Everything I have tested with v8 has worked (this includes x86 and x86_64 linux, 64bit Snow Leopard, and Sparc Solaris 10).


I wonder if legendres_formula() shouldn't be renamed to legendre_phi() - better names appreciated, but the latter ( \phi(x,a) ) seems to be quite established in literature, and legendre_phi would be analogous to e.g. euler_phi in Sage.

Makes sense to me, if no one has an objection to the idea, I'll make the change.

I'd also mention in its docstring it's called the partial sieve function.

Sounds good to me.


Replying to @nexttime:

Also, there's a mixture of both kinds of exponentiation operators (^ and **) throughout the [doc]strings; I'm not sure if we shouldn't use just one notation.

Will do -- probably ^ since it is more consumer friendly.

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago
comment:38

Replying to @ohanar:

Replying to @nexttime:

Replying to @ohanar:

I have verified that prime_pi gives correct output for i*10**12 with 0 <= i <= 1000 on x86_64 :).

Besides a couple of other values on individual machines,

Which machines have you had issues with?

Oh, I didn't get any wrong results; I just meant I've also tested prime_pi(x) with some "random" values (besides the mentioned powers of 2 and ten), but not systematically on all machines (all Linux boxes; Pentium4 Northwood, P4 Prescott and Core2).

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago
comment:39

Computation of prime_pi(2**56) just finished with the correct result on one of the 32-bit machines :-) (took 2 days+).

[Updated description w.r.t. (now obsolete) limitation / potentially wrong results for inputs exceeding 249 with earlier versions.]

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago

Description changed:

--- 
+++ 
@@ -81,14 +81,14 @@
 625 loops, best of 3: 343 ns per loop

-This doesn't cost very much memory, since everything is stored as 32-bit ints, but prime_range is currently built on top of pari, which poses a number of problems, but the primary concern is that it is a memory hog when sieving to a large upper bound (important for plotting purposes). +This doesn't cost very much memory, since everything is stored as 32-bit ints, but prime_range() is currently built on top of PARI, which poses a number of problems, but the primary concern is that it is a memory hog when sieving to a large upper bound (important for plotting purposes).

-I know that this implementation starts to fail to give correct output on 64-bit systems somewhere between 9.5*10**15 and 9.75*10**15, for issues that I have been unable to determine. I have limited input to be < 2**49, which I am fairly confident is safe, as I have done a fair bit of testing in this range trying to debug this issue (despite it taking hours per call). I would appreciate testing on 32-bit platforms. +An earlier version of this implementation started to fail to give correct output on 64-bit systems somewhere between 9.5*10**15 and 9.75*10**15, for issues that I had previously been unable to determine, so I had initially limited input to be < 2**49, which I am fairly confident is safe, as I have done a fair bit of testing in this range trying to debug this issue (despite it taking hours per call). I would appreciate testing on 32-bit platforms.

A lot of the speed of this algorithm relies on the __cached_count method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes). This is the fastest type of binary search I know of, but if anyone knows of a way to speed it up, it could have a significant effect. (Thanks to Yann for his suggestion)

-Finally, I have introduced the legendres_formula method, which takes two arguments, x and a, and returns the number of positive integers not larger than x that are not divisible by -- i.e. co-prime to -- each of the first a primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code. +Finally, I have introduced the legendres_formula method, which takes two arguments, x and a, and returns the number of positive integers not larger than x that are not divisible by any -- i.e. co-prime to each -- of the first a primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code.


-Apply only attachment: trac_11475v6.patch. +Apply only attachment: trac_11475v8.patch.

ohanar commented 13 years ago

Attachment: trac_11475v9.patch.gz

See comment #40

ohanar commented 13 years ago
comment:40

So through the updates, I haven't been keeping legendres_formula up to date. I am posting a patch that should fix the bugs that were present for the past few versions (primarily, calling it with a relatively small x in comparison to a would give an extremely wrong answer due to overflow issues). I have also changed the name, so it now is called either legendre_phi or partial_sieve_function.

The patch also makes all documentation use ^, there are no longer any instances of **.

The patch also touches up on __phi and __phi32 -- cleaning up code, adding a few comments, and makes a simple optimization to reduce boolean checks for small x values (~10% speedup for large input).

I'm hoping this will be the final patch for this ticket, so that I don't go into the double digits for version numbers :-).

Apply only trac_11475v9.patch