upiterbarg / mpmath

Automatically exported from code.google.com/p/mpmath
Other
0 stars 0 forks source link

Improved gmpy support #94

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I'm working on a new gmpy release and I'm willing try and add some 
functions specifically to speed up mpmath. Let me know what specific areas 
could be improved.

casevh

Original issue reported on code.google.com by casevh on 23 Oct 2008 at 6:23

GoogleCodeExporter commented 9 years ago
Tested without problems.

Original comment by fredrik....@gmail.com on 30 Jan 2009 at 9:44

GoogleCodeExporter commented 9 years ago
In the attached files there are a patch for gmpy rev 79 containing
bindings for mpf_add, mpf_mul, exp_series, cos_taylor, sin_taylor,
and a patch for mpmath to use them.
While probably support for mpf_add, mpf_mul will be added in gmpy,
in a form or another, it is not clear if the other functions will
be considered; anyway it seems to me to be interesting to see how fast
mpmath can now be, with a little tweaking.

Here there are benchmarks on hp pavilion Q8200 2.33GHz
at dps=100 using the script timings-mpfr.c for mpfr, timings_mpmath.py
for mpmath rev 869 with gmpy rev 79 and for the patched version

           mpfr       mpmath_r869 patch
mul        0.000170   0.00161     0.000657
div        0.000428   0.00347     0.00345
sqrt       0.000671   0.00526     0.00501
exp        0.00919    0.0306      0.0127
log        0.0152     0.0241      0.0241
sin        0.0110     0.0505      0.0201
cos        0.00775    0.0513      0.0204
acos       0.0713     0.0486      0.0437
atan       0.0613     0.0295      0.0294

Using a C version of log_taylor does not change anything.
The previous version of mpf_log was faster,
and with a C version of log_taylor it was faster than mpfr.

Original comment by mario.pe...@gmail.com on 31 Jan 2009 at 10:33

Attachments:

GoogleCodeExporter commented 9 years ago
Thank you. How big is the speedup for mpf_add?

I think a generalized version of the function that is presently called 
exp_series2
would be a good idea. With simple modifications, this code could be made to 
compute
either exp, cosh/sinh simultaneously or cos/sin simultaneously. I am going to
implement something like this in Python to see how it turns out.

> Using a C version of log_taylor does not change anything.
> The previous version of mpf_log was faster,

After r868? Why is it slower now?

Original comment by fredrik....@gmail.com on 31 Jan 2009 at 11:06

GoogleCodeExporter commented 9 years ago
Why not to just use mpfr then? :)

Original comment by ondrej.c...@gmail.com on 31 Jan 2009 at 11:14

GoogleCodeExporter commented 9 years ago
I translated in C the wrong series for log; now log is faster than
in mpfr; in the attached patch to gmpy rev 79 I added binding for
log_series, which is called by log_taylor_cached, see the attached
patch to mpmath.

Added to timings-mpfr.c and timings_mpmath.py an entry for add
           mpfr       mpmath_r869 patch
add        0.000077   0.00219     0.000538
mul        0.000170   0.00161     0.000657
div        0.000428   0.00347     0.00345
sqrt       0.000671   0.00526     0.00501
exp        0.00919    0.0306      0.0127
log        0.0152     0.0241      0.0124
sin        0.0110     0.0505      0.0201
cos        0.00775    0.0513      0.0204
acos       0.0713     0.0486      0.0437
atan       0.0613     0.0295      0.0294

Original comment by mario.pe...@gmail.com on 1 Feb 2009 at 11:46

Attachments:

GoogleCodeExporter commented 9 years ago
That looks good. Since I intend to make further changes to the Python
implementations, I think it's too early to commit any of that code right now, 
however.

I'm attaching a generalized version of exp_series2. With this function one gets 
exp,
cosh, sinh, cos, sin in one package. I think it is very much desirable to have 
just
one function, as all the series are virtually identical and this way one only 
needs
to determine optimal parameters (and debug) once.

If both r and p are chosen optimally, exponential_series is about the same 
speed as
the existing exp code at all precision levels. It is slightly slower than the
existing exp series at low precision, probably due to not using a specialized 
exp
series, and also slower than AGM asymptotically (though actually not much 
slower). It
could conceivably be faster than the existing exp code somewhere in between (I 
have
not checked extensively).

At 300,000 bits, with 0.5 < x < 1, r=130, p=4, exponential_series takes 3.0 
seconds,
compared to 2.0 seconds for the AGM-based exp and 14.1 seconds for the present 
cos
code (a 4.6x speedup!).

A similar function can be written for log and atan. For those, one can likewise
perform r argument reductions (using square roots), and use Smith's method with
degree 2^p.

Original comment by fredrik....@gmail.com on 1 Feb 2009 at 8:18

Attachments:

GoogleCodeExporter commented 9 years ago
> Why not to just use mpfr then? :)

Why, when we evidently can achieve nearly the same speed as mpfr (and even be 
faster
in some cases) without that much extra code?

(Why write sympy and not just use pyginac? :)

Original comment by fredrik....@gmail.com on 1 Feb 2009 at 8:24

GoogleCodeExporter commented 9 years ago
> (Why write sympy and not just use pyginac? :)

That's what Sage is doing, only they use Cython instead of swig (for obvious 
reasons).

Original comment by ondrej.c...@gmail.com on 1 Feb 2009 at 9:36

GoogleCodeExporter commented 9 years ago
Added bindings to mpf_div and mpf_sqrt in the patch to gmpy

Added timings.sage to the benchmarks.

           mpfr       mpmath_r869 patch       sage
add        0.000077   0.00219     0.000538    0.000424
mul        0.000170   0.00161     0.000657    0.000516
div        0.000428   0.00347     0.00111     0.000804
sqrt       0.000671   0.00526     0.00156     0.00134
exp        0.00919    0.0306      0.0127      0.0119
log        0.0152     0.0241      0.0124      0.0170
sin        0.0110     0.0505      0.0201      0.0131
cos        0.00775    0.0513      0.0204      0.00912
acos       0.0713     0.0486      0.0437      0.0768
atan       0.0613     0.0295      0.0294      0.0676

With this patch runtests is 20% faster.

Original comment by mario.pe...@gmail.com on 3 Feb 2009 at 11:18

Attachments:

GoogleCodeExporter commented 9 years ago
Great! The mpfr add and mul are of course much faster even than sage because 
they
spend very little time on object/memory management. Now maybe with a little 
further
work the sage people could be convinced to support mpmath too (and by that I 
mean not
just indirectly via sympy).

By the way, since you are able to compare with mpfr, how do exp and log fare at
extremely high (10K, 100K, 1M digits) precision?

I still think handling of special values (except perhaps x*0 and x+0 -- come to 
think
of it, those cases need to be fast in some situations) should fall back to 
Python
somehow.

At this point some functions that do inline arithmetic would do better to fall 
back
to using mpf_mul et al (I'm thinking particularly of mpc_mul which is 
unnecessarily
slow in gmpy mode).

Original comment by fredrik....@gmail.com on 3 Feb 2009 at 7:41

GoogleCodeExporter commented 9 years ago
See issue 108 and issue 127 for possible changes in behavior that need to be 
resolved.

Original comment by fredrik....@gmail.com on 3 Feb 2009 at 8:04

GoogleCodeExporter commented 9 years ago
> By the way, since you are able to compare with mpfr, how do exp and log fare 
at
> extremely high (10K, 100K, 1M digits) precision?

I report the times for the first run and the second run in parenthesis, when
faster.
It seems that in rev 869 there is more caching than in rev 700, so the first 
run is
slower,
the second run faster. Which revision is better?
If I did not make some stupid mistake in benchmarking, it seems that sage is 
faster
than mpfr at very high precision.

Benchmark on hp Pavilion Q8200 2.33GHz
a = sqrt(5)
exp(a)
dps        mpfr     sage     rev 869     rev 700 
10000      0.020    0.017    0.072       0.047  
10000      -----    ----     (0.027)     (0.032)
100000     1.00     0.681    2.11         1.52 
100000     ----     ----     (1.12)      (1.26)
1000000    22.7     14.7     45.7         35.6
1000000    ----     ----     (27.7)      (30.8)

log(a)
10000      0.040    0.029     0.060      0.038
10000      (0.029)  (0.012)   (0.023)    (0.030)
100000     1.78     1.22      1.73       1.36
100000     (0.85)   (0.57)    (0.96)     (1.16)
1000000    44.7     28.9      39.1       31.8
1000000    (21.4)   (14.0)    23.3       (27.4)

Original comment by mario.pe...@gmail.com on 3 Feb 2009 at 11:43

GoogleCodeExporter commented 9 years ago
In the previous post, read rev 860 instead of rev 700.

sage does use mpfr_exp, so I do not understand why I get slower results using 
mpfr.

mpmath has worse results for exp than for log; maybe an improved exp_series2 
would be faster than the Newton method.

Original comment by mario.pe...@gmail.com on 4 Feb 2009 at 6:10

GoogleCodeExporter commented 9 years ago
The only caching is of ln(2) and pi. The question is whether enough function
evaluations are performed by the user to make the caching pay off. This happens
already after 3 or 4 evaluations, so possibly.

By the way, in exp_newton, ln2 and pi are computed multiple times. At 
dps=1000000,
about 8% time could be saved by precomputing them at the final precision.

Original comment by fredrik....@gmail.com on 4 Feb 2009 at 6:30

GoogleCodeExporter commented 9 years ago
MPFR apparently uses binary splitting at high precision. They use a generic
implementation that works for arbitrary hypergeometric functions. This is 
described
in the paper "Évaluation rapide de fonctions hypergéométriques",
ftp://ftp.inria.fr/INRIA/publication/publi-pdf/RT/RT-0242.pdf

Original comment by fredrik....@gmail.com on 4 Feb 2009 at 12:54

GoogleCodeExporter commented 9 years ago
I added some more bindings to the above patch to gmpy, in particular
from_float; with this patch runtests in 25% faster.
I think that runtests cannot be made sensibly faster with this approach.
There are several bindings which could be added, but they weight little
in runtests.
Now the function which takes more time in runtests is __mul__;
__add__ is also pretty slow.

I introduced a new float type mpg in another patch to gmpy
(the name mpf is already taken in it); with s, t mpg objects one can compute
mpg_mpf_mul(s, t, prec, rnd), 
which is slightly faster than mpf_mul, or
s * t, 
which is roughly as fast as in sage; same for addition.
It seems that the low level mpmath api could become slower than the 
high level one!
In this way _mul__, __add__  and other special methods should become fast.
The float type is defined
typedef struct {
    mpob ob;
    long sign;
    PympzObject *m;
    PyObject *exp;
    long bc;
} PympgObject;
The code for mul and add is almost the same as in the previous patch.

The main thing missing to make this usable in mpmath is subclassing mpg from
mpnumeric and introducing mpc.

Original comment by mario.pe...@gmail.com on 14 Feb 2009 at 9:59

GoogleCodeExporter commented 9 years ago
The problem is here is that maintenance overhead increases very significantly 
by also
implementing these parts in gmpy. At this point I don't think the high level 
mpmath
api is stable enough to rule out the need for changes that would break 
compatibility
with a C implementation of the mpf class.

> It seems that the low level mpmath api could become slower than the 
> high level one!

This shouldn't be a problem; the low level api is not particularly concerned 
with mpf
values being tuples and could use mpf instances directly if they were faster 
(and
then e.g. x._mpf_ could be implemented to return x itself). The changes 
required to
support both tuples and some other mpf type implementation would be minor.

Original comment by fredrik....@gmail.com on 14 Feb 2009 at 6:08

GoogleCodeExporter commented 9 years ago
I have posted in the gmpy issues an implementation of part of gmpy in Cython,
with which runtests is as fast as with gmpy. One can develop independently the
Cython wrapper of GMPY and the Cython code for functions of mpmath.

Original comment by mario.pe...@gmail.com on 4 May 2009 at 11:35

GoogleCodeExporter commented 9 years ago
For reference: http://code.google.com/p/gmpy/issues/detail?id=16#c2

Original comment by Vinzent.Steinberg@gmail.com on 5 May 2009 at 2:09

GoogleCodeExporter commented 9 years ago
Here is a benchmark with timings_mpmath.py using gmpy in Cython
with dps=100
     mpmath     sage
add  0.00046    0.00043
mul  0.00077    0.00052
div  0.0012     0.00080
sqrt 0.0018     0.0014
exp  0.011      0.012
log  0.012      0.017
sin  0.011      0.013
cos  0.010      0.0091
acos 0.024      0.075
atan 0.013      0.066

Original comment by mario.pe...@gmail.com on 11 May 2009 at 9:40