Original comment by casevh
on 23 Oct 2008 at 6:24
The most obvious bottleneck is normalize()/normalize1().
If the chunk
if rnd is round_nearest:
    t = man >> (n-1)
    if t & 1 and ((t & 2) or (man & h_mask[n<300][n])):
        man = (t>>1)+1
    else:
        man = t>>1
elif shifts_down[rnd][sign]:
    man >>= n
else:
    man = -((-man)>>n)
could be replaced by a gmpy function
man = shiftrnd(man, n, rnd),
that handles rnd = 'd', 'u', 'f', 'c', 'n', then that would probably make a
difference.
Ditto for the trailing-zero removal code. Maybe that could be sped up already,
just by using .scan1()?
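The thread never shows what shiftrnd would look like, so here is a rough pure-Python sketch of the proposed helper (shiftrnd and strip_trailing_zeros are hypothetical names; gmpy's real implementation would be in C, and .scan1() would replace the bit_length trick used below):

```python
def shiftrnd(man, n, rnd, sign=0):
    # Shift a nonnegative mantissa right by n bits, rounding per
    # rnd in 'd' (down), 'u' (up), 'f' (floor), 'c' (ceiling), 'n' (nearest).
    if n <= 0:
        return man << (-n)
    if rnd == 'n':  # round to nearest, ties to even
        t = man >> (n - 1)
        if t & 1 and ((t & 2) or (man & ((1 << (n - 1)) - 1))):
            return (t >> 1) + 1
        return t >> 1
    # 'd' truncates for either sign; 'f'/'c' truncate depending on the sign
    if rnd == 'd' or (rnd == 'f' and not sign) or (rnd == 'c' and sign):
        return man >> n
    return -((-man) >> n)  # round the magnitude up

def strip_trailing_zeros(man, exp):
    # Trailing-zero removal: (man & -man).bit_length() - 1 counts the
    # trailing zero bits, playing the role gmpy's .scan1() would.
    if man:
        tz = (man & -man).bit_length() - 1
        man >>= tz
        exp += tz
    return man, exp
```

For example, shiftrnd(5, 1, 'n') gives 2 (2.5 rounds to even) while shiftrnd(7, 1, 'n') gives 4.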
Original comment by fredrik....@gmail.com
on 23 Oct 2008 at 6:47
It would be useful to add exp_series to gmpy; the speed of exp with gmpy would
become roughly that of gmpy+psyco.
In libelefun the precision at which one switches to exp_series2 should probably
be adjusted.
In the attached patch for gmpy.c there is a possible implementation
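The attached patch itself is not reproduced in this thread. As a rough illustration of the kind of series code being discussed (an assumption about its shape, not the actual gmpy.c patch), a fixed-point Taylor loop for exp in pure Python might look like:

```python
import math

def exp_taylor_fixed(x, prec):
    # Fixed-point Taylor series for exp: x is an integer representing
    # x_real * 2**prec with 0 <= x_real < 1 (argument reduction, as done
    # in libelefun, is omitted). Returns roughly exp(x_real) * 2**prec.
    one = 1 << prec
    s = term = one  # the k = 0 term: x**0 / 0! = 1
    k = 1
    while term:
        term = (term * x >> prec) // k  # multiply by x/k in fixed point
        s += term
        k += 1
    return s
```

For instance, exp_taylor_fixed(1 << 63, 64) / 2.0**64 agrees with math.exp(0.5) to double precision.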
Original comment by mario.pe...@gmail.com
on 24 Oct 2008 at 5:07
Attachments:
In the attached file there are the functions exp_series
(as in the previous post), exp_series2, sin_taylor, cos_taylor.
With them added to gmpy.c and used in libelefun, and with cos_taylor and
sin_taylor used up to prec=1500 in calc_cos_sin, all runtests pass.
On Athlon 64 X2 Dual Core 3800+
time gmpy with mpmath functions / time gmpy
dps exp(one/3) cos(one/3)
30 0.62 0.54
50 0.57 0.48
100 0.52 0.41
200 0.55 0.43
300 0.61 0.49
400 0.61 0.58
Similar speedup should be expected for a few other functions,
in particular the hypergeometric functions
and the Jacobi theta functions,
after refactoring the Python code a bit.
Original comment by mario.pe...@gmail.com
on 25 Oct 2008 at 4:58
Attachments:
oops, the previous version has plenty of memory leaks; here is a revised
version.
Original comment by mario.pe...@gmail.com
on 25 Oct 2008 at 8:07
Attachments:
I don't know if those patches are a good idea:
* The scope of these functions seems too narrow for gmpy
* The speedup is not that big
* I expect that the series used for various functions will change in future
versions of mpmath (as they have so far). Such changes should not depend on
or obsolete the functions in gmpy.
Original comment by fredrik....@gmail.com
on 26 Oct 2008 at 6:38
I'll take a look at normalize and the removal of trailing zeros. I plan to start
developing gmpy 2.x which will include support for Python 3.0. I'd like to
focus on
adding new features to that release. But if we can get a significant
improvement for
one or two simple functions, I'll do that.
I'd rather see the algorithms used by mpmath implemented in Python until they
are
stable.
Alex created a group "gmpy-dev" where we can discuss the development of gmpy.
Original comment by casevh
on 27 Oct 2008 at 2:40
I agree it is not a good idea to put exp_series2 in gmpy; it gives a small
speed-up
and it is likely to change in the future, as it has in the past.
However I still think that exp_series is a good candidate; at dps=100 it is
roughly
2x faster, so according to the timings page it would be as fast as in
Mathematica.
The algorithm has not changed at least since mpmath-0.6; there has been only
some refactoring, so that now exp_series depends on r, while before r was
computed inside it. It seems unlikely to me that this algorithm will change in
the future.
Original comment by mario.pe...@gmail.com
on 28 Oct 2008 at 9:52
I think there's a chance that the low precision exp could be improved further
with
some algorithmic tuning. For example, one could test the caching trick I
implemented
for log and atan.
Original comment by fredrik....@gmail.com
on 28 Oct 2008 at 11:34
Just an update.
I implemented _normalize in gmpy. My tests show about a 10% improvement versus
gmpy
1.03. Roughly 4% of the improvement is due to faster conversion between Python
longs
and mpz. The rest of the improvement comes from _mpmath_normalize.
I've only tested on a 64-bit platform. I need to test on a 32-bit platform
before I
commit the changes to gmpy. I'm concerned about the number of bits in the
mantissa exceeding the C long type on a 32-bit platform. That would be in
excess of 600,000,000 decimal digits. Should I worry about that?
It does support very large mantissas.
The entire test suite passed with only one minor change: in test_mpmath.py, I
needed to change the list of rounding modes to ['d', 'u', 'f', 'c', 'n'].
Original comment by casevh
on 16 Nov 2008 at 6:23
> I implemented _normalize in gmpy. My tests show about a 10% improvement
> versus gmpy 1.03. Roughly 4% of the improvement is due to faster conversion
> between Python longs and mpz. The rest of the improvement comes from
> _mpmath_normalize.
Sounds good.
> That would be in excess of 600,000,000 decimal digits. Should I worry about
> that?
I don't think so. If someone needs to work with more than 600M digits on a
32-bit
platform, they could always switch back to Python mode ;-)
Original comment by fredrik....@gmail.com
on 16 Nov 2008 at 11:20
We're hoping to release gmpy 1.04 in the next couple of days. I've added a
function
_mpmath_normalize that replaces normalize() and normalize1(). I just committed
(r780)
the changes needed to support it. With gmpy 1.04, I get about an 8% speedup on
runtests.py
A test Windows installer can be downloaded at:
http://home.comcast.net/~casevh/gmpy-1.04-gmp-4.2.4.win32-py2.6.exe
Versions for Python 2.4 and 2.5 are also there.
GMP was compiled for a Pentium 3 processor. I think that is the same processor
choice
as the old installers, but I'm not 100% certain. Please let me know if the
performance is radically different.
Original comment by casevh
on 25 Nov 2008 at 4:47
This is strange: I get a negligible speedup (4%), and a 9% slowdown with
-psyco. With
psyco the slowdown is about 15% for some of the tests, like
double_compatibility and
basic_integrals.
With -psyco the slowdown disappears if I comment out gmpy._mpmath_normalize, and
without -psyco the speed seems to be about the same whether or not it is used.
Original comment by fredrik....@gmail.com
on 25 Nov 2008 at 6:19
A guess is that psyco's compiled version of the Python code is faster because
it is
able to globally optimize away the use of boxed Python integers for the
exponent and
bitcount.
Original comment by fredrik....@gmail.com
on 25 Nov 2008 at 6:30
That's interesting. I normally use a 64-bit platform so I can't use psyco.
I'll need to add psyco to my virtual machine.
When I was running my tests, I got about 4% speedup because of the faster
long2mpz
conversion. _mpmath_normalize improved performance by 4% (roughly) for me.
I'll try to run some 32-bit Windows tests with psyco.
I'm assuming you are using Windows. What processor do you have? I can build you
a
version optimized for your CPU.
Original comment by casevh
on 25 Nov 2008 at 8:19
Celeron M 520, so I'm guessing Core 2.
In fact, I've compared the Core 2 and standard build of gmpy 1.03 for Python
2.6.
Multiplying two million-digit integers is about 40% faster. However, the
difference
when running the mpmath tests is hardly measurable.
Original comment by fredrik....@gmail.com
on 25 Nov 2008 at 9:03
psyco.full() is not generally faster. I think it only optimizes Python
integers, so
it should be slower when used with gmp integers.
Original comment by Vinzent.Steinberg@gmail.com
on 25 Nov 2008 at 1:11
> psyco.full() is not generally faster. I think it only optimizes Python
> integers, so it should be slower when used with gmp integers.
It is not generally faster, but it is consistently about 2x faster for mpmath,
whether in gmpy mode or not. In any case, the integers that aren't being
optimized
with the new patch (sign, exp, bc) are Python integers, not gmpy integers.
(They are
Python integers because they are typically small, and Python integers are
faster .)
Original comment by fredrik....@gmail.com
on 25 Nov 2008 at 4:47
With your latest gmpy build, mpmath is leaking memory.
Try for example this:
from mpmath import *
b=lambda s:2*exp(s*j*pi/2)*quad(lambda t:s*t**s/(1-exp(2*pi*t))/t,[0,inf])
mp.dps = 6
for x in linspace(1, 2, 1000):
    q = b(x)
Original comment by fredrik....@gmail.com
on 26 Nov 2008 at 11:55
Thanks for the report. I'll try to look at it in the next few days.
Original comment by casevh
on 26 Nov 2008 at 1:54
Is this with gmpy 1.04 in general, or just if _mpmath_normalize is used?
Original comment by casevh
on 26 Nov 2008 at 4:45
Just with _mpmath_normalize.
Original comment by fredrik....@gmail.com
on 26 Nov 2008 at 6:51
I fixed (I hope) the memory leak in _mpmath_normalize. I've updated the test
installer.
It's not too surprising there is no performance improvement with psyco. Since
psyco removes the overhead of the byte-code interpreter, the only advantage
_mpmath_normalize has is in trying to minimize the data manipulations. It's
probably possible to improve the performance slightly by directly manipulating
the internals of the GMP data types, but I doubt that is worth it.
If the new installer works for you, we should release within the next few days.
casevh
Original comment by casevh
on 11 Dec 2008 at 7:41
> It's not too surprising there is no performance improvement with psyco.
It's not surprising that there's no improvement, but it's surprising that it
gets
*slower* (and quite a bit slower).
Original comment by fredrik....@gmail.com
on 11 Dec 2008 at 1:55
With the new version I get timings:
39.55 1.03+python
35.69 1.04+python
26.76 1.03+psyco
26.65 1.04+psyco
So it's now actually slightly faster with psyco too. Did you make any other
changes
besides fixing the memory leak? (Not that it's surprising that a memory leak by
itself might cause slowness.)
double_compatibility is actually still slower with 1.04 than with 1.03 but the
difference now is just about 2% (confirmed over multiple runs).
Original comment by fredrik....@gmail.com
on 11 Dec 2008 at 3:03
Do you have the time to also implement from_man_exp? It does essentially
nothing but
extract the sign and count bits of the mantissa and then call normalize(), so I
think
it should be really simple with normalize() already implemented.
Running the tests with -profile, with a total time of 114.074 s, from_man_exp
represents 4.874 s tottime and 10.763 s cumtime.
Another cheap possibility that comes to mind is to provide an mpmath-compatible
bitcount() to avoid the Python wrapper function. Currently gmpy_bitcount
represents
5.394 s tottime and 8.927 s cumtime. (However, -profile probably exaggerates the
overhead, so the real benefit might be much smaller.)
Original comment by fredrik....@gmail.com
on 11 Dec 2008 at 3:25
Sorry that it's taken me so long to respond.
I implemented from_man_exp but there was no significant speed-up. I did commit
Mario
Pernici's patch for bit_length and it did offer a measurable speed-up.
Comparing
gmpy 1.03 to gmpy 1.04 with bit_length, the running time for runtests.py
improved
from 25.8 seconds to 21.5 seconds. Using psyco and gmpy 1.04, the running time
was
18.5 seconds.
At this point, I'd just like to drop the normalize and from_man_exp functions
until
I can figure out a way to make them faster. Then I can get gmpy 1.04 released.
Case
Original comment by casevh
on 4 Jan 2009 at 9:17
I found a few issues with _mpmath_create. 64-bit performance on Linux improved
from
22.6 seconds to 19.3 seconds. Under Windows, I believe it is still a little
faster
but it is hard for me to get accurate timing results on a virtual machine.
I added a patch to libmpf to support the new features.
Could you try running some tests again?
I've uploaded new versions of gmpy 1.03 and 1.04 compiled with identical
options and
the same version of GMP. They are located at
http://home.comcast.net/~casevh/gmpy-1.03.win32-py2.6.exe
http://home.comcast.net/~casevh/gmpy-1.04.win32-py2.6.exe
Thanks,
Case
Original comment by casevh
on 5 Jan 2009 at 8:31
Hi
You might be interested to know that the gmpy v1.04 code fails to compile
correctly with MS Visual C/C++, because the mpq_out_str workaround is not
needed with the native toolset and because the dllexport declaration has been
omitted from the initgmpy function, which should be declared as:
__declspec( dllexport ) void initgmpy(void)
(see v1.03)
best regards,
Brian Gladman
Original comment by rieman...@gmail.com
on 5 Jan 2009 at 3:06
1.03 - 35.19 s
1.04 - 27.69 s
Original comment by fredrik....@gmail.com
on 6 Jan 2009 at 3:31
Ondrej, can you set up gmpy 1.04 on sage? It would be interesting to see
results of
the MPFR benchmark suite with the latest revision of mpmath plus gmpy 1.04.
Original comment by fredrik....@gmail.com
on 6 Jan 2009 at 4:06
Not soon, as they are having problems with sage.math right now.
Original comment by ondrej.c...@gmail.com
on 6 Jan 2009 at 5:17
I've committed Mario's performance improvements to _mpmath_normalize and
_mpmath_create. I won't have time to create new binaries until tomorrow.
runtests results for gmpy 1.03 vs. 1.04, Linux x64
1.03 - 22.6 s
1.04 - 18.6 s
Case
Original comment by casevh
on 8 Jan 2009 at 7:09
Brian,
Thanks for the report. I'm still trying to set up a VS tool chain so I can
compile natively under Windows. I added DL_EXPORT(void), which should be a
cross-platform fix. I haven't had a chance to test it yet.
I'm not clear about the error you are getting with mpq_out_str.
Case
Original comment by casevh
on 8 Jan 2009 at 7:14
Hi Case
I would be happy to provide my VS build files for GMPY if this would help. I
am not
sure how to post code but I am sure I can find out!
The mpq_out_str issue arises because of a 'fix' that mentions this function in
the
1.04 version of gmpy.c - these lines are not needed since GMP built with Visual
Studio doesn't appear to have this problem:
/* fix anomalies of .h vs .lib on Windows; also, use alloca */
#undef mpq_out_str
size_t mpq_out_str _PROTO ((FILE *, int, mpq_srcptr));
and these lines cause the compile to fail.
Brian
Original comment by rieman...@gmail.com
on 8 Jan 2009 at 7:59
I have attached a ZIP file containing my Visual Studio build files for GMPY
and a revised copy of gmpy.c with the changes I needed to get the latest
version to compile. In addition to the mpq_out_str issue, the MS compiler
doesn't provide isnan and isinf, and it doesn't support C99 mixing of
declarations and code.
Let me know if I can help with any of this.
best regards,
Brian
PS I would have posted this in GMPY but for some reason I was not offered the
option
of commenting there.
Original comment by rieman...@gmail.com
on 8 Jan 2009 at 9:52
Attachments:
Thanks! I will merge the changes in the next couple of days.
Case
Original comment by casevh
on 9 Jan 2009 at 6:06
With the faster bitcount, some code can be optimized further for gmpy mode.
Most importantly, mpf_mul becomes 40% faster by just calling bitcount(man)
instead of
calculating it from sbc and tbc. Other places are mpf_mul_int, mpf_add and
mpf_pow_int.
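The reason bitcount helps in mpf_mul: the product of an sbc-bit and a tbc-bit mantissa has either sbc+tbc or sbc+tbc-1 bits, so the pure-Python path computes the sum and then corrects by testing the top bit, while gmpy mode can simply ask for the count. A sketch of the two routes (hypothetical helper names, not mpmath's actual code):

```python
def product_bc_corrected(sman, sbc, tman, tbc):
    # Pure-Python route: guess sbc + tbc - 1 bits for the product,
    # then bump the count if the top bit says it is one larger.
    man = sman * tman
    bc = sbc + tbc - 1
    bc += (man >> bc) & 1
    return man, bc

def product_bc_direct(sman, tman):
    # gmpy-mode route: with a fast bit_length, just ask for the count.
    man = sman * tman
    return man, man.bit_length()
```

Both return the same (mantissa, bitcount) pair; the second avoids the shift-and-test when the underlying bitcount is cheap.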
Original comment by fredrik....@gmail.com
on 12 Jan 2009 at 1:51
The attached patch to libmpf.py rev. 832 uses bitcount in mpf_add, etc.
as indicated by Fredrik (apart from mpf_pow_int).
On my Athlon XP 2600, runtests is 3% faster; add, mul, and mul_int are
respectively 15%, 22%, and 30% faster.
Adding to gmpy mpz_bitcount, which is an optimized version of bit_length
that accepts only mpz, and optimizing input processing in mpmath_normalize,
runtests gets another 5% speedup; add, mul, mul_int, and div are 20%, 6%, 34%,
and 9% faster respectively. I am submitting a patch for this in gmpy.
Original comment by mario.pe...@gmail.com
on 13 Jan 2009 at 12:08
Attachments:
I was working on the same thing; I have committed my changes. Feel free to
commit the
bitcount1 optimization. I think we should not have two versions of mpf_add,
though;
it's just too much code to duplicate.
As for mpf_pow_int, it could be optimized at low precision by taking several
bits
from the exponent at every step. For example with 53-bit precision and taking 4
bits
at a time, one would compute (man**r)>>shift where each r < 16 so man**r <
2^850;
this is probably faster than four full steps through the loop.
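A rough pure-Python sketch of that windowed scheme (truncating instead of rounding; pow_windowed is a hypothetical name, not mpmath's actual mpf_pow_int):

```python
def pow_windowed(man, e, prec, window=4):
    # Returns (res, shift) with res << shift <= man**e, keeping res to
    # about prec bits. Each step squares `window` times at once and
    # multiplies in man**r for the next `window` exponent bits (r < 2**window).
    res, shift = 1, 0
    nbits = max(e.bit_length(), 1)
    top = ((nbits + window - 1) // window) * window
    for i in range(top - window, -1, -window):
        r = (e >> i) & ((1 << window) - 1)
        res = res ** (1 << window) * man ** r
        shift <<= window
        extra = res.bit_length() - prec
        if extra > 0:  # truncate back down to prec bits
            res >>= extra
            shift += extra
    return res, shift
```

When prec is large enough that no truncation occurs, the result is exact; otherwise the relative error stays within a few units in the last place per window.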
Original comment by fredrik....@gmail.com
on 13 Jan 2009 at 12:44
I've committed Mario's patches to gmpy. I did change the name from mpz_bitcount
to
_mpmath_bitcount. I didn't want to cause confusion with bit_length.
I've put new Windows binaries for gmpy 1.04 at
http://home.comcast.net/~casevh/gmpy-1.04.win32-py2.6.exe
There are versions for Python 2.3, 2.4, 2.5, and 2.6. These were compiled using MPIR instead
of
GMP. I also used --enable-fat so these binaries should automatically detect
which
processor is in use and select code optimized for that processor.
On my 64-bit Linux box, I have the following results for runtests.py:
-nogmpy: ~40.5s
gmpy 1.03: ~21.1s
gmpy 1.04: ~15.8s
The same version of MPIR was used in each case.
Original comment by casevh
on 14 Jan 2009 at 7:26
Windows 32bit results:
python 2.5 (gmpy 1.03)
python: 59.46
python+psyco: 47.09
gmpy: 33.64
gmpy+psyco: 24.65
python 2.6 (gmpy 1.04)
python: 58.03
python+psyco: 47.86
gmpy: 25.39
gmpy+psyco 19.65
Original comment by fredrik....@gmail.com
on 14 Jan 2009 at 8:59
I submitted in gmpy a patch with _mpmath_mpf_mul, with which mpf_mul is 2x
faster
at dps=100.
Original comment by mario.pe...@gmail.com
on 19 Jan 2009 at 4:38
There is a bug in _mpmath_bitcount: it gives bitcount(0) = 1.
The bug is triggered when running the tests with -strict.
Why do we need both bitcount1 and bitcount? bitcount1 checks the type of its
argument
anyway, so why can't bit_length be as fast?
Original comment by fredrik....@gmail.com
on 23 Jan 2009 at 3:11
I'd like to remove bitcount1, just use bitcount = bit_length, and any further
optimization to bit_length can be done on the gmpy end.
Original comment by fredrik....@gmail.com
on 23 Jan 2009 at 3:38
Original comment by ondrej.c...@gmail.com
on 23 Jan 2009 at 4:31
I submitted to gmpy a fix for this bug in _mpmath_bitcount.
bitcount1 is 20% faster than bitcount for small numbers, and bitcounting is a
bottleneck, so I would prefer it remained.
I doubt that bit_length can be optimized much; it is both a function and a
method, which adds some overhead, and it must call anynum2mpz since it is not
guaranteed that the number is an mpz.
On a related issue I wonder if introducing bitcount in GMP or MPIR would be
significantly faster than using mpz_sizeinbase, in which there is some overhead
to
support a generic base.
Original comment by mario.pe...@gmail.com
on 23 Jan 2009 at 8:04
Can't bit_length check quickly for mpz type first, and only if that fails call
anynum2mpz?
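The pattern being proposed, expressed in Python terms (a hypothetical sketch of the C-level dispatch, where the exact-type test plays the role of a cheap mpz type check and int(x) the role of anynum2mpz):

```python
def bit_length_dispatch(x):
    # Fast path: exact-type test first; fall back to coercion only on a miss.
    if type(x) is int:
        return x.bit_length()
    return int(x).bit_length()  # slow path: coerce first, then count
```

The common case (already the right type) pays only one cheap check, which is the point of Fredrik's suggestion.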
Original comment by fredrik....@gmail.com
on 23 Jan 2009 at 8:46
Fredrik, you are right, bit_length can be optimized; in the attached file there
is a version of bit_length with which bitcount is 4% slower than bitcount1.
I will see if it can be made faster, but even if not, I agree that
_mpmath_bitcount and bitcount1 can be removed.
Original comment by mario.pe...@gmail.com
on 25 Jan 2009 at 9:49
Attachments:
I've built what I hope are the final installers for gmpy 1.04. They are
temporarily available at http://home.comcast.net/~casevh/gmpy-1.04.win32-py2.6.exe.
They were built using MPIR instead of GMP and include support code for multiple CPUs.
Please test. If there are no problems, gmpy 1.04 should be released very soon.
Case
Original comment by casevh
on 30 Jan 2009 at 7:31
Original issue reported on code.google.com by
casevh
on 23 Oct 2008 at 6:23