Original comment by casevh
on 23 Oct 2008 at 6:24
The most obvious bottleneck is normalize()/normalize1().
If the chunk
if rnd is round_nearest:
    t = man >> (n-1)
    if t & 1 and ((t & 2) or (man & h_mask[n<300][n])):
        man = (t>>1)+1
    else:
        man = t>>1
elif shifts_down[rnd][sign]:
    man >>= n
else:
    man = -((-man)>>n)
could be replaced by a gmpy function
man = shiftrnd(man, n, rnd),
that handles rnd = 'd', 'u', 'f', 'c', 'n', then that would probably make a
difference.
Ditto for the trailing-zero removal code. Maybe that could be sped up already,
just by using .scan1()?
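The thread never shows what shiftrnd would look like, so here is a rough pure-Python sketch of the proposed helper (shiftrnd and strip_trailing_zeros are hypothetical names; gmpy's real implementation would be in C, and .scan1() would replace the bit_length trick used below):

```python
def shiftrnd(man, n, rnd, sign=0):
    # Shift a nonnegative mantissa right by n bits, rounding per
    # rnd in 'd' (down), 'u' (up), 'f' (floor), 'c' (ceiling), 'n' (nearest).
    if n <= 0:
        return man << (-n)
    if rnd == 'n':  # round to nearest, ties to even
        t = man >> (n - 1)
        if t & 1 and ((t & 2) or (man & ((1 << (n - 1)) - 1))):
            return (t >> 1) + 1
        return t >> 1
    # 'd' truncates for either sign; 'f'/'c' truncate depending on the sign
    if rnd == 'd' or (rnd == 'f' and not sign) or (rnd == 'c' and sign):
        return man >> n
    return -((-man) >> n)  # round the magnitude up

def strip_trailing_zeros(man, exp):
    # Trailing-zero removal: (man & -man).bit_length() - 1 counts the
    # trailing zero bits, playing the role gmpy's .scan1() would.
    if man:
        tz = (man & -man).bit_length() - 1
        man >>= tz
        exp += tz
    return man, exp
```

For example, shiftrnd(5, 1, 'n') gives 2 (2.5 rounds to even) while shiftrnd(7, 1, 'n') gives 4.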
Original comment by fredrik....@gmail.com
on 23 Oct 2008 at 6:47
It would be useful to add exp_series to gmpy; the speed of exp with gmpy would
become roughly that of gmpy+psyco.
In libelefun the precision at which one switches to exp_series2 should probably
be adjusted.
In the attached patch for gmpy.c there is a possible implementation
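The attached patch itself is not reproduced in this thread. As a rough illustration of the kind of series code being discussed (an assumption about its shape, not the actual gmpy.c patch), a fixed-point Taylor loop for exp in pure Python might look like:

```python
import math

def exp_taylor_fixed(x, prec):
    # Fixed-point Taylor series for exp: x is an integer representing
    # x_real * 2**prec with 0 <= x_real < 1 (argument reduction, as done
    # in libelefun, is omitted). Returns roughly exp(x_real) * 2**prec.
    one = 1 << prec
    s = term = one  # the k = 0 term: x**0 / 0! = 1
    k = 1
    while term:
        term = (term * x >> prec) // k  # multiply by x/k in fixed point
        s += term
        k += 1
    return s
```

For instance, exp_taylor_fixed(1 << 63, 64) / 2.0**64 agrees with math.exp(0.5) to double precision.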
Original comment by mario.pe...@gmail.com
on 24 Oct 2008 at 5:07
Attachments:
In the attached file there are the functions exp_series
(as in the previous post), exp_series2, sin_taylor, cos_taylor.
With them added to gmpy.c and used in libelefun, and with cos_taylor and
sin_taylor used up to prec=1500 in calc_cos_sin, all runtests pass.
On Athlon 64 X2 Dual Core 3800+
time gmpy with mpmath functions / time gmpy
dps exp(one/3) cos(one/3)
30 0.62 0.54
50 0.57 0.48
100 0.52 0.41
200 0.55 0.43
300 0.61 0.49
400 0.61 0.58
Similar speedup should be expected for a few other functions,
in particular the hypergeometric functions
and the Jacobi theta functions,
after refactoring the Python code a bit.
Original comment by mario.pe...@gmail.com
on 25 Oct 2008 at 4:58
Attachments:
oops, the previous version has plenty of memory leaks; here is a revised
version.
Original comment by mario.pe...@gmail.com
on 25 Oct 2008 at 8:07
Attachments:
I don't know if those patches are a good idea:
* The scope of these functions seems too narrow for gmpy
* The speedup is not that big
* I expect that the series used for various functions will change in future
versions of mpmath (as they have so far). Such changes should not depend on
or obsolete the functions in gmpy.
Original comment by fredrik....@gmail.com
on 26 Oct 2008 at 6:38
I'll take a look at normalize and the removal of trailing zeros. I plan to start
developing gmpy 2.x which will include support for Python 3.0. I'd like to
focus on
adding new features to that release. But if we can get a significant
improvement for
one or two simple functions, I'll do that.
I'd rather see the algorithms used by mpmath implemented in Python until they
are
stable.
Alex created a group "gmpy-dev" where we can discuss the development of gmpy.
Original comment by casevh
on 27 Oct 2008 at 2:40
I agree it is not a good idea to put exp_series2 in gmpy; it gives a small
speed-up
and it is likely to change in the future, as it has in the past.
However I still think that exp_series is a good candidate; at dps=100 it is
roughly
2x faster, so according to the timings page it would be as fast as in
Mathematica.
The algorithm has not changed at least since mpmath-0.6; there has been only
some refactoring, so that now exp_series depends on r, while before r was
computed inside it. It seems unlikely to me that this algorithm will change in
the future.
Original comment by mario.pe...@gmail.com
on 28 Oct 2008 at 9:52
I think there's a chance that the low precision exp could be improved further
with
some algorithmic tuning. For example, one could test the caching trick I
implemented
for log and atan.
Original comment by fredrik....@gmail.com
on 28 Oct 2008 at 11:34
Just an update.
I implemented _normalize in gmpy. My tests show about a 10% improvement versus
gmpy
1.03. Roughly 4% of the improvement is due to faster conversion between Python
longs
and mpz. The rest of the improvement comes from _mpmath_normalize.
I've only tested on a 64-bit platform. I need to test on a 32-bit platform
before I
commit the changes to gmpy. I'm concerned about the number of bits in the
mantissa exceeding the C long type on a 32-bit platform. That would be in
excess of 600,000,000 decimal digits. Should I worry about that?
It does support very large mantissas.
The entire test suite passed with only one minor change: in test_mpmath.py, I
needed to change the list of rounding modes to ['d', 'u', 'f', 'c', 'n'].
Original comment by casevh
on 16 Nov 2008 at 6:23
> I implemented _normalize in gmpy. My tests show about a 10% improvement
> versus gmpy 1.03. Roughly 4% of the improvement is due to faster conversion
> between Python longs and mpz. The rest of the improvement comes from
> _mpmath_normalize.
Sounds good.
> That would be in excess of 600,000,000 decimal digits. Should I worry about
> that?
I don't think so. If someone needs to work with more than 600M digits on a
32-bit
platform, they could always switch back to Python mode ;-)
Original comment by fredrik....@gmail.com
on 16 Nov 2008 at 11:20
We're hoping to release gmpy 1.04 in the next couple of days. I've added a
function
_mpmath_normalize that replaces normalize() and normalize1(). I just committed
(r780)
the changes needed to support it. With gmpy 1.04, I get about an 8% speedup on
runtests.py
A test Windows installer can be downloaded at:
http://home.comcast.net/~casevh/gmpy-1.04-gmp-4.2.4.win32-py2.6.exe
Versions for Python 2.4 and 2.5 are also there.
GMP was compiled for a Pentium 3 processor. I think that is the same processor
choice
as the old installers, but I'm not 100% certain. Please let me know if the
performance is radically different.
Original comment by casevh
on 25 Nov 2008 at 4:47
This is strange: I get a negligible speedup (4%), and a 9% slowdown with
-psyco. With
psyco the slowdown is about 15% for some of the tests, like
double_compatibility and
basic_integrals.
With -psyco the slowdown disappears if I comment out gmpy._mpmath_normalize, and
without -psyco the speed seems to be about the same whether or not it is used.
Original comment by fredrik....@gmail.com
on 25 Nov 2008 at 6:19
A guess is that psyco's compiled version of the Python code is faster because
it is
able to globally optimize away the use of boxed Python integers for the
exponent and
bitcount.
Original comment by fredrik....@gmail.com
on 25 Nov 2008 at 6:30
That's interesting. I normally use a 64-bit platform so I can't use psyco.
I'll need to add psyco to my virtual machine.
When I was running my tests, I got about 4% speedup because of the faster
long2mpz
conversion. _mpmath_normalize improved performance by 4% (roughly) for me.
I'll try to run some 32-bit Windows tests with psyco.
I'm assuming you are using Windows. What processor do you have? I can build you
a
version optimized for your CPU.
Original comment by casevh
on 25 Nov 2008 at 8:19
Celeron M 520, so I'm guessing Core 2.
In fact, I've compared the Core 2 and standard build of gmpy 1.03 for Python
2.6.
Multiplying two million-digit integers is about 40% faster. However, the
difference
when running the mpmath tests is hardly measurable.
Original comment by fredrik....@gmail.com
on 25 Nov 2008 at 9:03
psyco.full() is not generally faster. I think it only optimizes Python
integers, so
it should be slower when used with gmp integers.
Original comment by Vinzent.Steinberg@gmail.com
on 25 Nov 2008 at 1:11
> psyco.full() is not generally faster. I think it only optimizes Python
> integers, so it should be slower when used with gmp integers.
It is not generally faster, but it is consistently about 2x faster for mpmath,
whether in gmpy mode or not. In any case, the integers that aren't being
optimized
with the new patch (sign, exp, bc) are Python integers, not gmpy integers.
(They are
Python integers because they are typically small, and Python integers are
faster .)
Original comment by fredrik....@gmail.com
on 25 Nov 2008 at 4:47
With your latest gmpy build, mpmath is leaking memory.
Try for example this:
from mpmath import *
b=lambda s:2*exp(s*j*pi/2)*quad(lambda t:s*t**s/(1-exp(2*pi*t))/t,[0,inf])
mp.dps = 6
for x in linspace(1, 2, 1000):
    q = b(x)
Original comment by fredrik....@gmail.com
on 26 Nov 2008 at 11:55
Thanks for the report. I'll try to look at it in the next few days.
Original comment by casevh
on 26 Nov 2008 at 1:54
Is this with gmpy 1.04 in general, or just if _mpmath_normalize is used?
Original comment by casevh
on 26 Nov 2008 at 4:45
Just with _mpmath_normalize.
Original comment by fredrik....@gmail.com
on 26 Nov 2008 at 6:51
I fixed (I hope) the memory leak in _mpmath_normalize. I've updated the test
installer.
It's not too surprising there is no performance improvement with psyco. Since
psyco removes the overhead of the byte-code interpreter, the only advantage
_mpmath_normalize has is in trying to minimize the data manipulations. It's
probably possible to improve the performance slightly by directly manipulating
the internals of the GMP data types, but I doubt that is worth it.
If the new installer works for you, we should release within the next few days.
casevh
Original comment by casevh
on 11 Dec 2008 at 7:41
> It's not too surprising there is no performance improvement with psyco.
It's not surprising that there's no improvement, but it's surprising that it
gets
*slower* (and quite a bit slower).
Original comment by fredrik....@gmail.com
on 11 Dec 2008 at 1:55
With the new version I get timings:
39.55 1.03+python
35.69 1.04+python
26.76 1.03+psyco
26.65 1.04+psyco
So it's now actually slightly faster with psyco too. Did you make any other
changes
besides fixing the memory leak? (Not that it's surprising that a memory leak by
itself might cause slowness.)
double_compatibility is actually still slower with 1.04 than with 1.03 but the
difference now is just about 2% (confirmed over multiple runs).
Original comment by fredrik....@gmail.com
on 11 Dec 2008 at 3:03
Do you have the time to also implement from_man_exp? It does essentially
nothing but
extract the sign and count bits of the mantissa and then call normalize(), so I
think
it should be really simple with normalize() already implemented.
Running the tests with -profile, with a total time of 114.074 s, from_man_exp
represents 4.874 s tottime and 10.763 s cumtime.
Another cheap possibility that comes to mind is to provide an mpmath-compatible
bitcount() to avoid the Python wrapper function. Currently gmpy_bitcount
represents
5.394 s tottime and 8.927 s cumtime. (However, -profile probably exaggerates the
overhead, so the real benefit might be much smaller.)
Original comment by fredrik....@gmail.com
on 11 Dec 2008 at 3:25
Sorry that it's taken me so long to respond.
I implemented from_man_exp but there was no significant speed-up. I did commit
Mario
Pernici's patch for bit_length and it did offer a measurable speed-up.
Comparing
gmpy 1.03 to gmpy 1.04 with bit_length, the running time for runtests.py
improved
from 25.8 seconds to 21.5 seconds. Using psyco and gmpy 1.04, the running time
was
18.5 seconds.
At this point, I'd just like to drop the normalize and from_man_exp functions
until
I can figure out a way to make them faster. Then I can get gmpy 1.04 released.
Case
Original comment by casevh
on 4 Jan 2009 at 9:17
I found a few issues with _mpmath_create. 64-bit performance on Linux improved
from
22.6 seconds to 19.3 seconds. Under Windows, I believe it is still a little
faster
but it is hard for me to get accurate timing results on a virtual machine.
I added a patch to libmpf to support the new features.
Could you try running some tests again?
I've uploaded new versions of gmpy 1.03 and 1.04 compiled with identical
options and
the same version of GMP. They are located at
http://home.comcast.net/~casevh/gmpy-1.03.win32-py2.6.exe
http://home.comcast.net/~casevh/gmpy-1.04.win32-py2.6.exe
Thanks,
Case
Original comment by casevh
on 5 Jan 2009 at 8:31
Hi
You might be interested to know that the gmpy v1.04 code fails to compile
correctly with MS Visual C/C++, because the mpq_out_str workaround is not
needed with the native toolset and because the dllexport declaration has been
omitted from the initgmpy function, which should be declared as:
__declspec( dllexport ) void initgmpy(void)
(see v1.03)
best regards,
Brian Gladman
Original comment by rieman...@gmail.com
on 5 Jan 2009 at 3:06
1.03 - 35.19 s
1.04 - 27.69 s
Original comment by fredrik....@gmail.com
on 6 Jan 2009 at 3:31
Ondrej, can you set up gmpy 1.04 on sage? It would be interesting to see
results of
the MPFR benchmark suite with the latest revision of mpmath plus gmpy 1.04.
Original comment by fredrik....@gmail.com
on 6 Jan 2009 at 4:06
Not soon, as they are having problems with sage.math right now.
Original comment by ondrej.c...@gmail.com
on 6 Jan 2009 at 5:17
I've committed Mario's performance improvements to _mpmath_normalize and
_mpmath_create. I won't have time to create new binaries until tomorrow.
runtests results for gmpy 1.03 vs. 1.04, Linux x64
1.03 - 22.6 s
1.04 - 18.6 s
Case
Original comment by casevh
on 8 Jan 2009 at 7:09
Brian,
Thanks for the report. I'm still trying to set up a VS tool chain so I can
compile natively under Windows. I added DL_EXPORT(void), which should be a
cross-platform fix. I haven't had a chance to test it yet.
I'm not clear about the error you are getting with mpq_out_str.
Case
Original comment by casevh
on 8 Jan 2009 at 7:14
Hi Case
I would be happy to provide my VS build files for GMPY if this would help. I
am not
sure how to post code but I am sure I can find out!
The mpq_out_str issue arises because of a 'fix' that mentions this function in
the
1.04 version of gmpy.c - these lines are not needed since GMP built with Visual
Studio doesn't appear to have this problem:
/* fix anomalies of .h vs .lib on Windows; also, use alloca */
#undef mpq_out_str
size_t mpq_out_str _PROTO ((FILE *, int, mpq_srcptr));
and these lines cause the compile to fail.
Brian
Original comment by rieman...@gmail.com
on 8 Jan 2009 at 7:59
I have attached a ZIP file containing my Visual Studio build files for GMPY
and a revised copy of gmpy.c with the changes I needed to get the latest
version to compile. In addition to the mpq_out_str issue, the MS compiler
doesn't provide isnan and isinf, and it doesn't support C99 mixing of
declarations and code.
Let me know if I can help with any of this.
best regards,
Brian
PS I would have posted this in GMPY but for some reason I was not offered the
option
of commenting there.
Original comment by rieman...@gmail.com
on 8 Jan 2009 at 9:52
Attachments:
Thanks! I will merge the changes in the next couple of days.
Case
Original comment by casevh
on 9 Jan 2009 at 6:06
With the faster bitcount, some code can be optimized further for gmpy mode.
Most importantly, mpf_mul becomes 40% faster by just calling bitcount(man)
instead of
calculating it from sbc and tbc. Other places are mpf_mul_int, mpf_add and
mpf_pow_int.
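The reason bitcount helps in mpf_mul: the product of an sbc-bit and a tbc-bit mantissa has either sbc+tbc or sbc+tbc-1 bits, so the pure-Python path computes the sum and then corrects by testing the top bit, while gmpy mode can simply ask for the count. A sketch of the two routes (hypothetical helper names, not mpmath's actual code):

```python
def product_bc_corrected(sman, sbc, tman, tbc):
    # Pure-Python route: guess sbc + tbc - 1 bits for the product,
    # then bump the count if the top bit says it is one larger.
    man = sman * tman
    bc = sbc + tbc - 1
    bc += (man >> bc) & 1
    return man, bc

def product_bc_direct(sman, tman):
    # gmpy-mode route: with a fast bit_length, just ask for the count.
    man = sman * tman
    return man, man.bit_length()
```

Both return the same (mantissa, bitcount) pair; the second avoids the shift-and-test when the underlying bitcount is cheap.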
Original comment by fredrik....@gmail.com
on 12 Jan 2009 at 1:51
The attached patch to libmpf.py rev. 832 uses bitcount in mpf_add, etc.
as indicated by Fredrik (apart from mpf_pow_int).
On my Athlon XP 2600, runtests is 3% faster; add, mul, and mul_int are
respectively 15%, 22%, and 30% faster.
Adding to gmpy mpz_bitcount, which is an optimized version of bit_length
that accepts only mpz, and optimizing input processing in mpmath_normalize,
runtests gets another 5% speedup; add, mul, mul_int, and div are 20%, 6%, 34%,
and 9% faster respectively. I am submitting a patch for this in gmpy.
Original comment by mario.pe...@gmail.com
on 13 Jan 2009 at 12:08
Attachments:
I was working on the same thing; I have committed my changes. Feel free to
commit the
bitcount1 optimization. I think we should not have two versions of mpf_add,
though;
it's just too much code to duplicate.
As for mpf_pow_int, it could be optimized at low precision by taking several
bits
from the exponent at every step. For example with 53-bit precision and taking 4
bits
at a time, one would compute (man**r)>>shift where each r < 16 so man**r <
2^850;
this is probably faster than four full steps through the loop.
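A rough pure-Python sketch of that windowed scheme (truncating instead of rounding; pow_windowed is a hypothetical name, not mpmath's actual mpf_pow_int):

```python
def pow_windowed(man, e, prec, window=4):
    # Returns (res, shift) with res << shift <= man**e, keeping res to
    # about prec bits. Each step squares `window` times at once and
    # multiplies in man**r for the next `window` exponent bits (r < 2**window).
    res, shift = 1, 0
    nbits = max(e.bit_length(), 1)
    top = ((nbits + window - 1) // window) * window
    for i in range(top - window, -1, -window):
        r = (e >> i) & ((1 << window) - 1)
        res = res ** (1 << window) * man ** r
        shift <<= window
        extra = res.bit_length() - prec
        if extra > 0:  # truncate back down to prec bits
            res >>= extra
            shift += extra
    return res, shift
```

When prec is large enough that no truncation occurs, the result is exact; otherwise the relative error stays within a few units in the last place per window.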
Original comment by fredrik....@gmail.com
on 13 Jan 2009 at 12:44
I've committed Mario's patches to gmpy. I did change the name from mpz_bitcount
to
_mpmath_bitcount. I didn't want to cause confusion with bit_length.
I've put new Windows binaries for gmpy 1.04 at
http://home.comcast.net/~casevh/gmpy-1.04.win32-py2.6.exe
There are versions for Python 2.3, 2.4, 2.5, and 2.6. These were compiled using MPIR instead
of
GMP. I also used --enable-fat so these binaries should automatically detect
which
processor is in use and select code optimized for that processor.
On my 64-bit Linux box, I have the following results for runtests.py:
-nogmpy: ~40.5s
gmpy 1.03: ~21.1s
gmpy 1.04: ~15.8s
The same version of MPIR was used in each case.
Original comment by casevh
on 14 Jan 2009 at 7:26
Windows 32bit results:
python 2.5 (gmpy 1.03)
python: 59.46
python+psyco: 47.09
gmpy: 33.64
gmpy+psyco: 24.65
python 2.6 (gmpy 1.04)
python: 58.03
python+psyco: 47.86
gmpy: 25.39
gmpy+psyco 19.65
Original comment by fredrik....@gmail.com
on 14 Jan 2009 at 8:59
I submitted in gmpy a patch with _mpmath_mpf_mul, with which mpf_mul is 2x
faster
at dps=100.
Original comment by mario.pe...@gmail.com
on 19 Jan 2009 at 4:38
There is a bug in _mpmath_bitcount: it gives bitcount(0) = 1.
The bug is triggered when running the tests with -strict.
Why do we need both bitcount1 and bitcount? bitcount1 checks the type of its
argument
anyway, so why can't bit_length be as fast?
Original comment by fredrik....@gmail.com
on 23 Jan 2009 at 3:11
I'd like to remove bitcount1, just use bitcount = bit_length, and any further
optimization to bit_length can be done on the gmpy end.
Original comment by fredrik....@gmail.com
on 23 Jan 2009 at 3:38
Original comment by ondrej.c...@gmail.com
on 23 Jan 2009 at 4:31
I submitted to gmpy a fix for this bug in _mpmath_bitcount.
bitcount1 is 20% faster than bitcount for small numbers, and bitcounting is a
bottleneck, so I would prefer it remained.
I doubt that bit_length can be optimized much; it is both a function and a
method, which adds some overhead, and it must call anynum2mpz since it is not
guaranteed that the number is an mpz.
On a related issue I wonder if introducing bitcount in GMP or MPIR would be
significantly faster than using mpz_sizeinbase, in which there is some overhead
to
support a generic base.
Original comment by mario.pe...@gmail.com
on 23 Jan 2009 at 8:04
Can't bit_length check quickly for mpz type first, and only if that fails call
anynum2mpz?
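The pattern being proposed, expressed in Python terms (a hypothetical sketch of the C-level dispatch, where the exact-type test plays the role of a cheap mpz type check and int(x) the role of anynum2mpz):

```python
def bit_length_dispatch(x):
    # Fast path: exact-type test first; fall back to coercion only on a miss.
    if type(x) is int:
        return x.bit_length()
    return int(x).bit_length()  # slow path: coerce first, then count
```

The common case (already the right type) pays only one cheap check, which is the point of Fredrik's suggestion.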
Original comment by fredrik....@gmail.com
on 23 Jan 2009 at 8:46
Fredrik, you are right, bit_length can be optimized; in the attached file there
is a version of bit_length with which bitcount is 4% slower than bitcount1.
I will see if it can be made faster, but even if not, I agree that
_mpmath_bitcount and bitcount1 can be removed.
Original comment by mario.pe...@gmail.com
on 25 Jan 2009 at 9:49
Attachments:
I've built what I hope are the final installers for gmpy 1.04. They are
temporarily available at http://home.comcast.net/~casevh/gmpy-1.04.win32-py2.6.exe.
They were built using MPIR instead of GMP and include support code for multiple CPUs.
Please test. If there are no problems, gmpy 1.04 should be released very soon.
Case
Original comment by casevh
on 30 Jan 2009 at 7:31
Original issue reported on code.google.com by
casevh
on 23 Oct 2008 at 6:23