Closed: videlec closed this 4 months ago
Description changed:
---
+++
@@ -5,10 +5,10 @@
prog="""
from cypari2 import Pari
pari = Pari(size={size})
-pari({cmd})
+pari("{cmd}")
"""
-prog = prog.format(size=2**24, cmd="\"quadclassunit(1 - 2^100)\"")
+prog = prog.format(size=2**24, cmd="quadclassunit(1 - 2^100)")
print("python environment with cypari2: ",
timeit.timeit(prog, number=3))
Description changed:
---
+++
@@ -34,3 +34,7 @@
with sage custom GMP allocation functions: 4.941096214999561
with the whole sagemath library loaded: 7.39602135700261
+
+See also
+- this sage-devel thread
+- a similar problem with pari_jupyter pari-jupyter/#27
Could it be just a different memory layout as soon as sage.all is imported, causing the memory consumption of the process to increase about 6-fold?
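A quick way to check the memory part of that hypothesis is to compare the resident set size before and after the import (a rough sketch using only the standard library; the 6-fold figure above is what one would try to confirm):
# rss_check.py - rough sketch: compare process RSS before and after import sage.all
import resource

def rss_kb():
    # ru_maxrss is reported in kilobytes on Linux
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = rss_kb()
import sage.all  # noqa: F401
after = rss_kb()
print("RSS before: %d kB, after: %d kB (factor %.1f)" % (before, after, after / before))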
FWIW, I don't see the issue on my system:
python environment with cypari2: 3.9759535739940475
with sage custom GMP allocation functions: 4.250082928003394
with the whole sagemath library lodaded: 3.8536450660030823
(edit: I previously posted a version where I was running the test code under sage, not ipython, but the results above are with ipython)
Thanks Marc for running the test. What is your configuration? E.g., what is the system, and does sage use the system pari?
I was going to add more details after doing sage -f pari, which is still running at the moment. But the example above is already using sage's pari (I also have a systemwide pari, but it is a different version). This is on Debian sid with an Intel cpu. Note that I am using Python 3.11 via #33842.
on ubuntu 22.04.01 LTS:
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 9.8.beta5, Release Date: 2022-12-11 │
│ Using Python 3.10.6. Type "help()" for help. │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a prerelease version, and it may be unstable. ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: import timeit
....:
....: prog="""
....: from cypari2 import Pari
....: pari = Pari(size={size})
....: pari("{cmd}")
....: """
....:
....: prog = prog.format(size=2**24, cmd="quadclassunit(1 - 2^100)")
....:
....: print("python environment with cypari2: ",
....: timeit.timeit(prog, number=3))
....:
....: setup="""
....: from sage.ext.memory import init_memory_functions
....: init_memory_functions()
....: """
....: print("with sage custom GMP allocation functions:",
....: timeit.timeit(prog, setup=setup, number=3))
....:
....: setup="""
....: import sage.all
....: """
....: print("with the whole sagemath library loaded: ",
....: timeit.timeit(prog, setup=setup, number=3))
python environment with cypari2: 5.757997545006219
with sage custom GMP allocation functions: 5.887209259002702
with the whole sagemath library loaded: 5.1330207900027744
If you replace import sage.all with import sympy you'll get the same slowdown.
Bingo! Sympy probably changes FPU status, it seems, according to build/pkgs/sympow/email-exchange.txt
Replying to Dima Pasechnik:
Sympy probably changes FPU status, it seems, according to build/pkgs/sympow/email-exchange.txt
Did you accidentally look in sympow/ instead of sympy/, or do you mean sympow also has something to do with it?
I (obviously) don't see any slowdown here with import sympy either, even after reinstalling sympy with sage -f.
You know that the greatest discoveries are made by accident ;-)
Indeed, I went looking for FPU in the sage source repo, and found that sympow thing, which got swapped to sympy in my brain... Anyhow, for me sympy causes the same slowdown:
$ ./sage --python p.py
python environment with cypari2: 4.7912631159997545
with sage custom GMP allocation functions: 4.886824406014057
with sympy loaded: 6.885674357006792
$ cat p.py
import timeit
prog="""
from cypari2 import Pari
pari = Pari(size={size})
pari("{cmd}")
"""
prog = prog.format(size=2**24, cmd="quadclassunit(1 - 2^100)")
print("python environment with cypari2: ",
timeit.timeit(prog, number=3))
setup="""
from sage.ext.memory import init_memory_functions
init_memory_functions()
"""
print("with sage custom GMP allocation functions:",
timeit.timeit(prog, setup=setup, number=3))
setup="""
import sympy
"""
print("with sympy loaded: ",
timeit.timeit(prog, setup=setup, number=3))
I see this sympy difference on the latest Sage beta on 3 boxes: Fedora 34 with a newish CPU, and on Gentoo with a much older CPU, and on Debian Bullseye (stable) with an even older and slower CPU.
I see a bit of a speedup, by about 15%, with either sage.all or sympy imported on macOS 13 (x86_64, as well as arm64, aka M1)
Replying to Dima Pasechnik:
I see this sympy difference on the latest Sage beta on 3 boxes: Fedora 34 with a newish CPU, and on Gentoo with a much older CPU, and on Debian Bullseye (stable) with an even older and slower CPU.
I see a bit of a speedup, by about 15%, with either sage.all or sympy imported on macOS 13 (x86_64, as well as arm64, aka M1)
Interesting.
The same happens with import sympy and with import sage.all.
Narrowing it down further, taking all of Sage out:
# p1.py
import timeit
prog="""
from cypari2 import Pari
pari = Pari(size={size})
pari("{cmd}")
"""
prog = prog.format(size=2**24, cmd="quadclassunit(1 - 2^100)")
print("python environment with cypari2: ",
timeit.timeit(prog, number=3))
setup="""
import sympy
"""
print("with sympy loaded: ",
timeit.timeit(prog, setup=setup, number=3))
On Gentoo with system python3.10.8, system-wide pari 2.15.1, running this p1.py:
- cypari2 and sympy - no time difference
- cypari and sympy - sympy produces a slowdown
So this is down to interaction between python, sympy, and cypari2 (and Sage's venv?)
Matthias, there is something funny in Sage's venv, causing such a slowdown on Linux.
To narrow it down further, you can try installing the Sage-built wheels (from venv/var/lib/sage/wheels/) in a separate venv. Specifically, take cypari2 from the sage wheel rather than from pypi.
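A sketch of that experiment driven from Python rather than the shell; the wheel directory and the wheel name patterns below are assumptions, to be adjusted to whatever actually sits in venv/var/lib/sage/wheels/:
# wheel_venv.py - sketch: fresh venv + Sage-built wheels, then run the timing script
import glob, os, subprocess, venv

WHEEL_DIR = "venv/var/lib/sage/wheels"   # assumed path, relative to SAGE_ROOT
ENV_DIR = "swheels"

venv.EnvBuilder(with_pip=True).create(ENV_DIR)
pip = os.path.join(ENV_DIR, "bin", "pip")
python = os.path.join(ENV_DIR, "bin", "python")

wheels = []
for pattern in ("cysignals-*.whl", "cypari2-*.whl", "mpmath-*.whl", "sympy-*.whl"):
    wheels += glob.glob(os.path.join(WHEEL_DIR, pattern))
subprocess.check_call([pip, "install"] + wheels)

# p.py is the timing script quoted earlier in this thread
subprocess.check_call([python, "p.py"])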
Replying to Matthias Köppe:
To narrow it down further, you can try installing the Sage-built wheels (from venv/var/lib/sage/wheels/) in a separate venv. Specifically, take cypari2 from the sage wheel rather than from pypi.
This way, no difference in timing, so the wheels themselves are OK.
Next you could check whether it makes a difference to run the separate venv inside or outside of the sage environment set up by sage -sh.
If I start sage -sh, cd somewhere outside Sage directory, run python -m venv swheels, activate swheels, and try doing things there, pip does not look too healthy:
(swheels) (sage-sh) dima@hilbert:swheels$ pip cache purge
An error occurred during configuration: option format: invalid choice: 'columns' (choose from 'human', 'abspath')
Regardless, pip install works there, and I can run the test - again, no difference in time.
So, somehow, putting Sage's python into venv cures this time discrepancy.
(swheels) (sage-sh) dima@hilbert:swheels$ python p.py
python environment with cypari2: 5.116253892017994
with sympy loaded: 5.216335576027632
(swheels) (sage-sh) dima@hilbert:swheels$ deactivate
(sage-sh) dima@hilbert:swheels$ which python
/mnt/opt/Sage/sage-dev/local/var/lib/sage/venv-python3.10/bin/python
(sage-sh) dima@hilbert:swheels$ python p.py
python environment with cypari2: 5.446458007965703
with sympy loaded: 8.667378643993288
Replying to Dima Pasechnik:
if I start sage -sh, cd somewhere outside Sage directory, run python -m venv swheels, activate swheels, and try doing things there, pip does not look too healthy:
(swheels) (sage-sh) dima@hilbert:swheels$ pip cache purge
An error occurred during configuration: option format: invalid choice: 'columns' (choose from 'human', 'abspath')
Indeed. I've opened #34853 for this unrelated issue.
I'd suggest looking at whether mpmath is somehow involved, see #25445
Replying to Matthias Köppe:
I'd suggest looking at whether mpmath is somehow involved, see #25445
In any case, it's exactly the same wheels, mpmath included, that are involved here. As to #25445, indeed, import mpmath has the same side effect on performance as import sympy.
So we can now test instead the following:
# p1.py
import timeit
prog="""
from cypari2 import Pari
pari = Pari(size={size})
pari("{cmd}")
"""
prog = prog.format(size=2**24, cmd="quadclassunit(1 - 2^100)")
print("python environment with cypari2: ",
timeit.timeit(prog, number=3))
setup="""
import mpmath
"""
print("with mpmath loaded: ",
timeit.timeit(prog, setup=setup, number=3))
Try if the various MPMATH_... environment variables (https://github.com/mpmath/mpmath/blob/master/mpmath/libmp/backend.py#L66) have an effect.
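One way to try that without touching the shell is to set the variable inside the timeit setup string, before mpmath is first imported (a sketch; only MPMATH_NOSAGE is shown, the other MPMATH_* variables can be tested the same way):
# sketch: same benchmark as above, but with MPMATH_NOSAGE set before importing mpmath
import timeit

prog = """
from cypari2 import Pari
pari = Pari(size={size})
pari("{cmd}")
""".format(size=2**24, cmd="quadclassunit(1 - 2^100)")

setup = """
import os
os.environ["MPMATH_NOSAGE"] = "1"   # must happen before the first import of mpmath
import mpmath
"""
print("with mpmath loaded (MPMATH_NOSAGE):",
      timeit.timeit(prog, setup=setup, number=3))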
Oops, we're back to import sage.all here, as your link points a few lines below at
if ('MPMATH_NOSAGE' not in os.environ and 'SAGE_ROOT' in os.environ or
'MPMATH_SAGE' in os.environ):
try:
import sage.all
import sage.libs.mpmath.utils as _sage_utils
sage = sage.all
sage_utils = _sage_utils
BACKEND = 'sage'
MPZ = sage.Integer
except:
pass
and skipping it indeed cures the discrepancy.
(sage-sh) dima@hilbert:swheels$ MPMATH_NOSAGE=yes python p.py
python environment with cypari2: 5.126071550010238
with sympy loaded: 5.132636028982233
(sage-sh) dima@hilbert:swheels$ python p.py
python environment with cypari2: 5.0600332279573195
with sympy loaded: 8.450407178024761
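To double-check which backend mpmath actually picked in each of those runs, one can inspect mpmath.libmp.BACKEND (the attribute exists in current mpmath; the expected values below follow from the backend.py code quoted above):
# sketch: report the mpmath backend in use
import os
# os.environ["MPMATH_NOSAGE"] = "1"   # uncomment to force the non-Sage backend
import mpmath
print(mpmath.libmp.BACKEND)   # expected: 'sage' under Sage, otherwise 'gmpy' or 'python'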
sorry for the bogus lead :-)
The question is now - can we import less than all of sage.all?
Can you reproduce it if you replace import mpmath by import sage.libs.pari.all or import sage.cpython.all etc?
Replying to Matthias Köppe:
Can you reproduce it if you replace import mpmath by import sage.libs.pari.all or import sage.cpython.all etc?
Success with sage.libs.eclib.mat or sage.libs.eclib.homspace - they are still a serious chunk of sage.all, about 40% by some measure, but not all of it:
$ ./sage --python
>>> import sys
>>> import sage.libs.eclib.mat
>>> len(sys.modules.keys())
673
>>>
vs
$ ./sage --python
Python 3.10.8 (main, Nov 29 2022, 20:55:11) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> len(sys.modules.keys())
74
>>> import sage.all
>>> len(sys.modules.keys())
1712
For sage.libs.eclib.homspace one has 674 keys (but those for sage.libs.eclib.mat are a subset of these, so we can go with the latter).
>>> import sage.libs.eclib.mwrank
>>> len(sys.modules.keys())
346
and import sage.libs.eclib.mwrank does not reproduce it. So we're down to ~330 modules to check :-)
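A mechanical way to work through those ~330 candidates would be to time the pari call in a fresh interpreter with each module imported in the setup. A sketch (the candidate list below is a placeholder; in practice it would be the modules pulled in by sage.rings.integer, and the slow/fast threshold has to be judged by eye):
# bisect_imports.py - sketch: time the pari call with each candidate module pre-imported
import subprocess, sys

SNIPPET = """
import timeit
prog = '''
from cypari2 import Pari
pari = Pari(size=2**24)
pari("quadclassunit(1 - 2^100)")
'''
print(timeit.timeit(prog, setup="import {module}", number=3))
"""

# placeholder list of module names
candidates = ["sage.cpython.string", "sage.structure.sage_object"]

for module in candidates:
    run = subprocess.run([sys.executable, "-c", SNIPPET.format(module=module)],
                         capture_output=True, text=True)
    timing = run.stdout.strip() or "import failed"
    print(module, timing)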
eclib certainly uses libpari, so it potentially could be that loading it changes some pari defaults.
But I don't know anything about this.
Maybe John C. can comment...
Yes, eclib calls some functions in libpari and so must make sure that libpari has been initialised. This is done in the eclib function eclib_pari_init, which detects whether it has already been initialised (by testing the value of the pari variable "avma") and if necessary calls pari_init(). The first parameter to pari_init is taken from the environment variable PARI_SIZE and defaults to (currently) 10^8 if that variable is not set or has a value which cannot be converted to a long int.
The function eclib_pari_init() need never be called by users (and is not called by Sage) as it is called automatically when needed by one of two other functions which themselves use pari functions. I would expect that whenever Sage calls an eclib function which results in eclib's eclib_pari_init being called, it will find that libpari has already been initialised (so avma will be nonzero) and then do nothing. But my expectation could be wrong in some situations?
See https://github.com/JohnCremona/eclib/blob/master/libsrc/parifact.cc
OK, avma is a global Pari variable, declared in pari/paristio.h as
extern THREAD pari_sp avma;
Potentially every Sage module depending on Pari might want to check it and run Pari initialisation if it decided to, or perhaps even unconditionally, without checking. I don't know how it plays up in the presence of multithreading, either.
Suspects: lcalc, giac...
By the way, also import src.sage.libs.pari.convert_sage triggers the slowdown.
import sage.rings.integer is another reproducer, with "only" 346 sub-imports.
lcalc and giac are OK.
Have you tried running both versions under perf top or perf record to see where the time goes?
Replying to Marc Mezzarobba:
Have you tried running both versions under perf top or perf record to see where the time goes?
I tried cProfile, mentioned it on sage-devel. Nothing to see, it's a slowdown of Pari itself that happens. You are welcome to take over.
Replying to Dima Pasechnik:
import sage.rings.integer is another reproducer, with "only" 346 sub-imports.
It is a reproducer on Fedora 34, but not on Gentoo. So that's getting even more bizarre.
lcalc and giac are OK.
On the other hand, they both unconditionally initialise Pari by calling pari_init_opts.
In view of avma being a thread-local global variable, I'm not sure about what's going on there, but perhaps we still should patch them as follows:
--- a/src/lcalc/Lcommandline.cc
+++ b/src/lcalc/Lcommandline.cc
@@ -31,6 +31,9 @@ Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#include "config.h"
#include "Lcommandline.h"
#include "cmdline.h"
+#if HAVE_LIBPARI
+#include <pari/pari.h>
+#endif
void pari_err_recover_nop(long errnum) {return;}
@@ -482,10 +485,11 @@ int main (int argc, char *argv[])
t2=.5; //t2=.5 because of the GAMMA(s+1/2)
-
+ if (!avma) {
pari_init_opts(400000000,2,INIT_DFTm); // the last option is to prevent
//pari from giving its interrupt signal when its elliptic curve a_p
//algorithm is called and interrupted with ctrl-c.
+ }
coeff = new Double[3];
//compute the conductor which is copied to coeff[1]
and for giac:
diff --git a/src/pari.cc b/src/pari.cc
index 76ce8e1..bdf1073 100644
--- a/src/pari.cc
+++ b/src/pari.cc
@@ -97,8 +97,10 @@ namespace giac {
static void call_pari_init()
{
// do not initialize INIT_JMP so that PARI error do not exit
- pari_init_opts(pari_mem_size,pari_maxprime,INIT_SIGm | INIT_DFTm);
- paristack_setsize(pari_mem_size, (1<<30));
+ if (!avma) {
+ pari_init_opts(pari_mem_size,pari_maxprime,INIT_SIGm | INIT_DFTm);
+ paristack_setsize(pari_mem_size, (1<<30));
+ }
// initialize variable ordering x,y,z
gp_read_str("[x,y,z,t]");
}
Replying to Dima Pasechnik:
Replying to Marc Mezzarobba:
Have you tried running both versions under perf top or perf record to see where the time goes?
I tried cProfile, mentioned it on sage-devel. Nothing to see, it's a slowdown of Pari itself that happens. You are welcome to take over.
I'm happy to help if I can, but at the moment I don't understand how to reproduce the slowdown. I was suggesting perf because I thought a systemwide profiler may help understand what is changing.
Replying to Marc Mezzarobba:
Replying to Dima Pasechnik:
Replying to Marc Mezzarobba:
Have you tried running both versions under perf top or perf record to see where the time goes?
I tried cProfile, mentioned it on sage-devel. Nothing to see, it's a slowdown of Pari itself that happens. You are welcome to take over.
I'm happy to help if I can, but at the moment I don't understand how to reproduce the slowdown. I was suggesting perf because I thought a systemwide profiler may help understand what is changing.
I don't know (or can't recall :-)) how to use perf. I imagine one would need a special build of libpari, and some magical commands to run...
Replying to Dima Pasechnik:
I don't know (or can't recall :-)) how to use perf. I imagine one would need a special build of libpari, and some magical commands to run...
Just run
perf record -- sage -python3 pari_timing.py
and then
perf report
Also, after upgrading my system and going back to python 3.10, I do see the slowdown. I'll check what happens on the 3.11 branch with the same system packages -- but this looks like it will require a full rebuild, so it may take some time.
The slowdown is still there after a full rebuild with python 3.11.
Replying to Dima Pasechnik:
On Gentoo with system python3.10.8, system-wide pari 2.15.1, running this p1.py
You wrote “system-wide pari 2.15.1”, but if I understand the spkg-configure right, pari 2.15.* is not allowed by sage's configure. Could it be that your tests are not all using the same version of pari?
Replying to Marc Mezzarobba:
Replying to Dima Pasechnik:
On Gentoo with system python3.10.8, system-wide pari 2.15.1, running this p1.py
You wrote “system-wide pari 2.15.1”, but if I understand the spkg-configure right, pari 2.15.* is not allowed by sage's configure. Could it be that your tests are not all using the same version of pari?
I am on https://github.com/sagemath/sage-prod/issues/34537 (update to Pari 2.15.1) in Gentoo.
So there is evidence that in this case you need to import more than sage.rings.integer to see a slowdown.
So I am now able to reproduce the issue. Comparing the profiles, the main difference seems to be a huge amount of time spent in the dynamic linker, more precisely in _dl_check_map_versions() (presumably called by dlopen()) and in update_get_addr(). Regarding the latter, the relevant code seems to be (glibc/elf/dl-tls.c):
void *
__tls_get_addr (GET_ADDR_ARGS)
{
dtv_t *dtv = THREAD_DTV ();
/* Update is needed if dtv[0].counter < the generation of the accessed
module. The global generation counter is used here as it is easier
to check. Synchronization for the relaxed MO access is guaranteed
by user code, see CONCURRENCY NOTES in _dl_update_slotinfo. */
size_t gen = atomic_load_relaxed (&GL(dl_tls_generation));
if (__glibc_unlikely (dtv[0].counter != gen))
return update_get_addr (GET_ADDR_PARAM); // <----------
void *p = dtv[GET_ADDR_MODULE].pointer.val;
if (__glibc_unlikely (p == TLS_DTV_UNALLOCATED))
return tls_get_addr_tail (GET_ADDR_PARAM, dtv, NULL);
return (char *) p + GET_ADDR_OFFSET;
}
Afaict the slow path is never taken when running cypari under plain Python, while it is taken all the time when parts of Sage are loaded. But I don't know what to make of all that... (There is some information on TLS that looks like it might be relevant at https://chao-tic.github.io/blog/2018/12/25/tls)
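One crude way to test the idea that an extra dlopen bumps the TLS generation counter and pushes __tls_get_addr onto the slow path, without any Sage code at all (a sketch; libgomp is just an arbitrary TLS-using library, any other one should do):
# sketch: does dlopen()ing an unrelated shared library slow the pari call down?
import ctypes, ctypes.util, timeit

from cypari2 import Pari
pari = Pari(size=2**24)

def bench():
    return timeit.timeit(lambda: pari("quadclassunit(1 - 2^100)"), number=3)

print("before extra dlopen:", bench())

name = ctypes.util.find_library("gomp")   # arbitrary library with TLS variables
if name:
    ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
print("after extra dlopen: ", bench())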
I don't think your profiling makes much sense this way, as you are not excluding the time taken by import, right?
You can mitigate this by increasing number= to 30 or 50 - then you won't see the linker kicking in too much.
Or do I misunderstand, and you say that the linker is called by Pari?
When trying "python3 with cypari" and "python3 with sympy imported" with number=50, the effect is much more pronounced than with number=3, and it seems it's still stuff in the linker that explains the difference in the profile. perf diff perf_python3.data perf_sympy.data yields:
+23.90% ld-linux-x86-64.so.2 [.] _dl_update_slotinfo
30.30% -12.38% libpari-gmp-tls.so.2.13.4 [.] 0x000000000009fc74
+10.87% ld-linux-x86-64.so.2 [.] update_get_addr
10.31% -4.60% libpari-gmp-tls.so.2.13.4 [.] addii_sign
+4.40% ld-linux-x86-64.so.2 [.] __tls_get_addr_slow
6.54% -2.55% libpari-gmp-tls.so.2.13.4 [.] xxgcduu
11.61% -2.51% ld-linux-x86-64.so.2 [.] __tls_get_addr
4.62% -2.07% libpari-gmp-tls.so.2.13.4 [.] shifti
3.24% -1.45% libpari-gmp-tls.so.2.13.4 [.] muliispec
3.11% -1.16% libpari-gmp-tls.so.2.13.4 [.] dvmdii
2.43% -1.08% libpari-gmp-tls.so.2.13.4 [.] mulii
2.09% -1.01% libgmp.so.10.4.1 [.] __gmpn_divrem_2
1.91% -0.88% libpari-gmp-tls.so.2.13.4 [.] muluu
1.75% -0.84% libgmp.so.10.4.1 [.] __gmpn_addmul_1_x86_64
1.70% -0.62% libpari-gmp-tls.so.2.13.4 [.] abscmpii
1.24% -0.59% libgmp.so.10.4.1 [.] __gmpn_invert_limb
1.46% -0.58% libgmp.so.10.4.1 [.] __gmpn_tdiv_qr
where, as far as I can see, __tls_get_addr_slow doesn't show up at all for python3. So some switch gets thrown that forces the linker to look up a symbol again and again, in a slower way. I would have expected dynamic linking to happen once in a run, and then getting out of the way. That doesn't seem to be happening here.
I would have expected dynamic linking to happen once in a run, and then getting out of the way. That doesn't seem to be happening here.
Indeed, this is very, very weird. I've never heard about such cases. But it seems we are now debugging Pari's rather eigenzinnig (idiosyncratic), pardon my Dutch, use of threads?
I'm also seeing the slow-down with just import mpmath (which is also something sympy does). So perhaps it's something mpmath does with its configuration of gmp? Note that mpfr, which is used by mpmath, also makes use of gmp, so that could be another source of conflicts. Plus: mpmath has some sage-specific backend stuff, so importing a sage-aware mpmath could have different effects than one that isn't.
One hypothesis would be that the gmp library gets linked twice: once by python and once by libpari, and perhaps under different threading conditions; leading to an error in initialization of the thread-local storage of libgmp; causing a slow symbol resolution in libgmp every time ...
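The "two copies of gmp" part of that hypothesis is easy to check on Linux by listing the libgmp mappings of the running process (a sketch reading /proc/self/maps):
# sketch (Linux only): list the distinct libgmp shared objects mapped into the process
import sage.all   # or: import sympy / import mpmath, to compare the two situations

libgmp_paths = set()
with open("/proc/self/maps") as maps:
    for line in maps:
        path = line.split()[-1]
        if "libgmp" in path:
            libgmp_paths.add(path)
print("\n".join(sorted(libgmp_paths)) or "no libgmp mapping found")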
I did not see a slowdown with import gmpy2, so it's something mpmath does. It may well be something in sage.libs.mpmath.ext_main, which a sage-aware mpmath imports.
In fact, ext_libmp.pyx:11 indicates that sage-aware mpmath would call some initialization that has to do with gmp at import-time. So if libpari messes with threads afterwards and links gmp, there may be some room for things to go sideways.
Replying to Nils Bruin:
I'm also seeing the slow-down with just import mpmath (which is also something sympy does). So perhaps it's something mpmath does with its configuration of gmp?
Sage-aware mpmath imports sage.all (see comment:24), so in this sense it's a moot point.
But yes, there are quite a few dodgy things going on in mpmath.
Replying to Dima Pasechnik:
Sage-aware mpmath imports sage.all (see comment:24), so in this sense it's a moot point. But yes, there are quite a few dodgy things going on in mpmath.
Apologies for replicating that already-established knowledge. Following up on comment:23: when using the correct syntax to set MPMATH_NOSAGE=1, the import of mpmath has no adverse effect. So it's something else in sage.all that causes the problem (edited in response to comment:49).
Replying to Nils Bruin:
Following up on comment:23: as far as I can see, MPMATH_NOSAGE, MPMATH_NOGMPY have no influence on the result.
Are you sure?
(sage-sh) marc@tramontane:tmp$ python3 pari_timing.py
python environment with cypari2: 3.799615835014265
with mpmath loaded: 6.606633314979263
(sage-sh) marc@tramontane:tmp$ export MPMATH_NOSAGE=1
(sage-sh) marc@tramontane:tmp$ python3 pari_timing.py
python environment with cypari2: 3.996781985013513
with mpmath loaded: 4.207063634006772
To quote GCC people on the topic of __tls_get_addr (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81501#c6):
We recently upgraded our toolchain from GCC9 to GCC11, and we're seeing __tls_get_addr take up to 10% of total runtime under some workloads, where it was 1-2% before.
Indeed, this means that if __tls_get_addr gets slowed down, the same code might suddenly get visibly slower.
The story of __tls_get_addr_slow in glibc (https://github.com/lattera/glibc/blob/master/sysdeps/x86_64/tls_get_addr.S) starts from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066, but I can't make much sense of it. Something to do with a misaligned stack.
See also
CC: @edgarcosta @mkoeppe @JohnCremona @fredrik-johansson @zimmermann6
Component: performance
Issue created by migration from https://trac.sagemath.org/ticket/34850