sagemath / sage

Main repository of SageMath
https://www.sagemath.org

sage-loaded libraries are slowing down cypari2 a lot (on x86_64 glibc) #34850

Closed: videlec closed this issue 4 months ago

videlec commented 1 year ago
import timeit

prog="""
from cypari2 import Pari
pari = Pari(size={size})
pari("{cmd}")
"""

prog = prog.format(size=2**24, cmd="quadclassunit(1 - 2^100)")

print("python environment with cypari2:          ",
      timeit.timeit(prog, number=3))

setup="""
from sage.ext.memory import init_memory_functions
init_memory_functions()
"""
print("with sage custom GMP allocation functions:",
      timeit.timeit(prog, setup=setup, number=3))

setup="""
import sage.all
"""
print("with the whole sagemath library lodaded:  ",
      timeit.timeit(prog, setup=setup, number=3))

gives

$ python pari_timing.py 
python environment with cypari2:           4.871978694998688
with sage custom GMP allocation functions: 4.941096214999561
with the whole sagemath library loaded:    7.39602135700261

See also
- this sage-devel thread
- a similar problem with pari_jupyter pari-jupyter/#27

CC: @edgarcosta @mkoeppe @JohnCremona @fredrik-johansson @zimmermann6

Component: performance

Issue created by migration from https://trac.sagemath.org/ticket/34850

videlec commented 1 year ago

Description changed:

--- 
+++ 
@@ -5,10 +5,10 @@
 prog="""
 from cypari2 import Pari
 pari = Pari(size={size})
-pari({cmd})
+pari("{cmd}")
 """

-prog = prog.format(size=2**24, cmd="\"quadclassunit(1 - 2^100)\"")
+prog = prog.format(size=2**24, cmd="quadclassunit(1 - 2^100)")

 print("python environment with cypari2:          ",
       timeit.timeit(prog, number=3))
videlec commented 1 year ago

Description changed:

--- 
+++ 
@@ -34,3 +34,7 @@
 with sage custom GMP allocation functions: 4.941096214999561
 with the whole sagemath library lodaded:   7.39602135700261

+
+See also
+- this sage-devel thread
+- a similar problem with pari_jupyter pari-jupyter/#27

dimpase commented 1 year ago
comment:3

Could it be just a different memory layout as soon as sage.all is imported, causing the memory consumption of the process to increase about 6-fold ?
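One way to test the memory part of this hypothesis is to compare the process's peak resident set size before and after the import. A minimal sketch, assuming a Sage-enabled python on Linux (where ru_maxrss is reported in KiB):

import resource

def max_rss_mib():
    # on Linux, ru_maxrss is in KiB (on macOS it is in bytes)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

print("max RSS before import: %.1f MiB" % max_rss_mib())
import sage.all  # noqa: F401
print("max RSS after import:  %.1f MiB" % max_rss_mib())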

mezzarobba commented 1 year ago
comment:5

FWIW, I don't see the issue on my system:

python environment with cypari2:           3.9759535739940475
with sage custom GMP allocation functions: 4.250082928003394
with the whole sagemath library loaded:    3.8536450660030823

(edit: I previously posted a version where I was running the test code under sage, not ipython, but the results above are with ipython)

videlec commented 1 year ago
comment:6

Thanks Marc for running the test. What is your configuration? eg what is the system and does sage use system pari?
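As a small sketch of the kind of information being asked for here (assuming cypari2 is importable in the environment being tested), one could report the interpreter, the platform, and the PARI version that cypari2 actually sees:

import platform
import sys
from cypari2 import Pari

print("python  :", sys.version.split()[0], "from", sys.prefix)
print("platform:", platform.platform())
print("pari    :", Pari()("version()"))  # PARI's own version() function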

mezzarobba commented 1 year ago
comment:7

I was going to add more details after doing sage -f pari, which is still running at the moment. But the example above is already using sage's pari (I also have a system-wide pari, but it is a different version). This is on Debian sid with an Intel CPU. Note that I am using Python 3.11 via #33842.

mantepse commented 1 year ago
comment:8

on ubuntu 22.04.01 LTS:

┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 9.8.beta5, Release Date: 2022-12-11               │
│ Using Python 3.10.6. Type "help()" for help.                       │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a prerelease version, and it may be unstable.     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: import timeit
....: 
....: prog="""
....: from cypari2 import Pari
....: pari = Pari(size={size})
....: pari("{cmd}")
....: """
....: 
....: prog = prog.format(size=2**24, cmd="quadclassunit(1 - 2^100)")
....: 
....: print("python environment with cypari2:          ",
....:       timeit.timeit(prog, number=3))
....: 
....: setup="""
....: from sage.ext.memory import init_memory_functions
....: init_memory_functions()
....: """
....: print("with sage custom GMP allocation functions:",
....:       timeit.timeit(prog, setup=setup, number=3))
....: 
....: setup="""
....: import sage.all
....: """
....: print("with the whole sagemath library lodaded:  ",
....:       timeit.timeit(prog, setup=setup, number=3))
python environment with cypari2:           5.757997545006219
with sage custom GMP allocation functions: 5.887209259002702
with the whole sagemath library loaded:    5.1330207900027744
dimpase commented 1 year ago
comment:9

If you replace

    import sage.all

with

    import sympy

you'll get the same slowdown. Bingo! Sympy probably changes FPU status, it seems, according to build/pkgs/sympow/email-exchange.txt

mezzarobba commented 1 year ago
comment:10

Replying to Dima Pasechnik:

Sympy probably changes FPU status, it seems, according to build/pkgs/sympow/email-exchange.txt

Did you accidentally look in sympow/ instead of sympy/, or do you mean sympow also has something to do with it?

I (obviously) don't see any slowdown here with import sympy either, even after reinstalling sympy with sage -f.

dimpase commented 1 year ago
comment:11

You know that the greatest discoveries are made by accident ;-)

Indeed, I went looking for FPU in sage source repo, and found that sympow thing, which got swapped to sympy in my brain... Anyhow, for me sympy causes the same slowdown:

$ ./sage --python p.py
python environment with cypari2:           4.7912631159997545
with sage custom GMP allocation functions: 4.886824406014057
with sympy loaded:   6.885674357006792
$ cat p.py
import timeit

prog="""
from cypari2 import Pari
pari = Pari(size={size})
pari("{cmd}")
"""

prog = prog.format(size=2**24, cmd="quadclassunit(1 - 2^100)")

print("python environment with cypari2:          ",
      timeit.timeit(prog, number=3))

setup="""
from sage.ext.memory import init_memory_functions
init_memory_functions()
"""
print("with sage custom GMP allocation functions:",
      timeit.timeit(prog, setup=setup, number=3))

setup="""
import sympy
"""
print("with sympy loaded:  ",
      timeit.timeit(prog, setup=setup, number=3))
dimpase commented 1 year ago
comment:12

I see this sympy difference on the latest Sage beta on 3 boxes: Fedora 34 with a newish CPU, and on Gentoo with a much older CPU, and on Debian Bullseye (stable) with an even older and slower CPU.

I see a bit of a speedup, by about 15%, with either sage.all or sympy imported on macOS 13 (x86_64, as well as arm64, aka M1)

videlec commented 1 year ago
comment:13

Replying to Dima Pasechnik:

I see this sympy difference on the latest Sage beta on 3 boxes: Fedora 34 with a newish CPU, and on Gentoo with a much older CPU, and on Debian Bullseye (stable) with an even older and slower CPU.

I see a bit of a speedup, by about 15%, with either sage.all or sympy imported on macOS 13 (x86_64, as well as arm64, aka M1)

Interesting.

dimpase commented 1 year ago
comment:14

Narrowing it down further, by taking all of Sage out:

# p1.py
import timeit

prog="""
from cypari2 import Pari
pari = Pari(size={size})
pari("{cmd}")
"""

prog = prog.format(size=2**24, cmd="quadclassunit(1 - 2^100)")

print("python environment with cypari2:         ",
      timeit.timeit(prog, number=3))

setup="""
import sympy
"""
print("with sympy loaded:                       ",
      timeit.timeit(prog, setup=setup, number=3))

On Gentoo with system python3.10.8 and system-wide pari 2.15.1, running this p1.py shows the same slowdown.

So this is down to an interaction between python, sympy, and cypari2 (and Sage's venv?)

dimpase commented 1 year ago
comment:15

Matthias, there is something funny in Sage's venv, causing such a slowdown on Linux.

mkoeppe commented 1 year ago
comment:16

To narrow it down further, you can try installing the Sage-built wheels (from venv/var/lib/sage/wheels/) in a separate venv. Specifically, take cypari2 from the sage wheel rather than from pypi.

dimpase commented 1 year ago
comment:17

Replying to Matthias Köppe:

To narrow it down further, you can try installing the Sage-built wheels (from venv/var/lib/sage/wheels/) in a separate venv. Specifically, take cypari2 from the sage wheel rather than from pypi.

This way, no difference in timing, so the wheels themselves are OK.

mkoeppe commented 1 year ago
comment:18

Next you could check whether it makes a difference to run the separate venv inside or outside of the sage environment set up by sage -sh.

dimpase commented 1 year ago
comment:19

if I start sage -sh, cd somewhere outside Sage directory, run python -m venv swheels, activate swheels, and try doing things there, pip does not look too healthy:

(swheels) (sage-sh) dima@hilbert:swheels$ pip cache purge
An error occurred during configuration: option format: invalid choice: 'columns' (choose from 'human', 'abspath')

regardless, pip install works there, and I can run the test - again, no difference in time. So, somehow, putting Sage's python into venv cures this time discrepancy.

(swheels) (sage-sh) dima@hilbert:swheels$ python p.py
python environment with cypari2:          5.116253892017994
with sympy loaded:                        5.216335576027632
(swheels) (sage-sh) dima@hilbert:swheels$ deactivate 
(sage-sh) dima@hilbert:swheels$ which python
/mnt/opt/Sage/sage-dev/local/var/lib/sage/venv-python3.10/bin/python
(sage-sh) dima@hilbert:swheels$ python p.py 
python environment with cypari2:          5.446458007965703
with sympy loaded:                        8.667378643993288
mkoeppe commented 1 year ago
comment:20

Replying to Dima Pasechnik:

if I start sage -sh, cd somewhere outside Sage directory, run python -m venv swheels, activate swheels, and try doing things there, pip does not look too healthy:

(swheels) (sage-sh) dima@hilbert:swheels$ pip cache purge
An error occurred during configuration: option format: invalid choice: 'columns' (choose from 'human', 'abspath')

Indeed. I've opened #34853 for this unrelated issue.

mkoeppe commented 1 year ago
comment:21

I'd suggest to look whether mpmath is somehow involved, see #25445

dimpase commented 1 year ago
comment:22

Replying to Matthias Köppe:

I'd suggest to look whether mpmath is somehow involved, see #25445

In any case, it's exactly the same wheels, mpmath included, that are involved here. As for #25445: indeed, import mpmath has the same side effect on performance as import sympy.

So we can now test instead the following:

# p1.py
import timeit

prog="""
from cypari2 import Pari
pari = Pari(size={size})
pari("{cmd}")
"""

prog = prog.format(size=2**24, cmd="quadclassunit(1 - 2^100)")

print("python environment with cypari2:         ",
      timeit.timeit(prog, number=3))

setup="""
import mpmath
"""
print("with mpmath loaded:                       ",
      timeit.timeit(prog, setup=setup, number=3))
mkoeppe commented 1 year ago
comment:23

Try if the various MPMATH_... environment variables (https://github.com/mpmath/mpmath/blob/master/mpmath/libmp/backend.py#L66) have an effect
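A sketch of how one might test that, using the same benchmark as above but setting MPMATH_NOSAGE in the timeit setup before mpmath is first imported (the variable is read by mpmath/libmp/backend.py at import time):

import timeit

prog = """
from cypari2 import Pari
pari = Pari(size=2**24)
pari("quadclassunit(1 - 2^100)")
"""

setup = """
import os
os.environ["MPMATH_NOSAGE"] = "1"  # must be set before mpmath is first imported
import mpmath
"""

print("with mpmath loaded, MPMATH_NOSAGE=1:",
      timeit.timeit(prog, setup=setup, number=3))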

dimpase commented 1 year ago
comment:24

Oops, we're back to import sage.all here, as your link points a few lines below to

if ('MPMATH_NOSAGE' not in os.environ and 'SAGE_ROOT' in os.environ or
        'MPMATH_SAGE' in os.environ):
    try:
        import sage.all
        import sage.libs.mpmath.utils as _sage_utils
        sage = sage.all
        sage_utils = _sage_utils
        BACKEND = 'sage'
        MPZ = sage.Integer
    except:
        pass

and skipping it indeed cures the discrepancy.

(sage-sh) dima@hilbert:swheels$ MPMATH_NOSAGE=yes python p.py
python environment with cypari2:          5.126071550010238
with sympy loaded:                        5.132636028982233
(sage-sh) dima@hilbert:swheels$ python p.py
python environment with cypari2:          5.0600332279573195
with sympy loaded:                        8.450407178024761

sorry for the bogus lead :-)

The question is now - can we import less than sage.all pieces?

mkoeppe commented 1 year ago
comment:25

Can you reproduce it if you replace import mpmath by import sage.libs.pari.all or import sage.cpython.all etc.?

dimpase commented 1 year ago
comment:26

Replying to Matthias Köppe:

Can you reproduce it if you replace import mpmath by import sage.libs.pari.all or import sage.cpython.all etc.?

Success with sage.libs.eclib.mat or sage.libs.eclib.homspace - they are still a serious chunk of sage.all, about 40%, by some measure, but not all of it:

$ ./sage --python
>>> import sys
>>> import sage.libs.eclib.mat
>>> len(sys.modules.keys())
673
>>> 

vs

$ ./sage --python
Python 3.10.8 (main, Nov 29 2022, 20:55:11) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> len(sys.modules.keys())
74
>>> import sage.all
>>> len(sys.modules.keys())
1712

for sage.libs.eclib.homspace one has 674 keys (but the modules pulled in by sage.libs.eclib.mat are a subset of these, so we can go with the latter)

dimpase commented 1 year ago
comment:27
>>> import sage.libs.eclib.mwrank
>>> len(sys.modules.keys())
346

and import sage.libs.eclib.mwrank does not reproduce it, so we're down to ~330 modules to check :-)
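A brute-force way to sweep those remaining modules (slow, but mechanical) is to time the pari call in a fresh interpreter after importing each module that sage.libs.eclib.mat drags in. A sketch, assuming it is run with Sage's python:

# find_culprit.py (sketch)
import json
import subprocess
import sys

# list the modules pulled in by the known-bad import
list_mods = (
    "import sys, json\n"
    "before = set(sys.modules)\n"
    "import sage.libs.eclib.mat\n"
    "print(json.dumps(sorted(set(sys.modules) - before)))\n"
)
mods = json.loads(subprocess.run([sys.executable, "-c", list_mods],
                                 capture_output=True, text=True).stdout)

# time the pari call in a fresh process after importing a single candidate
bench = """
import timeit
try:
    import {mod}
except Exception:
    raise SystemExit
print(timeit.timeit('from cypari2 import Pari; pari = Pari(size=2**24); '
                    'pari("quadclassunit(1 - 2^100)")', number=3))
"""

for mod in mods:
    out = subprocess.run([sys.executable, "-c", bench.format(mod=mod)],
                         capture_output=True, text=True).stdout.strip()
    if out:
        print(f"{out:>10}  {mod}")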

dimpase commented 1 year ago
comment:28

eclib certainly uses libpari, so it potentially could be that loading it changes some pari defaults. But I don't know anything about this. Maybe John C. can comment...

JohnCremona commented 1 year ago
comment:29

Yes, eclib calls some functions in libpari and so must make sure that libpari has been initialised. This is done in the eclib function eclib_pari_init which detects whether it has already been initialised (by testing the value of pari variable "avma") and if necessary calls pari_init(). The first parameter to pari_init is taken from the environment variable PARI_SIZE and defaults to (currently) 10^8 if that variable is not set or has a value which cannot be converted to a long int.

The function eclib_pari_init() need never be called by users (and is not called by Sage), as it is called automatically when needed by one of two other functions which themselves use pari functions. I would expect that whenever Sage calls an eclib function which results in eclib's eclib_pari_init being called, it will find that libpari has already been initialised (so avma will be nonzero) and then do nothing. But my expectation could be wrong in some situations?

See https://github.com/JohnCremona/eclib/blob/master/libsrc/parifact.cc

dimpase commented 1 year ago
comment:30

OK, avma is a global Pari variable. Declared in pari/paristio.h as

extern THREAD pari_sp avma;

Potentially every Sage module depending on Pari might want to check it and run Pari initialisation if it decided to, or perhaps even unconditionally, without checking. I don't know how it plays up in the presence of multithreading, either.

Suspects: lcalc, giac...

By the way, import sage.libs.pari.convert_sage also triggers the slowdown.

dimpase commented 1 year ago
comment:31

import sage.rings.integer is another reproducer, with "only" 346 sub-imports

lcalc and giac are OK.

mezzarobba commented 1 year ago
comment:32

Have you tried running both versions under perf top or perf record to see where the time goes?

dimpase commented 1 year ago
comment:33

Replying to Marc Mezzarobba:

Have you tried running both versions under perf top or perf record to see where the time goes?

I tried cProfile, mentioned it on sage-devel. Nothing to see, it's a slowdown of Pari itself that happens. You are welcome to take over.

dimpase commented 1 year ago
comment:34

Replying to Dima Pasechnik:

import sage.rings.integer is another reproducer, with "only" 346 sub-imports

it is a reproducer on Fedora 34, but not on Gentoo. So this is getting even more bizarre.

lcalc and giac are OK.

On the other hand, they both unconditionally initialise Pari by calling pari_init_opts. In view of avma being a thread-local global variable, I'm not sure what's going on there, but perhaps we should still patch them as follows:

--- a/src/lcalc/Lcommandline.cc
+++ b/src/lcalc/Lcommandline.cc
@@ -31,6 +31,9 @@ Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 #include "config.h"
 #include "Lcommandline.h"
 #include "cmdline.h"
+#if HAVE_LIBPARI
+#include <pari/pari.h>
+#endif

 void pari_err_recover_nop(long errnum) {return;}

@@ -482,10 +485,11 @@ int main (int argc, char *argv[])

             t2=.5; //t2=.5 because of the GAMMA(s+1/2)
-
+            if (!avma) {
             pari_init_opts(400000000,2,INIT_DFTm); // the last option is to prevent
             //pari from giving its interrupt signal when its elliptic curve a_p
             //algorithm is called and interrupted with ctrl-c.
+            }

             coeff = new Double[3];
             //compute the conductor which is copied to coeff[1]

and for giac:

diff --git a/src/pari.cc b/src/pari.cc
index 76ce8e1..bdf1073 100644
--- a/src/pari.cc
+++ b/src/pari.cc
@@ -97,8 +97,10 @@ namespace giac {
   static void call_pari_init()
   {
     // do not initialize INIT_JMP so that PARI error do not exit
-    pari_init_opts(pari_mem_size,pari_maxprime,INIT_SIGm | INIT_DFTm);
-    paristack_setsize(pari_mem_size, (1<<30));
+    if (!avma) {
+       pari_init_opts(pari_mem_size,pari_maxprime,INIT_SIGm | INIT_DFTm);
+       paristack_setsize(pari_mem_size, (1<<30));
+    }
     // initialize variable ordering x,y,z
     gp_read_str("[x,y,z,t]");
   }
mezzarobba commented 1 year ago
comment:35

Replying to Dima Pasechnik:

Replying to Marc Mezzarobba:

Have you tried running both versions under perf top or perf record to see where the time goes?

I tried cProfile, mentioned it on sage-devel. Nothing to see, it's a slowdown of Pari itself that happens. You are welcome to take over.

I'm happy to help if I can, but at the moment I don't understand how to reproduce the slowdown. I was suggesting perf because I thought a systemwide profiler may help understand what is changing.

dimpase commented 1 year ago
comment:36

Replying to Marc Mezzarobba:

Replying to Dima Pasechnik:

Replying to Marc Mezzarobba:

Have you tried running both versions under perf top or perf record to see where the time goes?

I tried cProfile, mentioned it on sage-devel. Nothing to see, it's a slowdown of Pari itself that happens. You are welcome to take over.

I'm happy to help if I can, but at the moment I don't understand how to reproduce the slowdown. I was suggesting perf because I thought a systemwide profiler may help understand what is changing.

I don't know (or can't recall :-)) how to use perf. I imagine one would need a special build of libpari, and some magical commands to run...

mezzarobba commented 1 year ago
comment:37

Replying to Dima Pasechnik:

I don't know (or can't recall :-)) how to use perf. I imagine one would need a special build of libpari, and some magical commands to run...

Just run

perf record -- sage -python3 pari_timing.py

and then

perf report
mezzarobba commented 1 year ago
comment:38

Also, after upgrading my system and going back to python 3.10, I do see the slowdown. I'll check what happens on the 3.11 branch with the same system packages -- but this looks like it will require a full rebuild, so it may take some time.

mezzarobba commented 1 year ago
comment:39

The slowdown is still there after a full rebuild with python 3.11.

mezzarobba commented 1 year ago
comment:40

Replying to Dima Pasechnik:

On Gentoo with system python3.10.8, system-wide pari 2.15.1, running this p1.py

You wrote “system-wide pari 2.15.1”, but if I understand the spkg-configure right, pari 2.15.* is not allowed by sage's configure. Could it be that your tests are not all using the same version of pari?

dimpase commented 1 year ago
comment:41

Replying to Marc Mezzarobba:

Replying to Dima Pasechnik:

On Gentoo with system python3.10.8, system-wide pari 2.15.1, running this p1.py

You wrote “system-wide pari 2.15.1”, but if I understand the spkg-configure right, pari 2.15.* is not allowed by sage's configure. Could it be that your tests are not all using the same version of pari?

I am on https://github.com/sagemath/sage-prod/issues/34537 (update to Pari 2.15.1) in Gentoo. So there is evidence that in this case you need to import more than sage.rings.integer to see a slowdown.

mezzarobba commented 1 year ago
comment:42

So I am now able to reproduce the issue. Comparing the profiles, the main difference seems to be a huge amount of time spent in the dynamic linker, more precisely in _dl_check_map_versions() (presumably called by dlopen()) and in update_get_addr(). Regarding the latter, the relevant code seems to be (glibc/elf/dl-tls.c):

void *
__tls_get_addr (GET_ADDR_ARGS)
{
  dtv_t *dtv = THREAD_DTV ();

  /* Update is needed if dtv[0].counter < the generation of the accessed
     module.  The global generation counter is used here as it is easier
     to check.  Synchronization for the relaxed MO access is guaranteed
     by user code, see CONCURRENCY NOTES in _dl_update_slotinfo.  */
  size_t gen = atomic_load_relaxed (&GL(dl_tls_generation));
  if (__glibc_unlikely (dtv[0].counter != gen))
    return update_get_addr (GET_ADDR_PARAM);   // <----------

  void *p = dtv[GET_ADDR_MODULE].pointer.val;

  if (__glibc_unlikely (p == TLS_DTV_UNALLOCATED))
    return tls_get_addr_tail (GET_ADDR_PARAM, dtv, NULL);

  return (char *) p + GET_ADDR_OFFSET;
}

Afaict the slow path is never taken when running cypari under plain Python, while it is taken all the time when parts of Sage are loaded. But I don't know what to make of all that... (There is some information on TLS that looks like it might be relevant at https://chao-tic.github.io/blog/2018/12/25/tls)

dimpase commented 1 year ago
comment:43

I don't think your profiling makes much sense this way, as you are not excluding the time taken by the import, right? You can mitigate this by increasing number= to 30 or 50 (sketched below) - then you won't see the linker kicking in too much.

Or I misunderstand, and you say that the linker is called by Pari?
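For reference, comment:43's suggestion as a stripped-down script (same pari command as above, but with number=50 so that the one-off import cost in the setup is amortized):

import timeit

prog = """
from cypari2 import Pari
pari = Pari(size=2**24)
pari("quadclassunit(1 - 2^100)")
"""

print("python environment with cypari2:", timeit.timeit(prog, number=50))
print("with sympy loaded:              ",
      timeit.timeit(prog, setup="import sympy", number=50))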

nbruin commented 1 year ago
comment:44

When trying "python3 with cypari" and "python3 with sympy imported" with number=50 the effect is much more pronounced than with number=3 and it seems it's still stuff in the linker that explains the difference in the profile. perf diff perf_python3.data perf_sympy.data yields:

              +23.90%  ld-linux-x86-64.so.2                                        [.] _dl_update_slotinfo
    30.30%    -12.38%  libpari-gmp-tls.so.2.13.4                                   [.] 0x000000000009fc74
              +10.87%  ld-linux-x86-64.so.2                                        [.] update_get_addr
    10.31%     -4.60%  libpari-gmp-tls.so.2.13.4                                   [.] addii_sign
               +4.40%  ld-linux-x86-64.so.2                                        [.] __tls_get_addr_slow
     6.54%     -2.55%  libpari-gmp-tls.so.2.13.4                                   [.] xxgcduu
    11.61%     -2.51%  ld-linux-x86-64.so.2                                        [.] __tls_get_addr
     4.62%     -2.07%  libpari-gmp-tls.so.2.13.4                                   [.] shifti
     3.24%     -1.45%  libpari-gmp-tls.so.2.13.4                                   [.] muliispec
     3.11%     -1.16%  libpari-gmp-tls.so.2.13.4                                   [.] dvmdii
     2.43%     -1.08%  libpari-gmp-tls.so.2.13.4                                   [.] mulii
     2.09%     -1.01%  libgmp.so.10.4.1                                            [.] __gmpn_divrem_2
     1.91%     -0.88%  libpari-gmp-tls.so.2.13.4                                   [.] muluu
     1.75%     -0.84%  libgmp.so.10.4.1                                            [.] __gmpn_addmul_1_x86_64
     1.70%     -0.62%  libpari-gmp-tls.so.2.13.4                                   [.] abscmpii
     1.24%     -0.59%  libgmp.so.10.4.1                                            [.] __gmpn_invert_limb
     1.46%     -0.58%  libgmp.so.10.4.1                                            [.] __gmpn_tdiv_qr

where, as far as I can see, __tls_get_addr_slow doesn't show up at all for plain python3. So some switch gets thrown that forces the linker to look up a symbol again and again, in a slower way. I would have expected dynamic linking to happen once in a run and then get out of the way. That doesn't seem to be happening here.

dimpase commented 1 year ago
comment:45

I would have expected dynamic linking to happen once in a run, and then getting out of the way. That doesn't seem to be happening here.

Indeed, this is very, very weird. I've never heard of such cases. But it seems we are now debugging a rather eigenzinnig (idiosyncratic, pardon my Dutch) use of threads by Pari?

nbruin commented 1 year ago
comment:46

I'm also seeing the slow-down with just import mpmath (which is also something sympy does). So perhaps it's something mpmath does with its configuration of gmp? Note that mpfr, which is used by mpmath, also makes use of gmp, so that could be another source of conflicts. Plus: mpmath has some sage-specific backend stuff, so importing a sage-aware mpmath could have different effects than one that isn't.

One hypothesis would be that the gmp library gets linked twice: once by python and once by libpari, and perhaps under different threading conditions; leading to an error in initialization of the thread-local storage of libgmp; causing a slow symbol resolution in libgmp every time ...

I did not see a slowdown with import gmpy2, so it's something mpmath does. It may well be something in sage.libs.mpmath.ext_main, which a sage-aware mpmath imports.

In fact, ext_libmp.pyx:11 indicates that sage-aware mpmath would call some initialization that has to do with gmp at import-time. So if libpari messes with threads afterwards and links gmp, there may be some room for things to go sideways.

dimpase commented 1 year ago
comment:47

Replying to Nils Bruin:

I'm also seeing the slow-down with just import mpmath (which is also something sympy does). So perhaps it's something mpmath does with its configuration of gmp?

Sage-aware mpmath imports sage.all (see comment:24), so in this sense it's a moot point. But yes, there are quite a few dodgy things going on in mpmath.

nbruin commented 1 year ago
comment:48

Replying to Dima Pasechnik:

Sage-aware mpmath imports sage.all (see comment:24), so in this sense it's a moot point. But yes, there are quite a few dodgy things going on in mpmath.

Apologies for replicating that already-established knowledge. Following up on comment:23: when using the correct syntax to set MPMATH_NOSAGE=1, the import of mpmath has no adverse effect. So it's something else in sage.all that causes the problem (edited in response to comment:49).

mezzarobba commented 1 year ago
comment:49

Replying to Nils Bruin:

Following up on comment:23: as far as I can see, MPMATH_NOSAGE, MPMATH_NOGMPY have no influence on the result.

Are you sure?

(sage-sh) marc@tramontane:tmp$ python3 pari_timing.py
python environment with cypari2:           3.799615835014265
with mpmath loaded:   6.606633314979263
(sage-sh) marc@tramontane:tmp$ export MPMATH_NOSAGE=1
(sage-sh) marc@tramontane:tmp$ python3 pari_timing.py
python environment with cypari2:           3.996781985013513
with mpmath loaded:   4.207063634006772
dimpase commented 1 year ago
comment:50

To quote GCC people on the topic of __tls_get_addr (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81501#c6):

We recently upgraded our toolchain from GCC9 to GCC11, and we're seeing __tls_get_addr
take up to 10% of total runtime under some workloads, where it was 1-2% before.

indeed, this means that if __tls_get_addr gets slowed down, the same code might suddenly get visibly slower.

dimpase commented 1 year ago
comment:51

The story of __tls_get_addr_slow in glibc (https://github.com/lattera/glibc/blob/master/sysdeps/x86_64/tls_get_addr.S) starts from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

but I can't make much sense of it. Something to do with a misaligned stack.