graspologic-org / graspologic

Python package for graph statistics
https://graspologic-org.github.io/graspologic/
MIT License
650 stars, 134 forks

[BUG] graspologic takes 33 seconds to import #1035

Open loftusa opened 1 year ago

loftusa commented 1 year ago

Problem

graspologic is taking an extremely long time to import for me. This is after a fresh pip install --upgrade graspologic. (I also had to run pip install --upgrade numba and pip install --upgrade numpy to get it to import.)

I timed it and it takes around 33 seconds. Importing it also emits several NumbaDeprecationWarning messages, mostly from umap (see the output below).

[Screenshot, 2023-05-20 9:19:02 PM]

Example Code


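from time import perf_counter

start = perf_counter()  # monotonic clock, better suited to timing than time()
import graspologic
end = perf_counter()

print(f"import took {end - start:.1f} s")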
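To compare numbers across machines, it can help to time the import in a fresh interpreter each run, since a second import in the same process is served from the module cache. Below is a minimal sketch of that idea; the helper name time_fresh_import is made up for illustration, and json is only a stand-in for graspologic so the snippet runs anywhere. The measured time includes interpreter startup.

```python
import subprocess
import sys
import time


def time_fresh_import(module_name, repeats=3):
    """Time `import module_name` in a new interpreter for each repeat.

    Hypothetical helper for illustration. Each timing includes interpreter
    startup, so compare it against a baseline module if that matters.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # A new process guarantees nothing is already cached in sys.modules.
        subprocess.run(
            [sys.executable, "-c", f"import {module_name}"],
            check=True,
            capture_output=True,
        )
        timings.append(time.perf_counter() - start)
    return timings


# "json" is a stand-in; swap in "graspologic" to reproduce the report.
for seconds in time_fresh_import("json", repeats=3):
    print(f"{seconds:.2f} s")
```

If the first run is much slower than the later ones, on-disk caches (for example compiled .pyc files) are a plausible part of the difference, though that is an assumption to check rather than a diagnosis.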

Full Traceback

/usr/local/lib/python3.9/site-packages/umap_learn-0.5.3-py3.9.egg/umap/distances.py:1063: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  @numba.jit()
/usr/local/lib/python3.9/site-packages/umap_learn-0.5.3-py3.9.egg/umap/distances.py:1071: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  @numba.jit()
/usr/local/lib/python3.9/site-packages/umap_learn-0.5.3-py3.9.egg/umap/distances.py:1086: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  @numba.jit()
/usr/local/lib/python3.9/site-packages/umap_learn-0.5.3-py3.9.egg/umap/umap_.py:660: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  @numba.jit()
/usr/local/lib/python3.9/site-packages/graspologic/models/edge_swaps.py:215: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  _edge_swap_numba = nb.jit(_edge_swap)

Your Environment

Additional Details

This is in the graphstatsbook Docker container, with 7 CPUs and about 2/3 of my RAM allocated, on a 2022 MacBook Air with an M2 chip.

bdpedigo commented 1 year ago

I don't plan on working on this, but if anyone wants to speed things up, go for it.

I'd also just note that importing a specific function or class is usually pretty quick.

loftusa commented 1 year ago

I did some light profiling on this with python -X importtime -c 'import graspologic'. Here's what came up: import_times.txt
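For anyone who wants to rank the slowest imports without reading the raw log, here is a rough sketch that runs python -X importtime in a subprocess and sorts the result by cumulative time. The function names parse_importtime and slowest_imports are made up for illustration, and json stands in for graspologic so the example runs without it installed. It assumes the usual "import time: self [us] | cumulative | imported package" layout that CPython writes to stderr.

```python
import subprocess
import sys


def parse_importtime(stderr_text):
    """Turn `-X importtime` output into (module, self_us, cumulative_us) tuples."""
    rows = []
    prefix = "import time:"
    for line in stderr_text.splitlines():
        if not line.startswith(prefix):
            continue  # warnings and other stderr noise
        fields = line[len(prefix):].split("|")
        if len(fields) != 3:
            continue
        try:
            self_us = int(fields[0])
            cumulative_us = int(fields[1])
        except ValueError:
            continue  # the header row has text instead of numbers
        rows.append((fields[2].strip(), self_us, cumulative_us))
    return rows


def slowest_imports(module_name, top=10):
    """Import module_name in a fresh interpreter and return the slowest entries."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = parse_importtime(proc.stderr)
    # Cumulative time includes everything a module pulls in underneath it.
    return sorted(rows, key=lambda row: row[2], reverse=True)[:top]


# "json" is a stand-in; use "graspologic" to reproduce the profile.
for name, self_us, cumulative_us in slowest_imports("json", top=5):
    print(f"{cumulative_us / 1e6:8.3f} s  {name}")
```

Sorting by cumulative time surfaces the top-level packages that dominate the import; sorting by self time instead would point at the individual modules doing the work.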

loftusa commented 7 months ago

@bdpedigo I looked at this a bit more just now using tuna. Here's the import profile for graspologic:

[Screenshot: tuna import profile for graspologic, 2023-12-07 9:18:44 AM]

The time appears to go mainly to the umap import in graspologic.layouts.auto and the ot import in graspologic.align.seedless_procrustes.

bdpedigo commented 7 months ago

that's interesting! and a cool tool/visualization

I'm open to discussing proposed fixes, I just don't really know what could be done here, since those other libraries are out of our control.

I can tell you that I don't think we use anything under ot.backend.tensorflow or ot.backend.torch... so if there's some way to turn off those imports, perhaps that could be a big save?

bdpedigo commented 7 months ago

i wonder why the load time is so much shorter for tuna than you, though?

loftusa commented 7 months ago

> i wonder why the load time is so much shorter for tuna than you, though?

no clue, I noticed that too, how long does it take for you?

> that's interesting! and a cool tool/visualization
>
> I'm open to discussing proposed fixes, I just don't really know what could be done here, since those other libraries are out of our control.
>
> I can tell you that I don't think we use anything under ot.backend.tensorflow or ot.backend.torch... so if there's some way to turn off those imports, perhaps that could be a big save?

Maybe move the imports inside the functions that use them? That makes those functions take "longer" to run, but makes the import shorter for anybody who just wants to import the package.

bdpedigo commented 7 months ago

https://github.com/PythonOT/POT/issues/516 i wonder to what extent your issue is related to this? what version of POT are you on? it sounds like the root cause is tensorflow, do you have tensorflow installed in this environment?

bdpedigo commented 7 months ago

I guess another question: is there a reason you need to import all of graspologic, if you're saying you don't want some of these functions? It might be much faster to just import the function(s) you need.

daxpryce commented 5 months ago

> > i wonder why the load time is so much shorter for tuna than you, though?
>
> no clue, I noticed that too, how long does it take for you?
>
> > that's interesting! and a cool tool/visualization
> >
> > I'm open to discussing proposed fixes, I just don't really know what could be done here, since those other libraries are out of our control.
> >
> > I can tell you that I don't think we use anything under ot.backend.tensorflow or ot.backend.torch... so if there's some way to turn off those imports, perhaps that could be a big save?
>
> Maybe move the imports inside the functions that use them? That makes those functions take "longer" to run, but makes the import shorter for anybody who just wants to import the package.

does this import stick around? are you paying the cost only the first time? if so, this seems totally reasonable to me, but if you add 33 seconds every time you try to save your graph layout, it's going to be a bit wonky. doesn't mean there won't be other ways to fix it, just that this specific one may not work.
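On whether the import sticks around: Python caches imported modules in sys.modules, so an import statement inside a function does the real work on the first call and is a cache lookup after that. Here is a minimal sketch of the pattern, with assumptions labeled: layout_with_lazy_import is a made-up function name, and the stdlib statistics module stands in for a heavy dependency such as umap or ot. This covers only the import itself; any other first-call cost inside the dependency is a separate matter.

```python
import sys
import time


def layout_with_lazy_import(values):
    """Hypothetical stand-in for a function that needs a heavy dependency."""
    # Deferred import: runs for real on the first call only. Later calls
    # find the module already cached in sys.modules.
    import statistics  # stand-in for a slow dependency such as umap or ot

    return statistics.mean(values)


def timed_call(values):
    """Return the function result together with the elapsed wall time."""
    start = time.perf_counter()
    result = layout_with_lazy_import(values)
    return result, time.perf_counter() - start


first_result, first_elapsed = timed_call([1, 2, 3])
second_result, second_elapsed = timed_call([1, 2, 3])

print(f"first call:  {first_elapsed * 1e3:.3f} ms")
print(f"second call: {second_elapsed * 1e3:.3f} ms")
print("cached in sys.modules:", "statistics" in sys.modules)
```

The trade-off is that the first caller of each such function pays the import cost, while a bare import of the package stays cheap.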