Open nsheff opened 3 months ago
You can view the output with a cool tool called tuna. I ended up running the following to profile the import time:
python -X importtime -c "from looper.__main__ import main; main()" 2> looper.log
I just did this over at geniml, and remembered this issue, so I figured while I was on a roll... Also, I was struggling with this when running looper recently. Here is the tuna output. It seems like pandas (in peppy) is a big issue.
Here is the log output, looper.log, if someone wants to download it and run tuna themselves.
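For anyone who would rather skim the log without installing tuna, here is a minimal sketch that ranks entries by cumulative time. It assumes the standard `-X importtime` stderr format (`import time: self [us] | cumulative | imported package`); the function names `parse_importtime` and `slowest` are made up for illustration.

```python
def parse_importtime(lines):
    """Parse `python -X importtime` output into (name, self_us, cumulative_us) rows."""
    rows = []
    for line in lines:
        if not line.startswith("import time:"):
            continue  # ignore any other stderr output
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        self_us, cumulative_us, name = (p.strip() for p in parts)
        if not self_us.isdigit():
            continue  # skip the column-header line
        rows.append((name, int(self_us), int(cumulative_us)))
    return rows


def slowest(rows, n=10):
    """Return the n rows with the largest cumulative import time."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py looper.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for name, self_us, cumulative_us in slowest(parse_importtime(handle)):
                print(f"{cumulative_us / 1000:10.1f} ms  {name}")
```

Cumulative time includes everything a module pulls in, so a package like peppy will rank high partly because of what it imports.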
I cannot reproduce those slow import times. I get ~0.4-0.56 seconds during import. I tested a fresh venv as well.
I've begun some work towards replacing pandas with polars and doing performance testing.
peppy_branch: https://github.com/pepkit/peppy/tree/dev_replace_pandas_with_polars
Importing Pandas, Peppy, and Looper 50 times each, we see the following mean and std of import time, in milliseconds:
Using Pandas n=50
──────────────────────────────────── Pandas ────────────────────────────────────
mean 188.684421
std 3.665686
──────────────────────────────────── Peppy ─────────────────────────────────────
mean 244.675341
std 20.345653
──────────────────────────────────── Looper ────────────────────────────────────
mean 470.185256
std 27.921771
Replacing pandas with polars in Peppy: n=50
──────────────────────────────────── Polars ────────────────────────────────────
mean 51.336722
std 11.519378
──────────────────────────────────── Peppy ─────────────────────────────────────
mean 185.085058
std 42.192459
Note, I did not test Looper with the polars replacement yet because I realized that Looper will pull in pandas from Peppy, Ubiquerg, and Pipestat so it was becoming difficult to pull out pandas completely.
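For anyone who wants to repeat this kind of measurement, here is one possible way to do it. This is only a sketch and not necessarily how the numbers above were produced: it times `import <module>` in a fresh interpreter each run, so every sample also includes interpreter startup. The helper names `time_import` and `mean_and_std` are hypothetical.

```python
import statistics
import subprocess
import sys
import time


def time_import(module, runs=50):
    """Time `import module` in a fresh interpreter; return durations in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # A new process per run avoids hitting the sys.modules cache.
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        samples.append((time.perf_counter() - start) * 1000)
    return samples


def mean_and_std(samples):
    """Return (mean, sample standard deviation); needs at least two samples."""
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Pass module names on the command line, e.g.: pandas peppy looper
    for module in sys.argv[1:]:
        mean, std = mean_and_std(time_import(module))
        print(f"{module}: mean {mean:.3f} ms, std {std:.3f} ms")
```

To subtract interpreter startup, time a module that is already loaded at startup (for example `sys`) and use it as a baseline.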
I'm unsatisfied with how long it takes the looper CLI to run. I guess it's because looper imports a bunch of heavy stuff, like pandas, peppy, sqlalchemy (via pephubclient), etc. A lot of these aren't necessary.
I suggest we see if it's possible to import some of the heaviest things only as needed, instead of at the top of the file as is typically done.
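To make the idea concrete, here is a rough sketch of a deferred import. It is not looper's actual code: `summarize` and `main` are hypothetical names, and the stdlib `statistics` module stands in for a heavy dependency like pandas so the example runs anywhere.

```python
def summarize(values):
    """Hypothetical subcommand that needs a heavy dependency."""
    # Deferred import: the cost is paid on the first call to this function,
    # not when the CLI module itself is imported.
    import statistics  # stand-in for a heavy package such as pandas

    return statistics.mean(values)


def main(argv):
    """Hypothetical CLI entry point."""
    if argv and argv[0] == "summarize":
        return summarize([float(x) for x in argv[1:]])
    # Fast path: commands like --help or --version never trigger the import.
    return None
```

After the first call, Python caches the module in `sys.modules`, so repeated calls only pay for a dictionary lookup. The tradeoff is that an import error surfaces when the command runs rather than at startup.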
You can profile import time like this: