thomasaarholt opened this issue 4 years ago
Considering that importing something only does the work once (modules are basically singletons), you might want to use pedantic mode, see: https://pytest-benchmark.readthedocs.io/en/stable/pedantic.html

Also, you might want to have an assertion like this before the benchmark:

```python
assert 'numpy' not in sys.modules
```
That will ensure that the module was not previously imported and the benchmark result will be useful.
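A minimal sketch of that combination (using numpy as a stand-in for the package under test):

```python
import sys

def test_import_time(benchmark):
    # fail fast if something in the test session already imported numpy
    assert 'numpy' not in sys.modules

    def target():
        import numpy

    # one round, one iteration: a second call would just hit the
    # sys.modules cache and measure nothing useful
    benchmark.pedantic(target, iterations=1, rounds=1)
```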
Thanks! That's working very nicely! Checking whether the module is present in `sys.modules` was a good trick. Unfortunately, I'm running into the problem that, for our given module, pytest has already imported it.
I did a bit of reading, and according to the pytest docs on import mechanisms, one of the things pytest does before running our test suite is to add the module to be tested to sys.path and then import it.
The equivalent for the numpy example above is the simplified directory structure:

```
numpy
numpy/tests
numpy/tests/test_benchmark.py
```
If `test_benchmark.py` contains the following, then numpy is already imported and the test fails:
```python
import sys

def something():
    import numpy

def test_my_stuff(benchmark):
    assert 'numpy' not in sys.modules
    benchmark.pedantic(something)
```
If I stick `test_benchmark.py` in a folder that is not a child of numpy, it runs fine.
Do you happen to have any suggestions? I'm not sure if there's a way to run a particular set of tests (i.e. `test_benchmark_imports.py`) outside of the main tests (without importing the parent module).
So you're not actually testing import time of numpy then, yes? I guess you need a "tests outside" project layout, and then tell pytest to only collect your test directory (instead of your package). See https://blog.ionelmc.ro/2014/05/25/python-packaging/
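For reference, the kind of layout that post describes looks roughly like this (names are illustrative):

```
myproject/
├── src/
│   └── hyperspy/              # the package itself
├── tests/
│   └── test_import_speed.py   # benchmarks live outside the package
└── setup.py
```

Running `pytest tests/` from the project root then collects only the test directory, so collection itself never imports the package.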
(wow, that was quick)
That's right, I'm testing hyperspy. When most users import hyperspy (`import hyperspy.api`), it imports a large number of packages (numpy, scipy, scikit(s), tifffile and many more), which means it can take between 5 and 30 seconds to import. We're looking into speeding it up, which is why I'm looking at benchmarking. It would be good practice to benchmark a lot of our functionality, but I realised it would be great to check import speed as well.
Hyperspy happens to rely on numpy, which is why I used it earlier. Since pytest imports hyperspy and hyperspy imports numpy, numpy is already in `sys.modules`. I had a look at using `importlib.reload`, but that only reloads hyperspy specifically, not the packages it relies on (like numpy), and we're interested in making sure we don't add more such imports.
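To illustrate that limitation (a sketch, not from the thread):

```python
import importlib
import sys

import hyperspy
importlib.reload(hyperspy)     # re-executes hyperspy/__init__.py only
assert 'numpy' in sys.modules  # dependencies stay cached from the first import
```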
I think I understood what I need to do from the link you suggested! I'll try placing the benchmark tests in a directory outside the package.
> So you're not actually testing import time of numpy then, yes?
Just realised the alternate interpretation of this. Yes, it wasn't actually testing the import time; it was basically skipping the import because the module was already in `sys.modules`.
It works! Thanks!
Actually, it only sort of works.
With `benchmark.pedantic(import_func, iterations=1, rounds=1)` it works, but if I increase the iterations or rounds, I still have the problem that the module (i.e. numpy) already exists in `sys.modules`.
This is my test file:
```python
import sys

def import_numpy():
    assert 'numpy' not in sys.modules
    import numpy

def test1(benchmark):
    benchmark.pedantic(import_numpy, iterations=1, rounds=1)
```
I can get around this by being fine with `rounds=iterations=1` and testing other imports from hyperspy in different files. I'd like to test three different imports:
```python
import hyperspy.api
import hyperspy.api_nogui
from hyperspy.signals import Signal1D
```
Unless you have any suggestions, I'll go ahead with a single iteration/round and keep the three imports in separate files.
Dang, I just realised that using multiple files does not work either. If I create several numbered `test_import_numpy1.py` (2, 3, etc.) benchmarks, only the first one is correct. Since numpy is already imported by then, the subsequent benchmarks run too quickly.
The reason this isn't great for our case is that I would ideally like to benchmark three (or more) imports from the hyperspy package without any of the previous imports affecting the next one: essentially "resetting" `sys.modules` each time.
Sorry for clogging this issue with so much text. I've spent this evening getting a feel for how pytest-benchmark works, and (despite this issue) I am really enjoying learning about it!
You could remove the modules from `sys.modules` before running the benchmark. Though that will only clear the module references, not the filesystem caches or whatever lower-level mechanisms might be in play for the imports.
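Something along those lines, with a hypothetical purge helper (subject to the caveat above about lower-level caches):

```python
import sys

def purge(prefix):
    # drop the package and all of its submodules so the next import redoes the work
    for name in [m for m in sys.modules if m == prefix or m.startswith(prefix + '.')]:
        del sys.modules[name]

def test_import_numpy(benchmark):
    def target():
        import numpy

    # setup runs before each round, so every measurement starts cold
    benchmark.pedantic(target, setup=lambda: purge('numpy'), rounds=5)
```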
In airspeed velocity, on Python 3.7+, I was using the following code:
```python
import subprocess
import sys

def time_import():
    cmd = [sys.executable, "-X", "importtime", "-c", "import pint"]
    p = subprocess.run(cmd, stderr=subprocess.PIPE)
    # the last stderr line is the top-level import; its second-to-last
    # field is the cumulative time in microseconds
    line = p.stderr.splitlines()[-1]
    field = line.split(b"|")[-2].strip()
    total = int(field)  # microseconds
    return total
```
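For context, `-X importtime` writes a table like the following to stderr (numbers made up); the snippet takes the cumulative column of the last line, i.e. the top-level import:

```
import time: self [us] | cumulative | imported package
import time:       278 |        278 | zipimport
...
import time:      1005 |     123456 | pint
```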
According to some docs, on py37+ the `-X importtime` usage gives us a more precise measurement of the import time we actually care about, without the subprocess or interpreter overhead.
The problem with migrating this is that, AFAIK, there is no way to feed pytest-benchmark a time (in seconds) that was measured by other means.
It would be great to have a convenient way to time running some operation in a subprocess. There are other reasons for this than measuring import time, such as timing operations that make use of global in-memory caches (e.g. `lru_cache`). In my usage I really need something that can run the measured code in a fresh process. Even better would be to time a sequence of operations in each process, so the subprocess is used to benchmark several interdependent steps one after another.
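Lacking that, a workaround outside pytest-benchmark (so it loses the statistics and reporting machinery) is a hypothetical helper that times each statement in a fresh interpreter:

```python
import subprocess
import sys

def time_in_subprocess(stmt, repeat=5):
    # run `stmt` in a clean interpreter so neither sys.modules nor
    # in-process caches from earlier runs can affect the timing
    code = f"import time; t0 = time.perf_counter(); {stmt}; print(time.perf_counter() - t0)"
    runs = [
        subprocess.run([sys.executable, "-c", code],
                       capture_output=True, text=True, check=True)
        for _ in range(repeat)
    ]
    return min(float(r.stdout) for r in runs)

# e.g. time_in_subprocess("import hyperspy.api")
```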
Hello!
I'm interested in using pytest-benchmark to record the import speed of a Python package (hyperspy) that I'm contributing to. It has a data-processing API that takes several seconds to import.
To test, I tried benchmarking numpy with pytest-benchmark in the following way, but it reports function times in the nanosecond range, while manual timing reports millisecond times.
Am I doing something wrong, or is benchmarking imports not (yet?) supported?
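The exact snippet isn't preserved above, but a naive benchmark of an import, which produces exactly this symptom, would look something like:

```python
def test_import_numpy(benchmark):
    def do_import():
        import numpy

    # after the first call numpy sits in sys.modules, so every later
    # call is just a cache lookup: hence the nanosecond timings
    benchmark(do_import)
```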