pytest-dev / pytest

The pytest framework makes it easy to write small tests, yet scales to support complex functional testing
https://pytest.org
MIT License

pytest collection appears to stall/slow down/jam up when some third-party libraries are used; add function to ignore specific modules #12722

Open geofire opened 3 weeks ago

geofire commented 3 weeks ago
pytest 8.3.2
Python 3.11
Windows 10

Hi all!

I was tossing up whether this should be a bug report or a feature request, as I couldn't work out from the documentation whether the following is expected behaviour. It took a solid day of troubleshooting to figure this one out. I have found a workaround, so it's not a showstopper, but it was difficult and very confusing to troubleshoot.

What's the problem?

pytest will, perhaps by design, scan through and 'collect' (some? all?) third-party libraries used by a function x when x itself is imported into a test to run.

This gives the impression that pytest:

Current behaviour (as of pytest 8.3.2)

Note: I've used the awesome pytest-richtrace library (in --verbose mode) to help me figure this issue out, as there didn't appear to be a similar function in pytest to make pytest's collection activities verbose.

The library I can reliably reproduce this issue with is arcgis. https://pypi.org/project/arcgis/

arcgis is Esri's ArcGIS API for Python, allowing Python code to interact with Esri's Enterprise and Online geospatial systems without needing to write a mess of boilerplate REST API code. arcgis is a package I don't maintain, and don't need to test directly.

The same behaviour is present in both PyCharm and by manually invoking pytest with python -m via command line.

Example

# app.py
import os
from arcgis import GIS  # Connection to a server Portal instance

def connect(domain, context):
    portal = GIS(url=connection_url(domain, context), username=os.getenv('USERNAME'), password=os.getenv('PASSWORD'))
    return portal

def connection_url(domain, context):
    if domain == 'arcgis.com':
        return None  # API defaults to arcgis.com if None used as parameter
    else:
        return f"https://{domain}/{context}"

# common.py
def create_wigwam():
    # Do stuff here.
    return True

Tests

# tests/test_app.py
from app import connection_url

def test_connection_url():
    pass

# tests/test_common.py
# another random test unrelated to app.py in the same directory
from common import create_wigwam

def test_create_wigwam():
    assert True

Note that:

I have also explicitly excluded venv and site-packages as directories in pytest.ini.

The code above, as it is written right now, will see pytest traverse the arcgis package within the virtual environment that runs this code. According to pytest-richtrace, pytest doesn't appear to collect anything in that package or the venv directory. pytest seems to ignore the os package.

Using pytest-richtrace I saw the following behaviour:

hook: pytest_collection
    session: <Session  exitstatus=<ExitCode.OK: 0> testsfailed=0 testscollected=0>
...
hook: pytest_collectstart          tests/test_app.py
INFO:numexpr.utils:Note: NumExpr detected 20 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.   # This is calling into the virtual environment.
INFO:numexpr.utils:NumExpr defaulting to 8 threads. 
    # Stall here after the above line is printed to console for a period of time, in my case at least 20 seconds, no other feedback is given.
hook: pytest_itemcollected         tests/test_app.py::test_connection_url
hook: pytest_collection_modifyitems
hook: pytest_collection_finish

Simply commenting out the import line in test_app.py --

from app import connection_url

-- prevents pytest traversing into arcgis to collect. All other tests complete pretty much instantaneously.

I haven't checked whether arcgis has any tests, though the collection process certainly doesn't pick any up.

Describe the solution you'd like

Workaround solution

Using MagicMock in unittest.mock allows pytest to traverse into app.py without also traversing into arcgis, stopping the stalling issue without having to comment out code or refactor unnecessarily:

# test_app.py
import sys
from unittest.mock import MagicMock
sys.modules['arcgis'] = MagicMock()  # Must run before `from app import ...`

# No other mocking code is needed, as this MagicMock completely substitutes arcgis when under test.
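Assigning to sys.modules directly leaves the stub in place for the rest of the test session. A scoped variant, sketched below under assumptions, uses unittest.mock.patch.dict so the stub is removed again afterwards. The module name heavy_third_party_lib is hypothetical and only stands in for arcgis; the attribute name GIS is borrowed from the example above.

```python
import importlib
import sys
from unittest.mock import MagicMock, patch

# Hypothetical module name, standing in for a slow third-party package.
STUB_NAME = "heavy_third_party_lib"


def import_with_stub():
    """Import STUB_NAME while a MagicMock is registered under that name."""
    with patch.dict(sys.modules, {STUB_NAME: MagicMock()}):
        # import_module finds the entry in sys.modules and returns it,
        # so the real package (if any) is never loaded.
        module = importlib.import_module(STUB_NAME)
        return module.GIS  # attribute access on a MagicMock yields another MagicMock


gis = import_with_stub()
print(type(gis).__name__)
```

Once the with block exits, patch.dict restores sys.modules to its earlier contents, so later tests that need the real package are unaffected.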

Many thanks!

RonnyPfannschmidt commented 3 weeks ago

Based on the provided information, it seems that importing the library is unreasonably expensive.

Please check whether importing lazily removes the stall.
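One way to check the import cost independently of pytest is to time the import directly. The sketch below is illustrative only: time_import is a hypothetical helper, and json is used as a stand-in so the snippet runs anywhere; substituting "arcgis" would measure the library in question. CPython's `-X importtime` option (for example `python -X importtime -c "import arcgis"`) gives a per-module breakdown of the same cost.

```python
import importlib
import time


def time_import(module_name):
    """Return the seconds spent importing module_name.

    The result is close to zero if the module is already cached
    in sys.modules, so run this in a fresh interpreter.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# "json" is only a stand-in; replace it with the package under suspicion.
elapsed = time_import("json")
print(f"import took {elapsed:.3f}s")
```

If the measured time roughly matches the stall seen during collection, the delay comes from the import itself rather than from pytest's collection logic.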

geofire commented 3 weeks ago

Hi @RonnyPfannschmidt,

Lazy loading certainly appears to bypass the stall (with mocking code commented out in the test):

# app.py
import os
# from arcgis import GIS  # Moved from here to connect()

def connect(domain, context):
    from arcgis import GIS  # Lazily load GIS from arcgis
    portal = GIS(url=connection_url(domain, context), username=os.getenv('USERNAME'), password=os.getenv('PASSWORD'))
    return portal

def connection_url(domain, context):
    if domain == 'arcgis.com':
        return None  # API defaults to arcgis.com if None used as parameter
    else:
        return f"https://{domain}/{context}"

pytest-richtrace doesn't show pytest traversing into arcgis.
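An alternative to moving the import into the function body is the standard library's importlib.util.LazyLoader, which defers executing a module until one of its attributes is first accessed. The sketch below follows the recipe in the importlib documentation. The helper name lazy_import is an assumption, colorsys is only a lightweight stand-in, and whether arcgis tolerates being loaded this way has not been tested here.

```python
import importlib.util
import sys


def lazy_import(name):
    """Register a module whose code runs only on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # With LazyLoader this only sets up deferral
    return module


# colorsys is a small stdlib module used purely for illustration.
lazy_colorsys = lazy_import("colorsys")

# The module body executes here, on the first attribute lookup.
hsv = lazy_colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
print(hsv)
```

The function-level import shown above is simpler and is usually enough; the loader-based approach mainly helps when the import has to stay at module level.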