smarie / python-pytest-harvest

Store data created during your `pytest` tests execution, and retrieve it at the end of the session, e.g. for applicative benchmarking purposes.
https://smarie.github.io/python-pytest-harvest/
BSD 3-Clause "New" or "Revised" License
61 stars 8 forks source link

pytest-harvest does not work with python-xdist #32

Closed larsks closed 4 years ago

larsks commented 4 years ago

pytest-harvest is unable to work in an environment with parallel tests. Any tests that make use of the harvest results fixtures (module_results_df, session_results_df, etc) are simply scheduled in parallel with other tests, and will in general complete before the other tests have finished running.

Consider the following example:

import pytest
import time

@pytest.mark.parametrize('instance', range(5))
def test_example(instance):
    time.sleep(5)
    assert True

def test_results(session_results_df):
    with open('results.csv', 'w') as fd:
        fd.write(session_results_df.to_csv())

If we run this in an environment with pytest-xdist like this:

pytest -v -n8 test_harvest.py

We see that all the tests get scheduled at the same time:

test_harvest.py::test_example[1] 
test_harvest.py::test_example[3] 
test_harvest.py::test_example[0] 
test_harvest.py::test_example[2] 
test_harvest.py::test_example[4] 
test_harvest.py::test_results 
[gw5] [ 16%] PASSED test_harvest.py::test_results 
[gw0] [ 33%] PASSED test_harvest.py::test_example[0] 
[gw3] [ 50%] PASSED test_harvest.py::test_example[3] 
[gw4] [ 66%] PASSED test_harvest.py::test_example[4] 
[gw2] [ 83%] PASSED test_harvest.py::test_example[2] 
[gw1] [100%] PASSED test_harvest.py::test_example[1] 

And at the end, results.csv contains:

test_id
smarie commented 4 years ago

Thanks for the detailed report ! I am very happy to finally see this happening (I can now close #1 :) )

I guess that I'll need some help from @nicoddemus to understand how to proceed. There seem to be two topics here:

I have no experience at all with x-dist, that's why I would happily review proposals or ideas.

larsks commented 4 years ago

While I am no pytest expert, it sounds like for what you're doing with harvest that it should be a reporting plugin (like pytest-html or pytest-json-report). The report plugins run after all the tests are complete.

nicoddemus commented 4 years ago

Hi,

I did not look at the implementation of the results containing fixtures, but to answer your questions:

ensuring that some tests are run after all the other are completed (the test_results in your example). I am sure that there is something in pytest-xdist to do this properly.

Unfortunately there's no way to do this properly, but we have WIP in that area (https://github.com/pytest-dev/pytest-xdist/pull/500).

finding a way for the results-containing fixtures to be shared across parallel nodes. Maybe this is already automatic with x-dist, but I might be dreaming :)

If by "shared" you mean memory-shared, unfortunately this is not possible with pytest-xdist (and won't be anytime soon). You can cook something up using some kind of shared file though, see an example here: https://github.com/pytest-dev/pytest-xdist#making-session-scoped-fixtures-execute-only-once

Hope this helps!

smarie commented 4 years ago

@larsks : indeed, pytest-harvest uses the pytest-report hook to collect the basic information, like all reporting plugins. Only the additional info is stored in a session-shared fixture ("store" / "result_bag")

Actually pytest-harvest does not have any prerequisite about where to put the code to read the final synthesis. It can be in the last test, in a session-scoped fixture teardown, in the session finish hook (like JSON report plugin: see here)... I show several alternatives in this doc advanced usage section. The only thing needed is access to the session object (also available through request.session). So you can do the following in your conftest.py:

from pytest_harvest import get_session_synthesis_dct

def pytest_sessionfinish(session):
    session_results_dct = get_session_synthesis_dct(session)
    ...

However I admit that I do not provide handy higher-level methods to reach the ease of use of what is available in the fixtures. I will do this now, it should be straightforward.

Notes:


Thanks @nicoddemus for the quick answer ! Well concerning the order, if x-dist respects the pytest_sessionfinish hook then that should be a sufficient first step - see above. Of course we'll follow closely https://github.com/pytest-dev/pytest-xdist/pull/500 because that would be even better! Thanks for the link on sharing data. Not sure we'll need this for the basic usage: we just need all report items to be available at session finish. However as soon as people will wish to use the results_bag fixture to store additional info, I'll need the store fixture to be unique (or to retrieve all store fixture instances at session finish so as to concatenate them). I'll dig into this.

nicoddemus commented 4 years ago

Well concerning the order, if x-dist respects the pytest_sessionfinish hook then that should be a sufficient first step

It does, but keep in mind pytest_sessionfinish will be called multiple times: once for the master process and once for each worker. You can verify which is which by checking the XDIST_WORKER_ID environment variable. Also pytest_sessionfinish is guarateed to be called last for master.

smarie commented 4 years ago

Thanks. Unfortunately that does not work for me: os.environ.get('XDIST_WORKER_ID', None) always return None. However this works fine:

try:
    pytest_worker_id = session.config.slaveinput['slaveid']
except AttributeError:
    pytest_worker_id = 'master'

Now the only remaining issue I see is accessing to the session-scoped fixture_store. I'll dig into this.

nicoddemus commented 4 years ago

Oh right my bad, that env var is only set in the workers.

smarie commented 4 years ago

Last question for you maybe: is it always the case that 'master' does not run any test items on its own when using pytest-xdist ? That could ease the example code I wish to include in the docs

nicoddemus commented 4 years ago

With xdist enabled (passing -n > 1), all tests run only on workers, the master process doesn't even collect them.

smarie commented 4 years ago

Is there an official way to know if xdist is enabled ? That could also help clarifying the script. Indeed session.config.slaveinput['slaveid'] raises both an error when the node is the xdist master node AND when xdist is not enabled at all. I would like to make a difference between those two cases

smarie commented 4 years ago

ticket is closed but in case you have an answer to my last question @nicoddemus do not hesitate :) I also opened a SO question for this: https://stackoverflow.com/questions/60212389/how-can-i-know-from-inside-a-pytest-hook-if-pytest-xdist-is-enabled?noredirect=1#comment106504641_60212389

smarie commented 4 years ago

By the way this is fixed in 1.8.0. @larsks let me know if this conftest.py example works for you.

nicoddemus commented 4 years ago

Is there an official way to know if xdist is enabled ?

hasattr(session.config, "workerinput") is what is used in general, but this does not distinguish between "xdist-disabled" and "xdist-enabled-but-on-master" though.and I agree this is not optimal unfortunately.

larsks commented 4 years ago

@smarie it's not pretty but it seems to work, thanks.

smarie commented 4 years ago

Thanks @nicoddemus . Ok I'll leave the example as it is now then.

@larsks thanks for the feedback ! If you have any idea on how to change the user experience given the constraint imposed by pytest-xdist (no possibility to have a particular test run last, so the synthesis collection should run in a hook), let me know !

smarie commented 4 years ago

@larsks I opened a dedicated issue concerning a potential simplification of the process : #36