pmgbergen / porepy

Python Simulation Tool for Fractured and Deformable Porous Media
GNU General Public License v3.0

Workflow for running slow tests #856

Closed jhabriel closed 1 year ago

jhabriel commented 1 year ago

Background

Currently, tests are run on every push and at merge time through the Pytest workflow (https://github.com/pmgbergen/porepy/actions/workflows/run-pytest.yml). Naturally, slow tests (e.g., integration tests for three-dimensional models and convergence tests) are not part of this workflow because they take too much time. To be precise, they are skipped, since they are explicitly marked with a decorator of the type @pytest.mark.skip or @pytest.mark.skipif.
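
For reference, a minimal illustration of how a slow test is currently excluded (the test name and reason are hypothetical, not taken from the code base):

```python
import pytest


# Hypothetical example of the current approach: an unconditional skip.
@pytest.mark.skip(reason="Too slow for the regular CI workflow.")
def test_integration_3d_model():
    ...
```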

The need for a workflow that runs all tests

At the moment, there is no workflow that checks whether the skipped tests actually pass. This means that the skipped tests are currently not in use, which is obviously suboptimal from a testing framework standpoint.

Proposed changes

Create a new workflow named Pytest-slow that runs all tests on a weekly basis (specifically, on Sundays at 00:00 GMT+2) and that can also be triggered manually (see https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow and https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#workflow_dispatch). The latter option is important, since new slow tests must pass before merging a PR. It will therefore be the responsibility of the assignee and the reviewers to verify that Pytest-slow completes without errors.
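
A minimal sketch of what such a workflow file could look like (the file name, the setup steps, and the --run-slow flag introduced below are assumptions, not the final implementation). Note that GitHub's cron schedules are interpreted in UTC, so Sunday 00:00 GMT+2 corresponds to Saturday 22:00 UTC:

```yaml
# Hypothetical file: .github/workflows/run-pytest-slow.yml
name: Pytest-slow

on:
  schedule:
    # GitHub cron runs in UTC; Saturday 22:00 UTC == Sunday 00:00 GMT+2.
    - cron: "0 22 * * 6"
  workflow_dispatch:  # allows triggering the workflow manually

jobs:
  slow-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: Install PorePy and test dependencies
        run: pip install -e .  # the actual install steps may differ
      - name: Run the full test suite, including slow tests
        run: pytest --run-slow
```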

Marking tests

We need a system to effectively mark tests as slow (in the future, there might be a need for other types of classification). There is no unique way to achieve this (in fact, there are at least three options, see: https://jwodder.github.io/kbits/posts/pytest-mark-off/). However, it seems that the best option is to create a custom mark (e.g., slow) and use the pytest_collection_modifyitems hook to add an additional pytest.mark.skip. This is done in a conftest.py file.
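
A minimal sketch of such a conftest.py, following the standard pytest pattern for opt-in markers (the flag name --run-slow and the skip reason are suggestions, not settled choices):

```python
# conftest.py
import pytest


def pytest_addoption(parser):
    # Command line flag that enables the slow tests.
    parser.addoption(
        "--run-slow",
        action="store_true",
        default=False,
        help="also run tests marked as slow",
    )


def pytest_configure(config):
    # Register the custom marker so pytest does not warn about it.
    config.addinivalue_line("markers", "slow: mark a test as slow to run")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-slow"):
        # --run-slow given: do not skip anything.
        return
    skip_slow = pytest.mark.skip(reason="needs the --run-slow option to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```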

How would this work in practice?

Mark tests as slow with the decorator @pytest.mark.slow. When one runs pytest, all tests with this decorator will be skipped. If one wants to run all tests, it is simply a matter of adding the --run-slow flag, i.e., pytest --run-slow.
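
For illustration, a hypothetical test marked as slow (the test name is made up):

```python
import pytest


@pytest.mark.slow
def test_convergence_rates_3d_model():
    # Expensive test; skipped unless pytest is invoked with --run-slow.
    ...
```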

Implementation steps

keileg commented 1 year ago

Thanks for the nice writeup, and for trying out task lists inside an issue.

Some thoughts:

It may also be wise to ask Ivar for opinions after the two of us are done discussing, but before implementing something.

jhabriel commented 1 year ago

Thanks for the input. Super valid questions, I will try to answer them below:

Of course. If you are satisfied with the answers, feel free to tag him on this issue to have his input.

jhabriel commented 1 year ago

@keileg, @OmarDuran and @IvarStefansson:

I would like to have your opinion on what constitutes a slow test, i.e., what the criterion should be for marking a test as slow based on the amount of time it takes to run. Please leave your thumbs up on one of the options below.

Also, if you have a concrete suggestion for the bullet point number 4 of https://github.com/pmgbergen/porepy/issues/856#issuecomment-1505369611, please let me know.

Thanks in advance.

IvarStefansson commented 1 year ago

I have two comments: 1) I think run time forms only part of the basis for choosing which tests to skip. Their importance to the test suite is also relevant. The importance criterion includes which part of the code the test covers, the suite's coverage of that part of the code, and possibly more. 2) I suggest a nuance to Eirik's check box: it should not apply to "trivial" PRs (only style, documentation, very basic maintenance, etc.).

jhabriel commented 1 year ago

I agree. Importance should definitely be considered too. That is, if a test qualifies as slow but is crucial for the test suite, I assume we will keep it. However, I see those as outliers rather than the norm.

Now, there might also be non-crucial fast tests that we might want to skip, as you say. My suggestion would therefore be to create a different marker for these types of tests, e.g., non_crucial.

That being said, we still need a criterion to classify a test as slow. And if the non_crucial marker is introduced, also a criterion to classify them as such.

IvarStefansson commented 1 year ago

I would hesitate to introduce too many labels for now. If we can get away with a single flag to indicate whether a test is run always or only on "full runs", that would probably be easier. Introducing more categories later is of course possible if the need arises.

jhabriel commented 1 year ago

I agree that introducing too many categories is a no-go. However, there are a few reasons why you would want to categorize skipped tests:

  1. I believe it is important to know why a test is being skipped, not only that it is being skipped.
  2. Categorizing skipped tests unlocks the possibility to only run those belonging to that category, without having to run the full test suite.
  3. For people outside of the core developer group, it will be more transparent to know when one can or has to skip a test.

To your point, I think there should be strict rules for categorizing tests.

jhabriel commented 1 year ago

Solved via #863.