Allow hooks to retry a single test case multiple times with fresh fixtures

bcmills commented 1 week ago

What's the problem this feature will solve?

Tests of APIs that rely on timer / timeout behaviors currently have to choose one (or both!) of {slow, flaky}:

If the test uses a short duration for the timeout, then sometimes — due to scheduling delays on the host OS, for example — something that needs to happen before the timeout fires doesn't happen, and the test has a flaky failure.
If the test uses a long duration for the timeout, then it ends up needing to sleep for some multiple of that long duration, and the test runs reliably but is extremely slow — say, 10s for a test function that could normally complete in <10ms.

I would like not to have to choose between those two: I want the test to run quickly, but to be retried automatically if the timeout turns out to be too short.

Describe the solution you'd like

Ideally, I would like to implement a pytest fixture that takes on the current timeout value. Each other test fixture that depends on it can then configure its own objects based on that timeout, and the test is run with those fixtures. If it passes, the test passes overall and is done. If it fails, the fixtures are torn down, a new (longer) timeout is selected, and a new set of fixtures is created with the new timeout value.

This process should be iterated until either the test passes, or the selected timeout exceeds a configured maximum.

In particular:

Warnings and errors for runs on short timeouts should not be logged.
Different timeout values should not be considered as separate test case parameters, since they fundamentally represent only one underlying test case: “run with an appropriate dynamic timeout”.
Test fixtures must be recreated with each new timeout value, since a fixture may create an object that uses the timeout internally. (For example: a connection timeout on a networking library; a batch-delay timeout on an asynchronous-batching mechanism; a sleep timeout on a polling-based API.)
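
The control flow described above can be sketched in plain Python, outside pytest. This is only an illustration of the requested semantics, not an existing pytest API: the name `run_time_sensitive` and its parameters are hypothetical, and `attempt` is assumed to build and tear down everything that depends on the timeout itself.

```python
from typing import Callable


def run_time_sensitive(
    attempt: Callable[[float], None],
    initial: float = 0.01,
    maximum: float = 10.0,
    factor: float = 10.0,
) -> float:
    """Call attempt(timeout) with growing timeouts until it passes.

    Returns the timeout that passed. An AssertionError from any attempt
    other than the last one is discarded rather than reported.
    """
    timeout = initial
    while True:
        try:
            attempt(timeout)
            return timeout
        except AssertionError:
            next_timeout = timeout * factor
            if next_timeout > maximum:
                # No longer timeout is allowed, so this failure is the real one.
                raise
            timeout = next_timeout
```

All attempts count as one test case here: only the final outcome escapes the loop, which is the behavior asked for in the list above.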

Examples of this pattern (in Go rather than Python) can be found in the Go project's net package: https://github.com/search?q=repo%3Agolang%2Fgo+%2FrunTimeSensitiveTest%5C%28%2F&type=code

Unfortunately, I don't see a way to run a pytest test a variable number of times with fresh fixtures:

pytest-retry relies on pytest implementation details to reset between runs in a makereport hook.
pytest-repeat treats each attempt number as its own separate parameter value.

Alternative Solutions

One alternative is to move all objects that depend on the configured timeout outside of pytest fixtures and into the test function itself. That works, but it severely diminishes the value of pytest fixtures for the affected test.
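
A rough sketch of what that first alternative looks like, using `contextlib.ExitStack` from the standard library in place of fixtures. The `connection` and `batcher` objects are hypothetical stand-ins for anything that bakes the timeout in at creation time, and the `events` list exists only to make the setup and teardown order visible.

```python
import contextlib
from typing import Iterator, List

events: List[str] = []  # records setup/teardown order, for illustration only


@contextlib.contextmanager
def connection(timeout: float) -> Iterator[dict]:
    # Hypothetical object that fixes its timeout when it is created.
    events.append(f"open connection timeout={timeout}")
    try:
        yield {"timeout": timeout}
    finally:
        events.append(f"close connection timeout={timeout}")


@contextlib.contextmanager
def batcher(conn: dict) -> Iterator[dict]:
    # Hypothetical dependent object: derives its delay from the connection.
    events.append("start batcher")
    try:
        yield {"conn": conn, "delay": conn["timeout"] / 2}
    finally:
        events.append("stop batcher")


def check_batching(timeout: float) -> None:
    """One attempt: build everything, run the assertions, tear it all down."""
    with contextlib.ExitStack() as stack:
        conn = stack.enter_context(connection(timeout))
        b = stack.enter_context(batcher(conn))
        assert b["delay"] < conn["timeout"]
```

Each call to `check_batching` gets fresh objects and tears them down in reverse order, so a retry loop can simply call it again with a longer timeout. The cost is the one noted above: the dependency wiring that fixtures would normally provide has to be repeated by hand inside the test.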

Another alternative is to design all objects in the hierarchy so that their timeouts can be reconfigured on the fly, and use a single set of fixtures for all attempts. Unfortunately, if I use any third-party libraries, that may force me to rely on implementation details to monkey-patch the timeout configuration, and even that isn't always possible.

The-Compiler commented 1 week ago

> If the test uses a long duration for the timeout, then it ends up needing to sleep for some multiple of that long duration, and the test runs reliably but is extremely slow — say, 10s for a test function that could normally complete in <10ms.

You lost me there. Why does it need to sleep? That's not how timeouts usually work, no? I don't see the difference between running a test once with a 10s timeout, vs. running it with a 1s + 2s + 3s + 4s timeout. For at least your "a connection timeout on a networking library" example, the test will finish as soon as the server answers, and I'd argue that for many other cases the first thing to attempt is to make it work that way as well (e.g. with a polling-based API, you might still want to poll every 0.1s or something, but time out after, say, 50 attempts).
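
A minimal sketch of the bounded polling suggested here, using only the standard library. The helper name `poll_until` is hypothetical; the defaults mirror the 0.1s interval and 50 attempts mentioned above.

```python
import time
from typing import Callable


def poll_until(
    predicate: Callable[[], bool],
    interval: float = 0.1,
    max_attempts: int = 50,
) -> bool:
    """Return True as soon as predicate() is true, False after max_attempts."""
    for attempt in range(max_attempts):
        if predicate():
            return True
        if attempt < max_attempts - 1:
            # Sleep only between checks, not after the final one.
            time.sleep(interval)
    return False
```

On the success path this returns as soon as the condition holds. On the failure path it still takes roughly `interval * (max_attempts - 1)` seconds before giving up.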

FWIW, there's pytest-rerunfailures that recreates fixtures, and seems to have a way to access the .execution_count on the test item.

There are various open issues about exposing an API for fixtures (#12630, #12376, ...), and what you describe in particular sounds a lot like a duplicate of #12596 to me.

bcmills commented 1 week ago

> You lost me there. Why does it need to sleep? That's not how timeouts usually work, no?

This is for testing the cases where a call internal to the test intentionally does time out, not the case where the test itself exceeds its intended running time.

> For at least your "a connection timeout on a networking library" example, the test will finish as soon as the server answers

No, you have it backwards. This is for the cases where we want the server not to answer in time.

Failure modes also need to be tested!
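
To illustrate why such a test cannot finish early, here is a small standard-library sketch in which the peer never answers. The function name is hypothetical, and `socket.socketpair()` stands in for a real client/server connection.

```python
import socket
import time


def time_until_read_timeout(timeout: float) -> float:
    """Read from a peer that never replies; return how long recv() blocked."""
    client, silent_peer = socket.socketpair()
    with client, silent_peer:
        client.settimeout(timeout)
        start = time.monotonic()
        try:
            client.recv(1)  # silent_peer never sends, so this must time out
        except socket.timeout:  # alias of TimeoutError since Python 3.10
            return time.monotonic() - start
    raise AssertionError("expected recv() to time out")
```

The call blocks for about the full timeout before the expected error is raised, so a test of this path takes at least as long as the configured timeout. A 10s timeout makes the test take 10s, while a very short timeout leaves little margin for any step that has to complete before it fires.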

bcmills commented 1 week ago

> FWIW, there's pytest-rerunfailures that recreates fixtures, and seems to have a way to access the .execution_count on the test item.

Looks like that one also relies on undocumented implementation details: https://github.com/pytest-dev/pytest-rerunfailures/blob/a53b9344c0d7a491a3cc53d91c7319696651d21b/src/pytest_rerunfailures.py#L499

bcmills commented 1 week ago

> what you describe in particular sounds a lot like a duplicate of #12596 to me.

Yep, that does seem similar! The key difference there, I think, is that they want to run the test until it fails, whereas I want to run it until it succeeds and discard the failure logs — but those parts might already be possible if the fixture-reset problem is addressed.

pytest-dev / pytest