str0zzapreti / pytest-retry

A simple plugin for retrying flaky tests in CI environments
MIT License

Request for retrying flaky fixtures #33

Open adamtheturtle opened 8 months ago

adamtheturtle commented 8 months ago

Thank you for this project.

I see in the documentation:

Currently, failing test fixtures are not retried. In the future, flaky test setup may be retried, although given the undesirability of flaky tests in general, flaky setup should be avoided at all costs.

I'd like to explain my use case, in case it changes your opinion about whether this project should support retrying test fixtures.

I am building a verified fake of Vuforia Web Services (VWS). My test suite runs partially against VWS, a third-party service, and I sometimes hit rate limit errors. In my tests, I use pytest-retry to retry when I hit those rate limit errors.

In my test setup fixture, I empty the VWS target database so that I have a clean slate for the test. In this emptying step (list targets, delete each one) I make requests to the VWS API. These requests sometimes hit rate limit errors, and I have custom retry logic to retry on them. I would like to use pytest-retry to retry on those errors in the fixture, too.
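
Roughly sketched, the setup looks something like this (the client methods and fake client here are simplified placeholders rather than the real library's API, and the flaky-mark arguments are as I understand them from the pytest-retry README):

```python
import time

import pytest


class TooManyRequests(Exception):
    """Placeholder for the rate-limit error raised by the real VWS client."""


class _FakeVWSClient:
    """Stand-in client so this sketch is self-contained; the real one talks to VWS."""

    def list_targets(self):
        return []

    def delete_target(self, target_id):
        pass

    def add_target(self, name, image):
        pass


@pytest.fixture
def vws_client():
    return _FakeVWSClient()


def _empty_target_database(client, retries=5, delay=30.0):
    """Delete every target, retrying the whole pass on rate-limit errors."""
    for attempt in range(retries):
        try:
            for target_id in client.list_targets():
                client.delete_target(target_id)
            return
        except TooManyRequests:
            if attempt == retries - 1:
                raise
            time.sleep(delay)


@pytest.fixture
def clean_vws_database(vws_client):
    # Fixture setup is not retried by pytest-retry today, hence the
    # hand-rolled retry loop above.
    _empty_target_database(vws_client)


# Test bodies, by contrast, can lean on pytest-retry's flaky mark.
@pytest.mark.flaky(retries=3, delay=30, only_on=[TooManyRequests])
def test_add_target(clean_vws_database, vws_client):
    vws_client.add_target(name="example", image=b"")
```

The hand-rolled loop in the fixture is exactly the duplication I would like pytest-retry to absorb.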

str0zzapreti commented 8 months ago

Thanks for the feedback! I definitely feel your pain on dealing with setup/teardown of third-party services, and sorry for the delayed response (I just started a new job!). The scenario you described definitely sounds like a good candidate for a verified fake; I guess my question is why you would still need to interact with the actual VWS API. I could just be missing some context here: are you doing continuous verification to ensure your mock doesn't have behavior drift? Or are you running most of your tests against the mock and then a small subset against the real service for true end-to-end validation (with those being the ones that require retries)? Are the rate limits poorly defined and restrictive such that you can't deterministically rate limit the requests from your end to avoid them?

Basically, my first thought is that fixture retries seem like a rather imprecise and (ironically) flaky fix for your issue, but that could just be my relative ignorance about your situation.

adamtheturtle commented 8 months ago

sorry for the delayed response

I do not consider two days a delay 😄

I just started a new job!

Mazal tov!

are you doing continuous verification to ensure your mock doesn't have behavior drift?

That's exactly what I'm doing.

Are the rate limits poorly defined and restrictive such that you can't deterministically rate limit the requests from your end to avoid them?

Yes

Basically, my first thought is that fixture retries seem like a rather imprecise and (ironically) flaky fix for your issue, but that could just be my relative ignorance about your situation.

I had the same thought initially, but I've been using retries for some time and they have been working well. I can't think of a better solution with the API I am using.

str0zzapreti commented 8 months ago

Haha, fair enough! Well, I had a slightly different idea for how to fix my Pytest 8 compatibility issue, which I'm testing now, but assuming that doesn't pan out, your PR looks like a good solution. Once that's done I'll see about building a quick prototype of fixture setup retries.

adamtheturtle commented 8 months ago

Thank you!

str0zzapreti commented 7 months ago

Hey @adamtheturtle , I've been toying around with this a bit and I ran into a couple of complications which mainly stem from fixture composition. Tests are relatively easy to scope for retries since they always exist at the end of the dependency chain, but a fixture could exist anywhere in the chain between the test and the runner.

There's also the matter of setup and teardown. If my plugin detects the call stage of the run, it can somewhat safely assume that setup has been fully completed and teardown is fully pending (if this is not the case, I can essentially put the responsibility on the tester to redesign their tests properly). But consider a failing fixture. Either this one fixture has multiple stages of setup and I can't know exactly how much of that process has been completed before the error occurred, or there are multiple fixtures which each have one responsibility and I can't easily determine which of them have already run. The latter case is preferable, but still tricky.
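
To make those two shapes concrete, here's a rough sketch (all names hypothetical) of a single fixture with several setup stages versus a chain of single-responsibility fixtures:

```python
import pytest


class _Resource:
    """Hypothetical resource with a trivial cleanup step."""

    def close(self) -> None:
        pass


def stage_one() -> _Resource:
    return _Resource()


def stage_two() -> _Resource:
    return _Resource()


# Shape 1: one fixture with several setup stages. If stage_two() raises,
# a retry plugin cannot tell how much of the setup already happened
# (stage_one()'s resource is live but its teardown never registered).
@pytest.fixture
def monolithic_setup():
    resource_a = stage_one()
    resource_b = stage_two()
    yield resource_a, resource_b
    resource_b.close()
    resource_a.close()


# Shape 2: single-responsibility fixtures. The dependency chain is explicit,
# but the plugin still has to work out which links already ran when one fails.
@pytest.fixture
def resource_a():
    a = stage_one()
    yield a
    a.close()


@pytest.fixture
def resource_b(resource_a):
    b = stage_two()
    yield b
    b.close()


def test_uses_chain(resource_b):
    assert isinstance(resource_b, _Resource)
```

In the first shape a retry would have to rerun the whole fixture and hope the partial setup is harmless; in the second, the plugin would need to know where in the chain the failure happened.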

Another consideration: retries are based on test marks. As discussed in https://github.com/pytest-dev/pytest/issues/1368, marks can't be applied to fixtures, and the workarounds are somewhat complex.
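
For illustration, one pattern people reach for (not necessarily one of the workarounds discussed in that issue) is to have the fixture read the requesting test's marks via request.node and handle the retrying itself, which already feels clunky:

```python
import pytest


def do_setup() -> None:
    """Hypothetical setup step that might fail transiently."""


@pytest.fixture
def retried_setup(request):
    # Read the retry count from the requesting test's flaky mark, if any.
    marker = request.node.get_closest_marker("flaky")
    retries = marker.kwargs.get("retries", 1) if marker else 1
    for attempt in range(retries + 1):
        try:
            do_setup()
            break
        except Exception:
            if attempt == retries:
                raise


@pytest.mark.flaky(retries=2)
def test_uses_retried_setup(retried_setup):
    pass
```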

This leads to questions like what is the expected behavior when a fixture fails? Should only the fixture itself be retried, or all fixtures in the dependency chain? Should their teardowns be run as well to ensure proper cleanup? What if those teardowns have a dependency on completion of the setup portion (undesirable, but possible)? Does there need to be a way to mark fixtures to be retried, or should the retry mark be 'contagious' for any fixture in a test's dependency chain?

There might be more to think about with respect to lazy loading and caching of fixture results, but before I go further I just wanted to get your thoughts and also check in again to understand why, if you already have working custom retry code for these fixtures, adding this functionality to pytest-retry would seem like the better choice? I think if it were me, my instinct would probably just be to keep the setup retry code specific to my problem fixtures, but I'm curious to hear your perspective.

adamtheturtle commented 7 months ago

Hi @str0zzapreti,

Thank you for putting time into this, and for sharing your thoughts here.

This leads to questions like what is the expected behavior when a fixture fails? Should only the fixture itself be retried, or all fixtures in the dependency chain?

This is complicated, for sure. I think you have to retry all fixtures in the dependency chain, but that might be harmful with widely scoped fixtures. I appreciate that this problem has nuance.

just wanted to get your thoughts and also check in again to understand why, if you already have working custom retry code for these fixtures, adding this functionality to pytest-retry would seem like the better choice?

At this point, for me personally, my setup is working fine; if this feature had existed earlier, past-me would have saved some development work.

What I'd gain now:

Less thinking when creating or refactoring tests

When I am developing tests, I sometimes look for shared setup, and put that shared setup into a fixture.

When I move setup code into a fixture, I do not expect the behaviour or reliability of my test suite to change, but in this case that move alone makes my test suite more flaky.

I want to be able to think "Any TooManyRequests error will result in a retry; I no longer have to think about that exception type". Instead, I have to consider "Might this fixture trigger that error? If so, I will add a retry decorator", which has meant a bunch of codebase inspection I had hoped to avoid (see the sketch at the end of this comment).

Less duplication

I have duplication in my specification of retry options (exception types, number of retries, delay).

Consistent pytest retry output

pytest-retry has its own way of showing that a test is being retried, and my custom logic does not integrate with pytest's output in the same way.
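
To make the "moving setup into a fixture" point concrete, here is a simplified sketch (the helper is hypothetical, and the flaky-mark arguments are as I understand pytest-retry's documentation) of the same setup before and after the refactor:

```python
import pytest


class TooManyRequests(Exception):
    """Placeholder for the VWS rate-limit error."""


def empty_target_database():
    """Hypothetical helper that may raise TooManyRequests."""


# Before: the rate-limited call runs in the test body, so the flaky mark
# retries the whole test when TooManyRequests is raised.
@pytest.mark.flaky(retries=3, delay=30, only_on=[TooManyRequests])
def test_inline_setup():
    empty_target_database()
    # ...the actual assertions...


# After: the same call moved into a fixture. A TooManyRequests raised here
# is a setup error rather than a test failure, so the mark on the test no
# longer triggers a retry.
@pytest.fixture
def clean_database():
    empty_target_database()


@pytest.mark.flaky(retries=3, delay=30, only_on=[TooManyRequests])
def test_fixture_setup(clean_database):
    pass  # ...the actual assertions...
```

The only change is where empty_target_database() runs, yet in the second version a rate limit error during setup is no longer retried.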