lithops-cloud / lithops

A multi-cloud framework for big data analytics and embarrassingly parallel jobs, that provides an universal API for building parallel applications in the cloud ☁️🚀
http://lithops.cloud
Apache License 2.0
317 stars 105 forks source link

Add a mechanism to automatically retry failed tasks #1291

Closed tomwhite closed 6 months ago

tomwhite commented 6 months ago

Fixes #1289

This is a first draft - it still needs API doc and tests.

As noted in #1289 this uses wrappers around ResponseFuture and FunctionExecutor. I considered using subclasses, but that wasn't possible for ResponseFuture because RetryingFuture updates the underlying ResponseFuture for each retry. And FunctionExecutor already has subclasses (LocalhostExecutor, ServerlessExecutor, and StandaloneExecutor), so that wasn't possible either.

It might be possible to add the retry logic directly into the existing ResponseFuture and FunctionExecutor classes, but I'm not familiar enough with the internals, so I wasn't sure how easy that would be or if it's even desirable. @JosepSampe what do you think?

The current implementation in this PR also lacks some of the methods that FunctionExecutor has: call_async, map_reduce, and get_result. It would be possible to add these (probably later), but I haven't needed them so I haven't spent any time looking at what's needed.

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.
tomwhite commented 6 months ago

I've added some tests, but they are written using pytest features such as tmp_path and parameterization. Is it possible to change things to use pytest to run the tests? (It can run regular Python unit tests, but not vice versa.)

tomwhite commented 6 months ago

I've added some tests, but they are written using pytest features such as tmp_path and parameterization. Is it possible to change things to use pytest to run the tests?

The use of unittest is baked into lithops test, so it's not trivial to switch to pytest unfortunately. I've converted the retries test to use unittest now.

This is now ready for review @JosepSampe.

JosepSampe commented 6 months ago

Thanks @tomwhite Looks really good! I will check in the future how to switch the tests to pytest to benefit from its futures.

tomwhite commented 6 months ago

Thanks @tomwhite Looks really good!

Thank you - and thanks for merging it!

I will check in the future how to switch the tests to pytest to benefit from its futures.

I think that would be a good change. It's easy to parameterize pytest in lots of ways so you can still benefit from running the Lithops tests against different executors and storage backends.