lithops-cloud / lithops

A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
http://lithops.cloud
Apache License 2.0

Add a mechanism to automatically retry failed tasks #1289

Closed tomwhite closed 6 months ago

tomwhite commented 6 months ago

It would be useful to have a way to retry failed tasks automatically as soon as they fail, rather than having to resubmit them manually at the end.

I've implemented such a mechanism here, and I'd be happy to open a PR to donate it to Lithops.

The implementation uses wrappers around ResponseFuture and FunctionExecutor, so you use it like this:

function_executor = LocalhostExecutor()
with RetryingFunctionExecutor(function_executor) as executor:
    futures = executor.map(
        function,
        input,
        timeout=timeout,
        retries=retries,
    )
    done, pending = executor.wait(futures)
    ...

RetryingFuture has to hold a reference to the function and the input so that it can submit a new task if the original fails, so it makes sense to use a different class from ResponseFuture. However, it might be possible to extend FunctionExecutor to take a retries argument, and only use the more heavyweight RetryingFuture if it is set. Or perhaps it's OK to add fields to ResponseFuture and only set them if retries are enabled?
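To make the design point concrete, here is a minimal sketch of a future that keeps its function and input so it can resubmit itself. It is not the linked implementation and does not use Lithops: the standard library's ThreadPoolExecutor stands in for the executor. The RetryingFuture class here and the map_with_retries helper are hypothetical, written only for illustration.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Any, Callable, Iterable, List


class RetryingFuture:
    """Illustrative wrapper (hypothetical): remembers the function and its
    input so a failed attempt can be resubmitted to the same pool."""

    def __init__(self, pool: ThreadPoolExecutor, function: Callable[[Any], Any],
                 item: Any, retries: int) -> None:
        self.pool = pool
        self.function = function
        self.item = item
        self.retries_left = retries
        self.attempts = 0
        self.inner = self._submit()

    def _submit(self) -> Future:
        self.attempts += 1
        return self.pool.submit(self.function, self.item)

    def retry_if_failed(self) -> bool:
        """Call once the current attempt is done; resubmit if it raised
        and retries remain. Returns True when a new attempt was started."""
        if self.inner.exception() is not None and self.retries_left > 0:
            self.retries_left -= 1
            self.inner = self._submit()
            return True
        return False

    def result(self) -> Any:
        # Re-raises the last exception if every attempt failed.
        return self.inner.result()


def map_with_retries(pool: ThreadPoolExecutor, function: Callable[[Any], Any],
                     inputs: Iterable[Any], retries: int = 0) -> List[RetryingFuture]:
    """Submit one task per input and retry each failure as soon as it is seen."""
    futures = [RetryingFuture(pool, function, item, retries) for item in inputs]
    pending = {f.inner: f for f in futures}
    while pending:
        done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
        for inner in done:
            wrapper = pending.pop(inner)
            if wrapper.retry_if_failed():
                # Track the fresh attempt instead of the failed one.
                pending[wrapper.inner] = wrapper
    return futures
```

With retries=0 the wrapper only carries a few extra references and never resubmits, which is roughly the trade-off between a separate class and optional fields on ResponseFuture.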

JosepSampe commented 6 months ago

> I've implemented such a mechanism here, and I'd be happy to open a PR to donate it to Lithops.

Great! Feel free to open a PR with this.