rabernat opened this issue 3 years ago
It's pretty common for parallel execution engines to retry failed tasks some number of times, e.g., for Cloud Dataflow: "Note: The Dataflow service retries failed tasks up to 4 times in batch mode, and an unlimited number of times in streaming mode. In batch mode, your job will fail; in streaming, it may stall indefinitely."
That said, it might be a bad idea to include this in rechunker. We do have a `robust_getitem`
function in xarray, which we use when loading remote datasets over a network:
https://github.com/pydata/xarray/blob/4f414f2d5eb2e5a12fb8ae1012c5ac7aa43b6f0b/xarray/backends/common.py#L41
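For executors without built-in retries, the same pattern can be reimplemented directly. Here is a minimal pure-Python sketch of an exponential-backoff wrapper in the spirit of `robust_getitem`; the name, defaults, and signature are illustrative, not xarray's actual API:

```python
import time


def robust_call(func, *args, max_retries=6, initial_delay=0.5,
                catch=(IOError,), **kwargs):
    """Call ``func``, retrying with exponential backoff on transient errors.

    Illustrative sketch of the pattern behind xarray's robust_getitem;
    the parameter names and defaults here are made up for this example.
    """
    for attempt in range(max_retries + 1):
        try:
            return func(*args, **kwargs)
        except catch:
            if attempt == max_retries:
                raise  # out of retries: let the caller see the failure
            # back off exponentially before the next attempt
            time.sleep(initial_delay * 2 ** attempt)
```

The same wrapper could guard any flaky read from object storage, independent of which execution engine schedules the task.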
When working with cloud object store data, it's common to get random task failures. These can usually be overcome with retries. With dask, retries can be requested via `plan.execute(retries=n)`. But what about other executors? Should we incorporate retries into rechunker's plans? If so, what's the best way to do this?
If not, can we work around this at the Prefect level and inject retries into our flows?
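One executor-agnostic option would be for rechunker to wrap each task callable in a retry decorator before handing it to the executor, so every backend benefits without needing scheduler support. A hypothetical sketch (`with_retries` is not an existing rechunker helper):

```python
import functools
import time


def with_retries(func, n=3, delay=0.1):
    """Return a wrapped version of ``func`` that retries up to ``n`` times.

    Hypothetical helper -- rechunker does not currently expose anything
    like this; it illustrates one way a plan could bake retries into its
    tasks regardless of which executor runs them.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for attempt in range(n + 1):
            try:
                return func(*args, **kwargs)
            except Exception:
                if attempt == n:
                    raise  # retries exhausted: propagate the failure
                time.sleep(delay)  # brief pause before trying again
    return wrapper
```

A plan builder could then apply `with_retries` to each stage function before constructing the task graph, leaving the executors themselves unchanged.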