Problem

I have .nc4 inputs on s3, and I've noticed when trying to ETL archives in the >= ~5k range (the whole archive is about ~225k) that the function open_with_kerchunk might bubble up any number of subclassed botocore.exceptions.ConnectionError exceptions (such as connection timeouts), which are expected with this much network interaction. It seems like open_with_xarray might be prone to the same issue.

Failures like these on single inputs end up failing the whole pipeline.
Possible Solution
Is there some way to catch these connection errors across fsspec targets, log the problem inputs and make sure downstream transforms gracefully continue processing?
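One shape this could take, just as a sketch: wrap the per-input open call, catch the botocore ConnectionError family, log the offending input, and hand back a sentinel instead of raising. The names open_with_retries and open_func below are made up for illustration, not existing API:

    import logging
    import time

    import botocore.exceptions

    logger = logging.getLogger(__name__)


    def open_with_retries(open_func, input_url, max_attempts=5, base_delay=2.0):
        """Retry transient S3 connection errors for a single input.

        Returns whatever open_func returns, or None if the input keeps
        failing, so downstream transforms can decide how to handle it.
        """
        for attempt in range(1, max_attempts + 1):
            try:
                return open_func(input_url)
            except botocore.exceptions.ConnectionError as err:
                # Catches subclasses too (endpoint/connect/read timeouts).
                logger.warning(
                    "Connection error on %s (attempt %d/%d): %s",
                    input_url, attempt, max_attempts, err,
                )
                if attempt < max_attempts:
                    # Exponential backoff before retrying the same input.
                    time.sleep(base_delay * 2 ** (attempt - 1))
        logger.error("Giving up on %s after %d attempts", input_url, max_attempts)
        return None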
Closing this for now because skipping doesn't seem to work: downstream workflows produce index errors since they assume the skipped timesteps are present. Next ideas:

- The s3 buckets I'm using sit behind auth gateways; they might be part of the problem
- See how checkpoints in beam/flink work
- If all else fails, we might need to catch errors and create bunk records that we can then reprocess (see the sketch after this list)
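If we do go the bunk-record route, one common shape for it in Beam is tagged side outputs (a dead-letter pattern): failed inputs become placeholder records on a separate output instead of failing the pipeline. This is only a sketch of the idea, not the pipeline's actual code; OpenOrDeadLetter and open_func are illustrative names:

    import logging

    import apache_beam as beam
    from apache_beam import pvalue
    import botocore.exceptions

    logger = logging.getLogger(__name__)


    class OpenOrDeadLetter(beam.DoFn):
        """Try to open each input; route failures to a 'dead_letter' output."""

        DEAD_LETTER = "dead_letter"

        def __init__(self, open_func):
            self.open_func = open_func  # e.g. whatever actually opens an input

        def process(self, input_url):
            try:
                yield self.open_func(input_url)
            except botocore.exceptions.ConnectionError as err:
                logger.warning("Dead-lettering %s: %s", input_url, err)
                # Emit a placeholder ("bunk") record so the timestep is still
                # accounted for and can be reprocessed later.
                yield pvalue.TaggedOutput(
                    self.DEAD_LETTER, {"url": input_url, "error": str(err)}
                )


    # Usage sketch: split successfully opened inputs from dead-letter records.
    # results = urls | beam.ParDo(OpenOrDeadLetter(open_func)).with_outputs(
    #     OpenOrDeadLetter.DEAD_LETTER, main="opened"
    # )
    # results.opened       -> opened inputs for downstream transforms
    # results.dead_letter  -> failed inputs to log and reprocess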