Problem

I have .nc4 inputs on s3, and I've noticed when trying to ETL archives in the >= ~5k range (the whole archive is about ~225k) that the function open_with_kerchunk might bubble up any number of subclassed botocore.exceptions.ConnectionError exceptions (such as connection timeouts), which are expected with this much network interaction. It seems like open_with_xarray might be prone to the same issue.

Failures like these on single inputs end up failing the whole pipeline.
Possible Solution
Is there some way to catch these connection errors across fsspec targets, log the problem inputs and make sure downstream transforms gracefully continue processing?
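One shape this could take, just as a sketch: wrap the per-input open call, catch the botocore ConnectionError family, log the offending input, and hand back a sentinel instead of raising. The names open_with_retries and open_func below are made up for illustration, not existing API:

    import logging
    import time

    import botocore.exceptions

    logger = logging.getLogger(__name__)


    def open_with_retries(open_func, input_url, max_attempts=5, base_delay=2.0):
        """Retry transient S3 connection errors for a single input.

        Returns whatever open_func returns, or None if the input keeps
        failing, so downstream transforms can decide how to handle it.
        """
        for attempt in range(1, max_attempts + 1):
            try:
                return open_func(input_url)
            except botocore.exceptions.ConnectionError as err:
                # Catches subclasses too (endpoint/connect/read timeouts).
                logger.warning(
                    "Connection error on %s (attempt %d/%d): %s",
                    input_url, attempt, max_attempts, err,
                )
                if attempt < max_attempts:
                    # Exponential backoff before retrying the same input.
                    time.sleep(base_delay * 2 ** (attempt - 1))
        logger.error("Giving up on %s after %d attempts", input_url, max_attempts)
        return None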
Closing this for now because skipping doesn't seem to work: downstream workflows produce index errors since they assume the skipped timesteps are present. Next ideas:

- The s3 buckets I'm using sit behind auth gateways; they might be part of the problem
- See how checkpoints in beam/flink work
- If all else fails, we might need to catch errors and create bunk records that we can then reprocess (see the sketch after this list)
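If we do go the bunk-record route, one common shape for it in Beam is tagged side outputs (a dead-letter pattern): failed inputs become placeholder records on a separate output instead of failing the pipeline. This is only a sketch of the idea, not the pipeline's actual code; OpenOrDeadLetter and open_func are illustrative names:

    import logging

    import apache_beam as beam
    from apache_beam import pvalue
    import botocore.exceptions

    logger = logging.getLogger(__name__)


    class OpenOrDeadLetter(beam.DoFn):
        """Try to open each input; route failures to a 'dead_letter' output."""

        DEAD_LETTER = "dead_letter"

        def __init__(self, open_func):
            self.open_func = open_func  # e.g. whatever actually opens an input

        def process(self, input_url):
            try:
                yield self.open_func(input_url)
            except botocore.exceptions.ConnectionError as err:
                logger.warning("Dead-lettering %s: %s", input_url, err)
                # Emit a placeholder ("bunk") record so the timestep is still
                # accounted for and can be reprocessed later.
                yield pvalue.TaggedOutput(
                    self.DEAD_LETTER, {"url": input_url, "error": str(err)}
                )


    # Usage sketch: split successfully opened inputs from dead-letter records.
    # results = urls | beam.ParDo(OpenOrDeadLetter(open_func)).with_outputs(
    #     OpenOrDeadLetter.DEAD_LETTER, main="opened"
    # )
    # results.opened       -> opened inputs for downstream transforms
    # results.dead_letter  -> failed inputs to log and reprocess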