jbusecke opened 1 year ago
With

```python
import xarray as xr
import gcsfs

gcs = gcsfs.GCSFileSystem()
f = "gs://leap-scratch/data-library/cache/b6430036d547ee167decac45ca4a44c2-http_vesg.ipsl.upmc.fr_thredds_fileserver_cmip6_scenariomip_ipsl_ipsl-cm6a-lr_ssp585_r1i1p1f1_omon_zmeso_gn_v20190903_zmeso_omon_ipsl-cm6a-lr_ssp585_r1i1p1f1_gn_210101-220012.nc"
ds = xr.open_dataset(gcs.open(f), use_cftime=True, chunks={})
```

I get:

```
OSError: Unable to synchronously open file (truncated file: eof = 1878556231, sblock->base_addr = 0, stored_eof = 14763458119)
```
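For what it's worth, the two byte counts in the traceback already quantify the damage: `stored_eof` is the full file size recorded in the HDF5 superblock, while `eof` is how many bytes are actually present in the cached object. A quick check on those numbers (straight from the error above):

```python
# Sizes reported by the HDF5 error above
eof = 1_878_556_231          # bytes actually present in the cached object
stored_eof = 14_763_458_119  # full size recorded in the HDF5 superblock

missing = stored_eof - eof
print(missing)                # 12884901888
print(missing == 12 * 2**30)  # True: exactly 12 GiB are missing
```

The shortfall being exactly 12 GiB is consistent with a copy that stopped cleanly partway through, rather than a corrupt source file.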
So maybe the caching was interrupted, resulting in a "truncated file"?
Interesting. Should we manually delete the cache and rebuild in that case? Or is there a way to trigger a recaching from within the recipe if this sort of error occurs?
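For reference, manually clearing a bad cache entry is a one-liner through the fsspec API that `gcsfs` implements. Sketched here against fsspec's in-memory filesystem so it runs anywhere; for the real bucket you would use `gcsfs.GCSFileSystem()` and the `gs://` path from the traceback instead:

```python
import fsspec

# In-memory stand-in for the cache bucket (swap for gcsfs.GCSFileSystem()).
fs = fsspec.filesystem("memory")

# Create a stand-in for the truncated cached object.
with fs.open("/cache/broken.nc", "wb") as f:
    f.write(b"partial contents")

# Delete it so the next recipe run re-fetches from the source.
fs.rm("/cache/broken.nc")
print(fs.exists("/cache/broken.nc"))  # False
```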
> Or is there a way to trigger a recaching from within the recipe if this sort of error occurs?
You can just re-run the job; the caching step should recognize that the already-cached file is a different size than the source file and re-cache it. Note that caching is only skipped if the source and cached files are the same size.
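The size-comparison logic described above can be sketched as follows. This is a simplified local-filesystem illustration, not the actual pangeo-forge caching code; `cache_file` and the paths are hypothetical:

```python
import os
import shutil
import tempfile

def cache_file(source_path: str, cache_path: str) -> bool:
    """Copy source to cache unless an identically sized copy already exists.

    Returns True if a (re-)copy happened, False if caching was skipped.
    Mirrors the size check described above: a truncated cached file has a
    different size than the source, so it gets re-cached on the next run.
    """
    if os.path.exists(cache_path):
        if os.path.getsize(cache_path) == os.path.getsize(source_path):
            return False  # sizes match: assume cache is intact, skip
        # size mismatch (e.g. an interrupted copy): fall through and re-cache
    shutil.copyfile(source_path, cache_path)
    return True

# Demo: a truncated cached file is detected and re-copied.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "source.nc")
dst = os.path.join(tmp, "cached.nc")
with open(src, "wb") as f:
    f.write(b"x" * 1000)  # pretend this is the full source file
with open(dst, "wb") as f:
    f.write(b"x" * 400)   # pretend this is the interrupted, truncated cache

print(cache_file(src, dst))   # True  -> size mismatch, re-cached
print(os.path.getsize(dst))   # 1000
print(cache_file(src, dst))   # False -> sizes now match, skipped
```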
I'll leave this open for now, but I suspect that either the source file is broken or the thing will fix itself. Either way, no action needed on our end. Thx @cisaacstern
For context: a couple of jobs lately failed out with the above error (Dataflow job). I am able to reproduce it with the cached file, as shown at the top. Wondering if this means the file is corrupted or if we could fix this somehow.