Closed jbusecke closed 5 months ago
Recipe is running:
I'll report back after this finishes.
pre-commit.ci autofix
Got some strange errors during the first two runs:
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/gcsfs/retry.py", line 104, in validate_response
raise HttpError({"code": status, "message": msg}) # text-like
gcsfs.retry.HttpError: Request range not satisfiable, 416
@leap-stc/data-management-devs does this look familiar to anyone?
It seems to fail in the rechunk stage.
Hmm, repeat runs also fail, but all with slightly different errors.
Wondering if we are running out of memory?
Perhaps we should use dataflow prime for the ingestion as well?
Added dataflow prime to the config for this job. But it's late and I'll check on this tomorrow.
That does not seem to have helped. Any clues that I might have missed @cisaacstern?
Still fails... I have extended the year range to rule out that something is wrong with the two particular files we downloaded previously.
Debugging this with @cisaacstern and he noticed that the files are heavily compressed (3 GB vs. 17 GB!), which might overwhelm our workers' memory.
Trying to deactivate prime and use a big machine for testing now.
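For anyone following along, here is a rough back-of-the-envelope sketch of the memory concern above. It assumes the 3 GB figure is the size on disk and the 17 GB figure is the decompressed size (my reading of the numbers, not something verified here). The 16 GB worker memory and the 50% headroom are purely hypothetical values for illustration, not our actual Dataflow settings.

```python
GB = 1e9  # decimal gigabytes, good enough for a rough estimate


def compression_ratio(on_disk_bytes, in_memory_bytes):
    """How many times larger the data gets once it is decompressed."""
    return in_memory_bytes / on_disk_bytes


def fits_in_memory(in_memory_bytes, worker_memory_bytes, headroom=0.5):
    """True if the decompressed data uses at most `headroom` of worker memory.

    The headroom fraction is an assumption: rechunking typically needs
    room for more than one copy of the data at a time.
    """
    return in_memory_bytes <= headroom * worker_memory_bytes


on_disk = 3 * GB         # compressed file size (from the discussion above)
in_memory = 17 * GB      # decompressed size (from the discussion above)
worker_memory = 16 * GB  # hypothetical worker size, for illustration only

ratio = compression_ratio(on_disk, in_memory)
print(f"compression ratio: {ratio:.1f}x")
print("fits on worker:", fits_in_memory(in_memory, worker_memory))
```

If the check comes back negative for the worker size we actually use, that would be consistent with the out-of-memory suspicion, although it does not by itself explain the 416 errors.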
pre-commit.ci autofix
So I have done some more investigation on this case. See this gist.
Just deployed another version that incorporates lessons learned here: https://github.com/leap-stc/data-management/pull/57/commits/5672faa754c697142e357f1b9ae1f7c4fd97eb50
Let's see how that fares...
@cisaacstern it seems like we are running into some issues with the new schema validation for this recipe.
If you have a minute could you look into this? Does the current way we set up this repo not conform to the schema? Or is there some other problem?
I am very sorry for the long delay here. I have started working on this again as an example for our new (to be announced) structure of data management here: https://github.com/leap-stc/ww3_feedstock.
Superseded by https://github.com/leap-stc/wavewatch3_feedstock