leap-stc / data-management

Collection of code to manually populate the persistent cloud bucket with data
https://catalog.leap.columbia.edu/
Apache License 2.0

Add WW3 recipe #57

Closed jbusecke closed 5 months ago

jbusecke commented 11 months ago

Recipe is running:

image

I'll report back after this finishes.

jbusecke commented 11 months ago

pre-commit.ci autofix

jbusecke commented 11 months ago

Got some strange errors during the first two runs:

File "/srv/conda/envs/notebook/lib/python3.9/site-packages/gcsfs/retry.py", line 104, in validate_response
    raise HttpError({"code": status, "message": msg})  # text-like
gcsfs.retry.HttpError: Request range not satisfiable, 416

@leap-stc/data-management-devs look familiar to anyone?

It seems to fail in the rechunk stage.

image
jbusecke commented 11 months ago

Hmm, repeat runs also fail, but each with a slightly different error.

Wondering if we are running out of memory?

image

(dataflow job)

jbusecke commented 11 months ago

Perhaps we should use dataflow prime for the ingestion as well?

jbusecke commented 11 months ago

Added dataflow prime to the config for this job. But it's late and I'll check on this tomorrow.

jbusecke commented 11 months ago

That did not seem to have helped. Any clues that I might have missed, @cisaacstern?

jbusecke commented 11 months ago

Still fails... I have extended the year range to rule out that something is wrong with the two particular files we downloaded previously.

jbusecke commented 11 months ago

Debugging this with @cisaacstern, and he noticed that the files are heavily compressed (3 GB vs 17 GB!), which might overwhelm our workers' memory.

Trying to deactivate prime and use a big machine for testing now.
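
For a rough back-of-the-envelope check of the memory hypothesis, here is a minimal sketch. The 3 GB and 17 GB figures are the ones quoted above; the worker memory sizes and the overhead factor are assumptions made up for illustration, not our actual Dataflow settings, and the helper names are hypothetical.

```python
def compression_ratio(compressed_gb: float, uncompressed_gb: float) -> float:
    """Return how many times larger the data is once decompressed."""
    return uncompressed_gb / compressed_gb


def fits_in_memory(
    uncompressed_gb: float,
    worker_memory_gb: float,
    overhead_factor: float = 2.0,  # assumption: allow for one extra in-memory copy
) -> bool:
    """Crude check: does one decompressed file, plus overhead, fit on a worker?"""
    return uncompressed_gb * overhead_factor <= worker_memory_gb


if __name__ == "__main__":
    ratio = compression_ratio(compressed_gb=3, uncompressed_gb=17)
    print(f"compression ratio: {ratio:.1f}x")

    # Hypothetical worker sizes, only to show how the check behaves.
    for worker_memory_gb in (16, 64):
        ok = fits_in_memory(uncompressed_gb=17, worker_memory_gb=worker_memory_gb)
        print(f"{worker_memory_gb} GB worker fits one decompressed file: {ok}")
```

If the smaller worker fails this check, that would be consistent with the out-of-memory suspicion, though it does not prove that memory is the actual cause of the failures.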

jbusecke commented 10 months ago

pre-commit.ci autofix

jbusecke commented 10 months ago

So I have done some more investigation on this case. See this gist.

Just deployed another version that incorporates lessons learned here: https://github.com/leap-stc/data-management/pull/57/commits/5672faa754c697142e357f1b9ae1f7c4fd97eb50

Let's see how that fares...

jbusecke commented 10 months ago

@cisaacstern it seems like we are running into some issues with the new schema validation for this recipe.

If you have a minute, could you look into this? Does the current way we set up this repo not conform to the schema? Or is there some other problem?

jbusecke commented 5 months ago

I am very sorry for the long delay here. I have started working on this again as an example for our new (to be announced) structure of data management here: https://github.com/leap-stc/ww3_feedstock.

jbusecke commented 5 months ago

Superseded by https://github.com/leap-stc/wavewatch3_feedstock