pangeo-cmip6 / sync

Workflows to keep CMIP6 data synchronized between GCS and S3 storage

Reworking workflow to reflect changed directory structure #2

Closed by charlesbluca 3 years ago

charlesbluca commented 3 years ago

Didn't have time today to fully look through the changes to the gs://cmip6 directory structure, but still opening this issue to ensure that I check this out and make the necessary updates to the workflow.

However, from a cursory glance, it looks like things may not be as simple as just altering the prefix (this could change, as I know the big move isn't finished yet), so I think it would be useful to list the different jobs here and go through them individually to make sure we aren't missing anything:

I'll update this tomorrow with some specifics about each job and whether they can/should be changed.

cc @naomi-henderson

naomi-henderson commented 3 years ago

@charlesbluca, what can I do to help?

charlesbluca commented 3 years ago

Sorry I never followed up on this!

Essentially, I just want to confirm that the following changes are accurate:

Are there any specific cases where a folder under CMIP/ would've been moved to CMIP6 (e.g. CMIP/NCAR/ -> CMIP6/NCAR)?

If not, then this should just be a matter of switching/adding the prefix. If so, then we might need to go over those specific folders and make sure they are covered.
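For the simple case, the prefix switch could be sketched roughly like this. It is only an illustration, not the workflow's actual code: the `remap_prefix` helper and the `MOVED` table are hypothetical names, and the table assumes that `CMIP/` and `ScenarioMIP/` are the folders that moved under `CMIP6/`.

```python
# Hypothetical table of top-level folders assumed to have moved under CMIP6/.
MOVED = {
    "CMIP/": "CMIP6/CMIP/",
    "ScenarioMIP/": "CMIP6/ScenarioMIP/",
}


def remap_prefix(path: str, bucket: str = "gs://cmip6/") -> str:
    """Return the path under the new layout, or the path unchanged if it did not move."""
    if not path.startswith(bucket):
        return path
    relative = path[len(bucket):]
    for old, new in MOVED.items():
        if relative.startswith(old):
            return bucket + new + relative[len(old):]
    return path


print(remap_prefix("gs://cmip6/CMIP/NCAR/"))
```

Any folder that moved somewhere other than its obvious new home would need its own entry in the table, which is why the special cases matter.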

charlesbluca commented 3 years ago

Ah! Another big thing I forgot about:

CMIP6/ contains both CMIP/ and ScenarioMIP/, but what about the other subdirectories? Are those just the other folders that used to sit at the root of gs://cmip6? If so, I can probably mess around with the remainder job's filters to pick up these subdirectories (as I think that once all the changes are done, the only directory it would be syncing is tracmip/).
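To make the idea concrete, here is a rough sketch of what "remainder" amounts to: everything at the root that no dedicated job already syncs. The `remainder_dirs` function and the example lists are hypothetical and only illustrate the filtering logic, not how the workflow actually defines its filters.

```python
def remainder_dirs(root_dirs, covered):
    """Return the top-level directories that no dedicated job already syncs."""
    covered = set(covered)
    return sorted(d for d in root_dirs if d not in covered)


# Hypothetical example: if a dedicated job handles CMIP6/, only tracmip/ is left over.
print(remainder_dirs(["CMIP6/", "tracmip/"], covered=["CMIP6/"]))
```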

naomi-henderson commented 3 years ago

Look at my pull request - I think I have made the appropriate changes.

Here is the new structure of GCS:

gs://cmip6/CMIP3/
gs://cmip6/CMIP5/
gs://cmip6/CMIP6/
gs://cmip6/DCPP/
gs://cmip6/GFDL_CM2_6/
gs://cmip6/tracmip/

And here is what is now in CMIP6:

gs://cmip6/CMIP6/AerChemMIP/
gs://cmip6/CMIP6/C4MIP/
gs://cmip6/CMIP6/CDRMIP/
gs://cmip6/CMIP6/CFMIP/
gs://cmip6/CMIP6/CMIP/
gs://cmip6/CMIP6/DAMIP/
gs://cmip6/CMIP6/FAFMIP/
gs://cmip6/CMIP6/GMMIP/
gs://cmip6/CMIP6/HighResMIP/
gs://cmip6/CMIP6/LS3MIP/
gs://cmip6/CMIP6/LUMIP/
gs://cmip6/CMIP6/OMIP/
gs://cmip6/CMIP6/PAMIP/
gs://cmip6/CMIP6/PMIP/
gs://cmip6/CMIP6/RFMIP/
gs://cmip6/CMIP6/ScenarioMIP/
gs://cmip6/CMIP6/old_catalogs/

naomi-henderson commented 3 years ago

We should probably just ignore gs://cmip6/CMIP6/old_catalogs/ for now; it is only relevant for GCS.
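As a rough sketch of what ignoring it could look like when building the list of CMIP6 subdirectories to sync (the `sync_targets` helper and the `EXCLUDED` set are hypothetical, not part of the actual workflow):

```python
# Hypothetical set of folder names to leave out of the GCS -> S3 sync.
EXCLUDED = {"old_catalogs"}


def sync_targets(subdirs):
    """Keep only the subdirectory URLs whose final folder name is not excluded."""
    targets = []
    for url in subdirs:
        # "gs://cmip6/CMIP6/old_catalogs/" -> "old_catalogs"
        name = url.rstrip("/").rsplit("/", 1)[-1]
        if name not in EXCLUDED:
            targets.append(url)
    return targets


print(sync_targets([
    "gs://cmip6/CMIP6/CMIP/",
    "gs://cmip6/CMIP6/ScenarioMIP/",
    "gs://cmip6/CMIP6/old_catalogs/",
]))
```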

charlesbluca commented 3 years ago

Okay, I just saw your branch - opened up #4 so we can do some further review and comparison.

charlesbluca commented 3 years ago

Thanks for the help @naomi-henderson! I gave you write access to the repo in case any other quick fixes need to be made.

Glancing through the workflow, it looks like all data should be covered now; from here, we can actively monitor the status and runtimes of the individual jobs to get a sense of whether any can be merged (not really an urgent thing, but could be useful later on).

I'll also check in on #3 with any updates on changing the catalogs job to handle deleted catalogs.

naomi-henderson commented 3 years ago

No, thank you @charlesbluca! I will keep my eyes on it in case any of the jobs cause trouble.

naomi-henderson commented 3 years ago

All seems to be working with the new structure.