leap-stc / cmip6-leap-feedstock

Apache License 2.0

Consolidating existing stores? #22

Open jbusecke opened 1 year ago

jbusecke commented 1 year ago

Since we are currently not performing consolidation (waiting for https://github.com/pangeo-forge/pangeo-forge-recipes/pull/575), we have two options for the future:

cisaacstern commented 1 year ago

I think there's a third option of automated use of linked PR against existing stores?

Where the input PCollection is just the zarr.storage.FSStore of the existing store.

cisaacstern commented 1 year ago

Also https://github.com/pangeo-forge/pangeo-forge-recipes/pull/556 falls into this category.

cisaacstern commented 1 year ago

> I think there's a third option of automated use of linked PR against existing stores?
>
> Where the input PCollection is just the zarr.storage.FSStore of the existing store.

I think this is what I will do for the ClimSim mlo data https://github.com/leap-stc/ClimSim/issues/38#issuecomment-1687209517 which is slow to load without https://github.com/pangeo-forge/pangeo-forge-recipes/pull/556.

jbusecke commented 1 year ago

I am afraid I do not quite understand what that third option is?

cisaacstern commented 1 year ago

> I am afraid I do not quite understand what that third option is?

Run a pipeline like this on Dataflow:

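import apache_beam as beam
import zarr
# Transform name as written in this thread; assumed to come from the linked PR.
from pangeo_forge_recipes.transforms import ConsolidateCoordinateDimensions

# Paths of the stores that already exist (the lookup itself is elided here).
existing_paths: list[str] = get_existing_paths_from_bigquery(...)


def path_to_fsstore(path: str) -> zarr.storage.FSStore:
    # Open the existing store at `path` (details elided).
    ...
    return store


# Feed each existing store straight into the consolidation stage.
recipe = (
    beam.Create(existing_paths)
    | beam.Map(path_to_fsstore)
    | ConsolidateCoordinateDimensions()
)
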
jbusecke commented 1 year ago

Ahhhhh, yes that makes sense. I could run that once retroactively over the existing stores, and then add such a stage to new recipes.

jbusecke commented 7 months ago

Still relevant. I am copying the successful ingestions over to the public buckets and cataloging them in leap-pangeo.cmip6_pgf_ingestion.leap_legacy. We could probably run a script over this and consolidate the coordinates afterwards.
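
A rough sketch of what the driver part of such a script might look like, assuming the store paths have already been read from the catalog table. The per-store step is passed in as a function, since how to consolidate a single store is not settled in this thread; `consolidate_all`, `fake_consolidate`, and the example paths are all hypothetical names used only for illustration.

```python
from typing import Callable, Iterable


def consolidate_all(
    store_paths: Iterable[str],
    consolidate_store: Callable[[str], None],
) -> dict[str, str]:
    """Apply `consolidate_store` to every path and return {path: error} for failures."""
    failures: dict[str, str] = {}
    for path in store_paths:
        try:
            consolidate_store(path)
        except Exception as err:
            # Keep going so that one bad store does not stop the whole batch.
            failures[path] = repr(err)
    return failures


def fake_consolidate(path: str) -> None:
    # Hypothetical stand-in for the real per-store consolidation step.
    if path.endswith("broken.zarr"):
        raise OSError(f"cannot open {path}")


# Hypothetical paths; in practice these would come from the catalog table.
paths = ["gs://example-bucket/a.zarr", "gs://example-bucket/broken.zarr"]
failures = consolidate_all(paths, fake_consolidate)
print(failures)
```

Collecting failures instead of raising means the script can be rerun on just the stores that did not go through.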

jbusecke commented 4 months ago

Just going through old issues. I think this might actually be addressed by our current QC (i.e. unconsolidated stores would not pass the tests?), but I would need to check that.