Open jbusecke opened 1 year ago
I think there's a third option: automated use of the linked PR against existing stores,
where the input PCollection is just the zarr.storage.FSStore of the existing store?
Also https://github.com/pangeo-forge/pangeo-forge-recipes/pull/556 falls into this category.
I think this is what I will do for the ClimSim mlo data (https://github.com/leap-stc/ClimSim/issues/38#issuecomment-1687209517), which is slow to load without https://github.com/pangeo-forge/pangeo-forge-recipes/pull/556.
I am afraid I do not quite understand what that third option is?
Run a pipeline like this on Dataflow:
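    import apache_beam as beam
    import zarr
    from pangeo_forge_recipes.transforms import ConsolidateCoordinateDimensions

    # Paths of the stores that were already written (lookup left elided)
    existing_paths: list[str] = get_existing_paths_from_bigquery(...)

    def path_to_fsstore(path: str) -> zarr.storage.FSStore:
        # Open the existing store at `path` (body left elided)
        ...
        return store

    recipe = (
        beam.Create(existing_paths)
        | beam.Map(path_to_fsstore)
        | ConsolidateCoordinateDimensions()
    )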
Ahhhhh, yes that makes sense. I could run that once retroactively over the existing stores, and then add such a stage to new recipes.
Still relevant. I am copying the successful ingestions over to the public buckets and cataloging them in leap-pangeo.cmip6_pgf_ingestion.leap_legacy. We could probably run a script over this and consolidate the coordinates afterwards.
Just going through old issues. I think this might actually be addressed by our current QC (i.e. unconsolidated stores are not passing the tests?), but I would need to check that.
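For illustration, a check along those lines could look roughly like the sketch below. This is not the actual QC code. It assumes that "consolidated" means (a) the Zarr v2 consolidated-metadata key .zmetadata is present and (b) every dimension coordinate (a 1-D array named after its own dimension, per xarray's _ARRAY_DIMENSIONS attribute) is stored as a single chunk. The function names and the demo store are hypothetical. It only reads keys from a mapping, so it runs on a plain dict and should in principle also work on a Zarr v2 store object, which behaves like a mapping.

```python
import json
from collections.abc import Mapping


def unconsolidated_dimension_coordinates(store: Mapping) -> list[str]:
    """Return names of dimension coordinates that are split over several chunks."""
    flagged = []
    for key in store:
        # Only look at top-level arrays, i.e. keys of the form "<name>/.zarray".
        if not key.endswith("/.zarray") or key.count("/") != 1:
            continue
        name = key.split("/")[0]
        zarray = json.loads(store[key])
        attrs_key = f"{name}/.zattrs"
        zattrs = json.loads(store[attrs_key]) if attrs_key in store else {}
        # xarray records dimension names in the _ARRAY_DIMENSIONS attribute.
        if zattrs.get("_ARRAY_DIMENSIONS", []) != [name]:
            continue  # not a dimension coordinate
        if zarray["shape"][0] > zarray["chunks"][0]:
            flagged.append(name)
    return sorted(flagged)


def passes_consolidation_qc(store: Mapping) -> bool:
    """Assumed QC rule: consolidated metadata exists and no coordinate is chunked."""
    return ".zmetadata" in store and not unconsolidated_dimension_coordinates(store)


def _meta(**fields) -> bytes:
    # Helper for the demo only: encode a metadata document as JSON bytes.
    return json.dumps(fields).encode()


# Hypothetical in-memory store: "time" is split into 10 chunks, "lat" is not.
demo_store = {
    ".zmetadata": b"{}",
    "time/.zarray": _meta(shape=[100], chunks=[10]),
    "time/.zattrs": _meta(_ARRAY_DIMENSIONS=["time"]),
    "lat/.zarray": _meta(shape=[4], chunks=[4]),
    "lat/.zattrs": _meta(_ARRAY_DIMENSIONS=["lat"]),
    "tas/.zarray": _meta(shape=[100, 4], chunks=[10, 4]),
    "tas/.zattrs": _meta(_ARRAY_DIMENSIONS=["time", "lat"]),
}

print(unconsolidated_dimension_coordinates(demo_store))  # ['time']
print(passes_consolidation_qc(demo_store))  # False
```

Running something like this over the cataloged stores would at least tell us which ones still need the consolidation stage.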
Since we are currently not performing consolidation (waiting for https://github.com/pangeo-forge/pangeo-forge-recipes/pull/575), we have two options for the future: