pangeo-data / pangeo-datastore

Pangeo Cloud Datastore
https://catalog.pangeo.io

add ice sheet model ensemble data #128

Closed: jkingslake closed this issue 3 years ago

jkingslake commented 3 years ago

Hello,

@talbrecht and I recently uploaded output from his ensemble of ice-sheet model simulations of the Antarctic Ice Sheet over the last 120 ka (120,000 years) to Google Cloud Storage (GCS).

Data citation: Albrecht, Torsten (2019): PISM parameter ensemble analysis of Antarctic Ice Sheet glacial cycle simulations. PANGAEA, https://doi.pangaea.de/10.1594/PANGAEA.909728

Here is a notebook showing how to access these data in Pangeo.

Would this be an appropriate dataset to include in the Pangeo catalog?

This dataset is already part of an intake catalog (https://raw.githubusercontent.com/ldeo-glaciology/pangeo-pismpaleo/main/paleopism.yaml), so I assume it would simply be a case of adding a cryo.yaml here that points to this dataset and potentially others. The data can stay in our Google Cloud Storage bucket and would only need to become 'requester pays' if we get a lot of interest, which would be great!

If this is something people would like to see happen (and my assumption about what it entails is correct), I can make a PR.

rabernat commented 3 years ago

Jonny, thanks a lot for sharing this.

This pangeo-datastore repository has become sort of unmaintained. The reason is that we are moving to a new, much more ambitious and comprehensive platform for populating the cloud data library called Pangeo Forge. You can read about it here: https://pangeo-forge.readthedocs.io/

The main difference is that Pangeo Forge will actually build the Zarr dataset in the cloud out of the original sources. This resolves one of the central problems with the old approach: the breaking of the provenance chain from the original data (in your case, from PANGAEA) to the cloud-optimized format.

We would LOVE to create a Pangeo Forge recipe for this dataset. To get the process started, it would be awesome if someone could open up an issue here: https://github.com/pangeo-forge/staged-recipes/issues

In the meantime, there is not much point adding catalog entries to this catalog. It will be shut down soon.

rabernat commented 3 years ago

Just noting something quite interesting. The original data for this are stored in a giant Zip file: https://hs.pangaea.de/model/PISM/Albrecht-etal_2019/parameter-ensemble/Part2_pism_paleo_ensemble_v2.zip.

However, I was able to easily open it and load files directly using fsspec.

import xarray as xr
from fsspec.implementations.zip import ZipFileSystem

# Treat the remote zip archive (accessed over HTTP) as a read-only filesystem
url = "https://hs.pangaea.de/model/PISM/Albrecht-etal_2019/parameter-ensemble/Part2_pism_paleo_ensemble_v2.zip"
fs = ZipFileSystem(url)
fs.ls("datapub")  # -> list the files

# Open one snapshot and load it into memory before the file handle is closed
with fs.open("datapub/model_data/pism1.0_paleo06_6255/snapshots_-10000.000.nc") as fp:
    ds = xr.open_dataset(fp)
    ds.load()

ds.thk.plot()  # plot ice thickness
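Why this can work without pulling down the whole archive: a zip file keeps its central directory (the index of member names and byte offsets) at the end, so a reader only needs the tail of the file plus the byte range of the member it wants. The stdlib sketch below illustrates this locally with no network: it builds a hypothetical in-memory archive of 50 dummy members and wraps it in a byte-counting file object that stands in for a remote file fetched by byte range. The `CountingReader` and `build_archive` helpers and the member names are made up for illustration; whether fsspec actually avoids a full download in practice depends on the server honoring HTTP range requests.

```python
import io
import zipfile


class CountingReader(io.RawIOBase):
    """Seekable, read-only file object that tallies how many bytes are read.

    Hypothetical helper (not part of fsspec): stands in for a remote file
    fetched by byte range, but the bytes here live in memory.
    """

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)
        self.bytes_read = 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        return self._buf.seek(offset, whence)

    def tell(self) -> int:
        return self._buf.tell()

    def readinto(self, b) -> int:
        # Copy the next chunk into the caller's buffer and count it
        chunk = self._buf.read(len(b))
        b[: len(chunk)] = chunk
        self.bytes_read += len(chunk)
        return len(chunk)


def build_archive(n_members: int = 50, member_size: int = 20_000) -> bytes:
    """Build a hypothetical uncompressed zip with n_members dummy files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for i in range(n_members):
            name = f"datapub/model_data/run_{i:03d}.nc"
            zf.writestr(name, bytes([i % 256]) * member_size)
    return buf.getvalue()


archive = build_archive()
reader = CountingReader(archive)

# Listing the members reads only the end-of-archive records and the central
# directory; extracting one member then reads its local header and data.
with zipfile.ZipFile(reader) as zf:
    names = zf.namelist()
    member = zf.read("datapub/model_data/run_007.nc")

print(f"archive size: {len(archive)} bytes, bytes read: {reader.bytes_read}")
```

The bytes read come to a small fraction of the archive size, which is the same property that makes opening individual snapshots from the PANGAEA zip practical.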


So it may be quite easy to get the recipe going.
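A recipe would need to enumerate the files inside the archive. fsspec's URL chaining can address a member of a remote zip with a single string of the form `zip://<path in archive>::<archive URL>`, which is convenient for passing file locations around. The sketch below builds such URLs from a run ID and a snapshot time. The naming pattern is generalized from the single path used in the snippet above (`pism1.0_paleo06_6255/snapshots_-10000.000.nc`), so the `snapshot_url` helper, the assumption that every run follows this pattern, and the run IDs and times in the example are all hypothetical; the real file list should come from `fs.ls()`.

```python
ARCHIVE_URL = (
    "https://hs.pangaea.de/model/PISM/Albrecht-etal_2019/"
    "parameter-ensemble/Part2_pism_paleo_ensemble_v2.zip"
)


def snapshot_url(run_id: int, time_yr: float) -> str:
    """Return a chained fsspec URL for one snapshot inside the remote zip.

    Hypothetical helper: assumes every run follows the naming pattern of
    datapub/model_data/pism1.0_paleo06_6255/snapshots_-10000.000.nc.
    """
    inner_path = (
        f"datapub/model_data/pism1.0_paleo06_{run_id}/"
        f"snapshots_{time_yr:.3f}.nc"
    )
    return f"zip://{inner_path}::{ARCHIVE_URL}"


# Hypothetical run IDs and snapshot times, for illustration only
runs = [6255]
times = [-20000, -10000]
urls = [snapshot_url(r, t) for r in runs for t in times]
for u in urls:
    print(u)
```

Each URL should then be openable with `fsspec.open(u)` inside a `with` block and passed to `xr.open_dataset`, following the same pattern as the `fs.open` snippet above.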