pangeo-data / pangeo-datastore

Pangeo Cloud Datastore
https://catalog.pangeo.io

Pangeo Cloud Datastore


Browseable Online Website: https://pangeo-data.github.io/pangeo-datastore/

This repository hosts Pangeo's official cloud data catalog, which is an Intake catalog. Most of the data is stored in Zarr format and is meant to be opened with Xarray.

The master Intake catalog URL is:

https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml

Requirements

Using this catalog requires package versions that were quite recent as of April 2019.

Examples

To open the catalog and load a dataset from Python, you can run the following code:

import intake
# URL of the master Intake catalog
cat_url = 'https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml'
cat = intake.open_catalog(cat_url)
# Load the dataset lazily as a Dask-backed Xarray dataset
ds = cat.atmosphere.gmet_v1.to_dask()

To explore the whole catalog, including nested sub-catalogs, you can try:

cat.walk(depth=5)
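The result of walking the catalog can be long, so it may help to group it by top-level category. The following is a minimal sketch, assuming the walk yields entry names in dotted form (such as `atmosphere.gmet_v1`). The helper `group_by_top_level` and the name `ocean.example_dataset` are hypothetical and used only for illustration.

```python
from collections import defaultdict


def group_by_top_level(names):
    """Group dotted entry names by their first component.

    `names` is any iterable of strings such as 'atmosphere.gmet_v1'.
    Returns a dict mapping each top-level name to a sorted list of the
    dotted paths found beneath it (an empty list if none were seen).
    """
    groups = defaultdict(list)
    for name in names:
        top, _, rest = name.partition(".")
        children = groups[top]  # registers the top-level name as a side effect
        if rest:
            children.append(rest)
    return {top: sorted(children) for top, children in sorted(groups.items())}


# Illustrative input only: 'atmosphere.gmet_v1' appears in the loading example;
# 'ocean.example_dataset' is a made-up placeholder, not a real catalog entry.
sample_names = ["atmosphere", "atmosphere.gmet_v1", "ocean", "ocean.example_dataset"]
summary = group_by_top_level(sample_names)
for top, children in summary.items():
    print(f"{top}: {children}")
```

With a catalog opened as shown earlier, passing `cat.walk(depth=5)` to this helper instead of `sample_names` should give a similar summary, since iterating over the returned mapping yields its entry names.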

Accessing requester pays data

Several of the datasets in the cloud data catalog are stored in requester pays storage buckets. This means that a user requesting data must provide their own billing project (created and authenticated through Google Cloud Platform) to be billed for the charges associated with accessing a dataset. To set up a GCP billing project and use it for authentication in applications:
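As a rough illustration of how a billing project might be supplied from Python, here is a minimal sketch that assumes the `gcsfs` package is used for bucket access and that Google Cloud credentials are already available on the machine (for example via `gcloud auth application-default login`). The project ID `my-billing-project` and both helper functions are hypothetical. How the catalog entries themselves expect the project to be passed is not covered here.

```python
def requester_pays_options(billing_project):
    """Build keyword arguments for gcsfs.GCSFileSystem for requester pays access.

    `billing_project` is the ID of the GCP project that will be billed.
    """
    if not billing_project:
        raise ValueError("a GCP billing project ID is required")
    # gcsfs bills requests to `project` when `requester_pays` is enabled.
    return {"project": billing_project, "requester_pays": True}


def open_requester_pays_filesystem(billing_project):
    """Create a GCS filesystem object that bills `billing_project` for access."""
    import gcsfs  # imported lazily so the helper above has no extra dependencies

    return gcsfs.GCSFileSystem(**requester_pays_options(billing_project))


# 'my-billing-project' is a placeholder; substitute your own project ID.
options = requester_pays_options("my-billing-project")
print(options)
```

Calling `open_requester_pays_filesystem` with a real project ID would return a filesystem object whose reads are charged to that project.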

Adding Datasets

To suggest adding a new dataset, please open an issue.