leap-stc / data-management

Collection of code to manually populate the persistent cloud bucket with data
https://catalog.leap.columbia.edu/
Apache License 2.0

Add dataset schema and validation script with GitHub workflow #3

Closed. andersy005 closed this 1 year ago

andersy005 commented 1 year ago

This PR introduces a dataset schema to ensure data consistency and a validation script that checks datasets against that schema. It also adds a GitHub workflow that runs the validation automatically on each submission.
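For readers unfamiliar with this kind of check, here is a minimal sketch of validating a dataset entry against a schema. It is not the script from this PR: the field names (`name`, `url`, `description`) and the `validate_dataset` helper are hypothetical, and the entry is assumed to have already been parsed into a dict (for example with `yaml.safe_load` from PyYAML).

```python
from urllib.parse import urlparse

# Hypothetical required fields and their expected types; the actual schema
# in this repository may define different fields.
REQUIRED_FIELDS = {"name": str, "url": str, "description": str}


def validate_dataset(entry: dict) -> list[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in entry:
            errors.append(f"missing required field: {field}")
        elif not isinstance(entry[field], expected_type):
            errors.append(f"{field} must be of type {expected_type.__name__}")

    # Only check the URL shape when a string value was actually provided.
    url = entry.get("url")
    if isinstance(url, str):
        parsed = urlparse(url)
        if not (parsed.scheme and parsed.netloc):
            errors.append("url must include a scheme and a host")
    return errors


if __name__ == "__main__":
    # A deliberately incomplete entry to show how errors are reported.
    example = {"name": "example-dataset", "url": "not-a-url"}
    for problem in validate_dataset(example):
        print(problem)
```

Returning a list of errors rather than raising on the first one lets a CI run report every problem in a submission at once.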

To demonstrate the functionality, an example dataset (taken from https://pangeo-forge.org/dashboard/feedstock/43) has been included.

jbusecke commented 1 year ago

input for https://leap-data-catalog.vercel.app/

😍

andersy005 commented 1 year ago

@jbusecke, when you're ready to add datasets to the catalog, https://github.com/leap-stc/data-management/tree/main/catalog/datasets is all set up and waiting. The only remaining step is to add YAML files there.
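As a rough sketch of how a validation run could pick up newly added files, the snippet below collects every YAML file under a directory. The `find_dataset_files` helper is hypothetical and not part of this repository; the demo uses a temporary directory in place of `catalog/datasets`.

```python
import tempfile
from pathlib import Path


def find_dataset_files(root: Path) -> list[Path]:
    """Return all .yaml/.yml files under root, sorted for stable ordering."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix in {".yaml", ".yml"}
    )


if __name__ == "__main__":
    # Stand-in for the datasets directory; only the YAML file is reported.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "example.yaml").write_text("name: example\n")
        (root / "README.md").write_text("not a dataset\n")
        for path in find_dataset_files(root):
            print(path.name)
```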