pangeo-forge / pangeo-smithy

The tool for managing pangeo-forge feedstocks.

Intake catalog? #8

Open martindurant opened 4 years ago

martindurant commented 4 years ago

It seems to me that it would be useful to automatically include produced artefacts into a catalog as part of the pipeline. Thoughts?

scottyhq commented 4 years ago

Just getting up to speed on how pangeo-forge is structured. I love the idea! I agree that a catalog/metadata component in the pipeline would be useful. As part of our ACCESS-funded work, E84 just open-sourced a processing pipeline (https://github.com/cirrus-geo/cirrus/) based on STAC, in which the inputs and outputs are STAC catalogs, optionally accompanied by the processed data. Worth considering! cc @matthewhanson

rabernat commented 4 years ago

Updating a catalog entry should be a standard part of every pipeline.

As I've stated many times, I don't think intake is the right format for a master catalog, due to its Python-specific nature. We want something like a STAC catalog, with an intake interface on top of it.
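To make "updating a catalog entry as part of every pipeline" concrete, here is a minimal sketch, assuming the catalog is a static STAC 1.0.0 catalog stored as plain JSON. The function names `make_item` and `register_output`, the catalog id, and the bucket URL are all hypothetical and not part of pangeo-forge or any existing tool. It builds a bare-bones STAC Item for one output and links it from a root catalog, replacing any existing link with the same href so that re-running a recipe does not duplicate entries. A real catalog would also carry `self`, `root` and `parent` links, which are omitted here for brevity.

```python
import json
from datetime import datetime, timezone

STAC_VERSION = "1.0.0"


def make_item(item_id, asset_href, bbox, when):
    """Build a minimal STAC Item describing one pipeline output (hypothetical helper)."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": STAC_VERSION,
        "id": item_id,
        # Footprint polygon derived from the bounding box; exterior ring is
        # counter-clockwise and closed (first point repeated at the end).
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "bbox": [west, south, east, north],
        # STAC requires an RFC 3339 datetime; emit UTC with a trailing "Z".
        "properties": {"datetime": when.isoformat().replace("+00:00", "Z")},
        "links": [],
        "assets": {"data": {"href": asset_href, "roles": ["data"]}},
    }


def register_output(catalog, item_href):
    """Link an item from the catalog, dropping any stale link to the same href (hypothetical helper)."""
    catalog["links"] = [
        link for link in catalog["links"]
        if not (link["rel"] == "item" and link["href"] == item_href)
    ]
    catalog["links"].append(
        {"rel": "item", "href": item_href, "type": "application/geo+json"}
    )
    return catalog


# Example: a final pipeline step registering one (hypothetical) Zarr output.
catalog = {
    "type": "Catalog",
    "stac_version": STAC_VERSION,
    "id": "pangeo-forge-outputs",  # hypothetical catalog id
    "description": "Datasets produced by pangeo-forge recipes",
    "links": [],
}
item = make_item(
    "example-dataset",
    "gs://example-bucket/example-dataset.zarr",  # hypothetical location
    (-180.0, -90.0, 180.0, 90.0),
    datetime(2020, 1, 1, tzinfo=timezone.utc),
)
register_output(catalog, "./example-dataset/example-dataset.json")
print(json.dumps(catalog, indent=2))
```

Because both documents are plain JSON, any language can read them, and an intake layer could be placed on top rather than being the storage format itself.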

Our existing catalog, managed in https://github.com/pangeo-data/pangeo-datastore, is a great starting point for imagining how such a catalog might look.

martindurant commented 4 years ago

> I don't think intake is the right format for a master catalog

Sure, but something that Intake can still read :) The typical YAML files were never meant to be the end of the story for describing Intake sources.

rabernat commented 4 years ago

> something that Intake can still read

I hope that "something" can be intake-stac.