Open martindurant opened 4 years ago
Just getting up to speed on how pangeo-forge is structured. I love the idea! I agree that a catalog/metadata component to the pipeline would be useful. As part of our ACCESS-funded work E84 just open-sourced a processing pipeline (https://github.com/cirrus-geo/cirrus/) based on STAC, where the inputs and outputs are STAC catalogs + optionally manipulated data. Worth considering! cc @matthewhanson
Updating a catalog entry should be a standard part of every pipeline.
As I've stated many times, I don't think intake is the right format for a master catalog, due to its python-specific nature. We want something like a STAC catalog, with an intake interface on top of it.
Our existing catalog, managed in https://github.com/pangeo-data/pangeo-datastore, is a great starting point for imagining how such a catalog might look.
> I don't think intake is the right format for a master catalog
Sure, but something that Intake can still read :) The typical YAML files were never meant to be the end of the story for describing Intake sources.
> something that Intake can still read
I hope that "something" can be intake-stac.
It seems to me that it would be useful to automatically include produced artefacts into a catalog as part of the pipeline. Thoughts?
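To make the idea concrete, here is a rough sketch of what a final "register the artefact" pipeline step could look like. It is not pangeo-forge or intake code. It assumes the catalog is a single language-neutral JSON file whose entries are loosely shaped like STAC Items, and it has not been validated against the STAC spec. The function name `register_artifact`, the file layout, the `gs://` paths and the media type string are all hypothetical placeholders.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def register_artifact(catalog_path, item_id, asset_href, media_type):
    """Add or update one entry in a JSON catalog file (hypothetical layout)."""
    path = Path(catalog_path)
    if path.exists():
        catalog = json.loads(path.read_text())
    else:
        # Start an empty catalog the first time the pipeline runs.
        catalog = {"type": "FeatureCollection", "features": []}

    # Entry loosely modelled on a STAC Item; not checked against the spec.
    item = {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": None,
        "properties": {"datetime": datetime.now(timezone.utc).isoformat()},
        "assets": {"data": {"href": asset_href, "type": media_type}},
        "links": [],
    }

    # Re-running a pipeline replaces its entry instead of duplicating it.
    catalog["features"] = [f for f in catalog["features"] if f["id"] != item_id]
    catalog["features"].append(item)

    path.write_text(json.dumps(catalog, indent=2))
    return item


if __name__ == "__main__":
    # Usage: call this as the last step of a pipeline run.
    with tempfile.TemporaryDirectory() as tmp:
        catalog_file = Path(tmp) / "catalog.json"
        register_artifact(
            catalog_file,
            "example-dataset",                 # hypothetical dataset id
            "gs://example-bucket/data.zarr",   # hypothetical output location
            "application/vnd+zarr",            # placeholder media type
        )
        print(catalog_file.read_text())
```

Because the catalog is plain JSON, nothing about it is Python-specific, and an Intake driver (or any other client) could be layered on top of it.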