leap-stc / data-management

Collection of code to manually populate the persistent cloud bucket with data
https://catalog.leap.columbia.edu/
Apache License 2.0

Migrate Pangeo Forge API catalog into LEAP Data Catalog #74

cisaacstern closed 5 months ago

cisaacstern commented 10 months ago

I'm winding down the Pangeo Forge hosted catalog as part of the project's refocus towards ETL tooling and supporting users (including LEAP!) to host their own catalogs, xref:

As part of that, we don't want the existing data in the catalog to be forgotten, so I'm hoping LEAP will be able to host references to it (the data itself will continue to live on OSN).

So far, this PR is just a script I'm working on to automatically extract the Pangeo Forge catalog data into the format required by the LEAP catalog. I will delete this script before the PR is complete. Opening as a draft for now.

cisaacstern commented 10 months ago

I've exported all of the entries from https://pangeo-forge.org/catalog into the catalog here, with the exception of HadISST, which was already here!

@jbusecke, do we want to put the following CMIP6 entries somewhere else:

?

cisaacstern commented 10 months ago

@andersy005, welcome your review, though I do think I've got the technical side worked out based on the very clear README! 🙏 Mostly tagged you for visibility, since you've worked on both the Pangeo Forge catalog and this one, of course.

jbusecke commented 9 months ago

Sorry, I just saw this late (today is going to be purely catch-up on GitHub hahaha). This is a fantastic idea in general and I support it 100%.

I think it might be useful to chat through this, and see how we can streamline this. I am in particular curious if anything here is affected by a likely refactor in data-management.

cisaacstern commented 8 months ago

@jbusecke sorry for the delayed response.

I don't think this would be affected by the refactor because the data added here do not have corresponding feedstocks/recipes in this repo.

This is purely a migration of data built previously in Pangeo Forge, all of which have feedstock repos linked in the catalog entries added by this PR.

So I think we can merge unless you have other questions?

cisaacstern commented 8 months ago

I think the one thing we may want to exclude here is possibly the CMIP things?

jbusecke commented 8 months ago

I think that is the right intuition. My suggestion is: for each of the CMIP-based recipes, open a request issue in the new feedstock. That way we can make sure these datasets are eventually ingested (but in a consistent manner). I would like to have a link to CMIP data in the catalog eventually, but we can discuss that separately.