leap-stc / leap-stc.github.io

LEAP Technical Operations Manual Website
https://leap-stc.github.io/

Refactor Data Guide #162

Closed jbusecke closed 2 weeks ago

jbusecke commented 2 months ago

This is an effort to update the data guide with new instructions reflecting the pgf ingestion pipeline and the now-available catalog.

github-actions[bot] commented 2 months ago

👋 Thanks for opening this PR! The Cookbook will be automatically built with GitHub Actions. To see the status of your deployment, click below.
🔍 Git commit SHA: 55ef5d9d30241607988aeb31e1328e6a5f054001
✅ Deployment Preview URL: https://leap-stc.github.io/_preview/162

jbusecke commented 2 weeks ago

OK, I finally finished this major refactor of the docs. @SammyAgrawal @norlandrhagen, if you could review the preview above (once it is done building), that would be tremendously helpful.

In particular, if you think the Catalog and Ingestion parts are helpful enough, we can finally unblock the official release of the catalog!

norlandrhagen commented 2 weeks ago

Nice work @jbusecke!

A few things:

Maybe it can link directly to the pangeo-forge-recipes docs. Same with the Zarr link.

SammyAgrawal commented 2 weeks ago

I really like the addition of the data guide, I think that is really awesome!

There are some small edits I want to make (language cleanup) and will circle back on some organizing stuff. Overall I think this is great though!

jbusecke commented 2 weeks ago

Thanks for all the feedback @norlandrhagen and @SammyAgrawal.

An overall issue is that some of the text is hella old, as you pointed out, and we should work to replace it slowly. Not sure if we want to do it all in one go?

I'm not super clear on the data library / data catalog differences. It seems like the catalog has kind of become the library instead of a STAC catalog?

I personally see the library as catalog + data storage. Is there some way to highlight this?

This section describing the data catalog seems outdated. We can probably point to the existing catalog instead of the Radiant Earth link.

This is sort of emblematic of that issue: we need to find a way to reconcile the old 'vision' language with what is actually implemented. For now I kept most of the old language, but I agree this is a bit confusing.

In the types of data supported, under "Linking an existing (public, egress-free) ARCO dataset to the Data Catalog": not sure if it's worth mentioning, but zarr-proxy can get around CORS restrictions on outside data.

@norlandrhagen It would be great if you could add an admonition with some detail on that here!

In Ingesting Datasets into Cloud Storage #2, bullet 3, the link to Pangeo-Forge brings you to this page:

Good question here. I was thinking that we might add a short description in the reference at some point, plus a link. That might be better for people to quickly look things up while staying on the same website. On the other hand, I see your point... not quite sure how to handle this, TBH.

Under the Ingesting Datasets into Cloud Storage section, there is a markdown formatting error in the [!note] tag

Fixed. Thanks for spotting this.

I wonder whether certain "How to" sections could be punchier, as self-contained guides to accomplish a certain task?

Certainly room for improvement, but I would love to split that into another PR. Could you lead that effort (maybe just start with an issue linked to this?).

Is there an idea for what we want to include in the policies and how they are separated from the guides? Since the high-level descriptions of how these fit into Pangeo are subsections of the Architecture?

I have some ideas but am not 100% sure; I am happy to work on this, though. I think of these more as "how do we deal with data", e.g. always ask before using someone's data, always try to be as open as possible, etc.

I think the Ingesting Datasets section remains somewhat confusing

This is really relevant! Can you be a bit more specific about what you find confusing? Inline comments/suggestions on the code/text would be most helpful here. Thank you.

SammyAgrawal commented 2 weeks ago

Can we merge this?

jbusecke commented 2 weeks ago

Merging this now.