nextGEMS / catalog

Intake catalog for nextgems

Add S3 endpoint source #94

Closed mannreis closed 1 month ago

mannreis commented 1 month ago

At the moment only TIME = P1D is completely available on the nextgems S3 bucket @ DKRZ

d70-t commented 1 month ago

Thanks for providing and adding this version 👌 Instead of adding a different simulation ID to the standard ICON catalog, this entry should use the same simulation ID as the data on levante, but go into the ICON_online catalog instead. Within this repository, there's a second catalog hierarchy, available as data.nextgems-h2020.eu/online.yaml. That catalog should in principle contain the same entries as the main catalog, but only point to data that is publicly available. That way, users would only have to swap the catalog URL, and the rest of their scripts can stay the same. Up to now, only a subset of ngc3028 has been available online, so that catalog was rarely used. But this PR would extend it nicely (and I'd hope to see more like this in future).
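To illustrate the "only swap the catalog URL" idea, here is a minimal sketch. The online catalog URL is the one mentioned above (with `https://` prepended); the main catalog URL, the helper names `catalog_url` and `open_simulation`, and the `cat["ICON"][simulation_id]` lookup path are assumptions made for illustration and may not match the actual catalog layout.

```python
# Sketch only: switch between the main and the online catalog by URL.
# ONLINE_CATALOG is taken from the comment above; MAIN_CATALOG is an assumption.
MAIN_CATALOG = "https://data.nextgems-h2020.eu/catalog.yaml"  # assumed URL
ONLINE_CATALOG = "https://data.nextgems-h2020.eu/online.yaml"


def catalog_url(public_only: bool) -> str:
    """Return the catalog URL for public-only or full access (hypothetical helper)."""
    return ONLINE_CATALOG if public_only else MAIN_CATALOG


def open_simulation(public_only: bool, simulation_id: str = "ngc3028"):
    """Open one simulation entry; only the catalog URL differs between the two cases.

    The cat["ICON"][simulation_id] path is an assumption about the catalog hierarchy.
    """
    import intake  # imported lazily so the URL helper works without intake installed

    cat = intake.open_catalog(catalog_url(public_only))
    return cat["ICON"][simulation_id]
```

With this layout, a script would call `open_simulation(public_only=True)` off levante and `open_simulation(public_only=False)` on levante, leaving everything downstream unchanged.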

mannreis commented 1 month ago

Thanks for the suggestion! The problem is that the endpoint cannot be resolved outside the DKRZ network (yet!?). This entry would mainly be for seeing how much of a difference it makes to load data from S3. I can nevertheless add this source to the online catalog and hope the service becomes publicly available soon 🥲

d70-t commented 1 month ago

Oh no 😬... If it's not reachable from the outside, then it also shouldn't be in the online catalog (the catalog should only contain things that actually work).

But then, the question would be, why would you want to add the S3 endpoint as a new entry? The catalog should contain the current "best" way to access a given dataset. So if S3 is better than the lustre endpoint, then S3 should replace lustre and if lustre is better, then it should stay lustre.

It may still be ok to add it temporarily for testing purposes. Is that what you have in mind? If yes, then I'd probably go for the current version.

florianziemen commented 1 month ago

Hmm, for testing I'd actually stay with the separate branch. That can still be used with its URL (https://raw.githubusercontent.com/mannreis/catalog/main/catalog.yaml) in this case.
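For testing from the fork, something like the following could work. It is a rough sketch: the catalog URL is the one quoted above, while `open_test_catalog` and `time_call` are hypothetical helpers, and the thread does not name the entries to compare, so none are filled in here.

```python
import time

# Catalog URL of the fork, as quoted in the comment above.
TEST_CATALOG = "https://raw.githubusercontent.com/mannreis/catalog/main/catalog.yaml"


def open_test_catalog():
    """Open the fork's catalog directly by URL (requires intake and network access)."""
    import intake  # imported lazily so the timing helper works without intake

    return intake.open_catalog(TEST_CATALOG)


def time_call(load, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls of `load`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        load()
        best = min(best, time.perf_counter() - start)
    return best
```

To compare the two access paths, one would pass `time_call` a callable that reads the same field once through the lustre-backed entry and once through the S3-backed entry. Caching on either side can skew such a comparison, so the numbers should be read with care.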

florianziemen commented 1 month ago

Apart from that, thanks a lot! It's great to see progress on this!

mannreis commented 1 month ago

Then I'll close the PR, and when the time comes I'll know where to add such a source.