leap-stc / data-management

Collection of code to manually populate the persistent cloud bucket with data
https://catalog.leap.columbia.edu/
Apache License 2.0

Reorganization into sub-feedstocks #76

Closed jbusecke closed 5 months ago

jbusecke commented 9 months ago

Our initial design for the data ingestion was to maintain a single feedstock with many recipe modules in it, but only a single requirements.txt and meta.yaml.

I have discussed this with @cisaacstern and we both believe we are hitting the limits of this approach and need to restructure this feedstock.

It seems that we were actually building an invalid meta.yaml all along (by adding provenance data to each 'id' item), and we did not notice until stricter YAML schema checking was implemented recently.
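As a rough illustration of the kind of strict check that would surface this problem, here is a minimal sketch. The allowed key set (`id`, `object`), the `provenance` layout, and the function name `find_invalid_recipe_keys` are assumptions made up for this example, not the actual schema or tooling.

```python
# Hypothetical strict check: the set of keys allowed on a recipe entry
# is an assumption for illustration, not the real meta.yaml schema.
ALLOWED_RECIPE_KEYS = {"id", "object"}


def find_invalid_recipe_keys(meta: dict) -> dict:
    """Return {recipe id: sorted list of unexpected keys} for offending entries."""
    problems = {}
    for entry in meta.get("recipes", []):
        extra = sorted(set(entry) - ALLOWED_RECIPE_KEYS)
        if extra:
            problems[entry.get("id", "<missing id>")] = extra
    return problems


# A parsed meta.yaml (shown as a dict) with provenance attached to one 'id' item.
meta = {
    "title": "example feedstock",
    "recipes": [
        {
            "id": "dataset_a",
            "object": "dataset_a.recipe:recipe",
            "provenance": {"license": "CC-BY-4.0"},  # the misplaced field
        },
        {"id": "dataset_b", "object": "dataset_b.recipe:recipe"},
    ],
}

invalid = find_invalid_recipe_keys(meta)
print(invalid)
```

A lenient parser would accept both entries; only a check against an explicit key set reports `dataset_a`.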

I think we inherently want each recipe to have its own separate provenance information, and thus need to move to a single feedstock per recipe. We might go as far as providing a separate runner config per recipe, which would be quite cool, since it would enable a different target bucket, Dataflow worker options, Dataflow Prime, etc. per recipe (I think @cisaacstern mentioned this would be very helpful for the climsim dataset).
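One way to picture per-recipe runner configs is shared defaults plus a small override per recipe. The sketch below assumes that model; every key, bucket name, and the helper `merge_config` are hypothetical and do not reflect the real runner config format.

```python
import copy


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay `overrides` on a copy of `defaults`."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical shared defaults for all recipes.
DEFAULT_CONFIG = {
    "target": {"bucket": "gs://example-default-bucket"},
    "dataflow": {"machine_type": "n1-standard-2", "use_prime": False},
}

# Hypothetical override for one recipe that needs a different bucket
# and Dataflow Prime.
CLIMSIM_OVERRIDES = {
    "target": {"bucket": "gs://example-climsim-bucket"},
    "dataflow": {"use_prime": True},
}

climsim_config = merge_config(DEFAULT_CONFIG, CLIMSIM_OVERRIDES)
print(climsim_config)
```

With this layout, a recipe that needs nothing special passes an empty override and simply inherits the defaults.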

I have gone ahead and implemented a prototype in #75, which depends on a pretty hacky change in the deploy-recipe-action (this is still in need of review and discussion).

I would like to implement this ASAP for the whole repo, but wanted to first check whether this will impact the catalog in any way. @andersy005 @katamartin @norlandrhagen, do you think this would mess things up on your side? I suppose if we can make the catalog ingestion a part of the recipe, these two parts would become truly independent. Would love to discuss this some time soon.

norlandrhagen commented 9 months ago

Happy to chat! I think giving each recipe its own feedstock is a good change. Not sure what all is involved in the current cataloging process, but I bet @andersy005 and @katamartin will have some more intuition.

jbusecke commented 5 months ago

Closing this in favor of #109