leap-stc / data-management

Collection of code to manually populate the persistent cloud bucket with data
https://catalog.leap.columbia.edu/
Apache License 2.0

Refactoring the data management #96

Open jbusecke opened 2 months ago

jbusecke commented 2 months ago

Our data management is in need of a refactor to

I have been pondering some of the choices to make and wanted to discuss them a bit more widely:

  1. Monorepo or feedstock repos? There is a fundamental question of whether we want to keep all feedstocks in a single repo (this one) or have one repo per feedstock.

    • Monorepo keeps management of code/secrets 'compact'
    • But it also allows for less flexibility (swapping secrets, config files etc)
  2. "Interface" to the catalog: I had discussed with @norlandrhagen that it would be great to have a fully self-contained zarr store with metadata as the ideal case to hand off to the catalog layer. The logic building the catalog could simply have a list of 'registered' zarr stores which need to contain a bunch of extra metadata (I have been tinkering with this quite a bit in https://github.com/leap-stc/proto_feedstock
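
To make the "list of registered stores" idea a bit more concrete, here is a rough sketch of what the catalog-building logic could look like. Everything in it is an assumption for the sake of discussion: the required attribute names in `REQUIRED_ATTRS`, the `CatalogEntry` type, the `build_catalog` helper, and the example bucket URLs are all made up, and nothing here reflects what proto_feedstock actually does. Reading the attributes is left as a pluggable `read_attrs` callable so the sketch runs without cloud access; in practice it might wrap something like opening the zarr group and reading its `.attrs`.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

# Hypothetical set of attributes a store must carry to be "catalog ready".
# The real list would have to be agreed on in this discussion.
REQUIRED_ATTRS = ("title", "description", "license", "maintainers")


@dataclass
class CatalogEntry:
    """Minimal catalog record built purely from a store's own metadata."""
    url: str
    attrs: dict


def missing_attrs(attrs: Mapping) -> list:
    """Return the required attribute names that are absent or empty."""
    return [key for key in REQUIRED_ATTRS if not attrs.get(key)]


def build_catalog(registered_stores: list, read_attrs: Callable[[str], Mapping]):
    """Split registered stores into catalog entries and rejected stores.

    `read_attrs` maps a store URL to its top-level attributes.
    Rejected stores are reported together with the attributes they lack.
    """
    entries, rejected = [], {}
    for url in registered_stores:
        attrs = read_attrs(url)
        missing = missing_attrs(attrs)
        if missing:
            rejected[url] = missing
        else:
            entries.append(
                CatalogEntry(url=url, attrs={k: attrs[k] for k in REQUIRED_ATTRS})
            )
    return entries, rejected


# In-memory stand-in for real stores (URLs and values are invented).
fake_store_attrs = {
    "gs://example-bucket/complete.zarr": {
        "title": "Example dataset",
        "description": "A store with all required metadata",
        "license": "CC-BY-4.0",
        "maintainers": ["someone"],
    },
    "gs://example-bucket/incomplete.zarr": {"title": "No license or maintainers"},
}

entries, rejected = build_catalog(list(fake_store_attrs), fake_store_attrs.__getitem__)
```

With this shape the catalog layer never needs feedstock-specific knowledge: a store is either complete and listed, or it is rejected with an explicit list of what is missing, which could be surfaced in CI on the feedstock side.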