pangeo-forge / roadmap

Pangeo Forge public roadmap
Creative Commons Attribution 4.0 International

Proposal: eliminate bakeries.yaml - split into database entry and bakery-specific config #47

Open rabernat opened 2 years ago

rabernat commented 2 years ago

Now that we have the API, we may no longer need bakeries.yaml as our "database" of bakeries.

However, we need to distinguish between two separate needs:

  1. Public information about the bakeries that many clients need to know. For example, a recipe's meta.yaml needs to route itself to a bakery, so it needs to identify the bakery by a unique name. This category also includes things like bakery contacts, description, etc.
  2. Private, internal configuration used in registering flows and within the bakery flow runs. For example:
       cluster_options:
         vpc: vpc-01160815e8310bbe0
         cluster_arn: arn:aws:ecs:us-west-2:552819999234:cluster/pangeo-forge-aws-bakery-dask-test-bakeryclusterdasktest4E1A8264-k6Cnpc8EuUpH
         task_role_arn: arn:aws:iam::552819999234:role/pangeo-forge-aws-bakery-d-prefectecstaskroledaskte-XUA2HGW10IJV
         execution_role_arn: arn:aws:iam::552819999234:role/pangeo-forge-aws-bakery-d-prefectecstaskexecutionr-KLL5UEHNBF8Z
         security_groups:
           - sg-0ca6b9d46294e5623

    This really doesn't feel like something that the world outside the bakery needs to care about. In addition, bakeries evidently also have environment variables which determine their config, e.g. https://github.com/pangeo-forge/pangeo-forge-gcs-bakery/pull/27. So the config is spread across multiple places, which seems fragile and hard to debug.

The challenge is that bakeries.yaml is currently storing both types of information. So I propose we separate these two categories. Stuff in category 1 needs to go into the API. Stuff in category 2 should live in a config file that is maintained by the bakery operator.
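
To make the proposed split concrete, here is a minimal sketch in Python. It is not existing Pangeo Forge code: the function `split_bakery_entry`, the `PUBLIC_KEYS` set, the bakery name, and all values are hypothetical, and the choice of which keys count as public is an assumption based only on the fields mentioned above (name, contacts, description). Everything else is treated as private, operator-maintained config.

```python
from typing import Any

# Assumption: only these keys are public (category 1). The real set would
# be decided by whatever schema the API ends up using.
PUBLIC_KEYS = {"description", "contacts"}


def split_bakery_entry(
    name: str, entry: dict[str, Any]
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split one combined bakeries.yaml-style entry into two parts.

    Returns (public, private): `public` is what would go into the API,
    `private` is what would stay in the bakery operator's own config file.
    """
    public: dict[str, Any] = {"name": name}
    private: dict[str, Any] = {}
    for key, value in entry.items():
        if key in PUBLIC_KEYS:
            public[key] = value
        else:
            # Anything not explicitly public stays private by default.
            private[key] = value
    return public, private


# Made-up combined entry with placeholder values (not a real bakery).
combined = {
    "description": "Example AWS bakery",
    "contacts": ["operator@example.org"],
    "cluster_options": {
        "vpc": "vpc-PLACEHOLDER",
        "security_groups": ["sg-PLACEHOLDER"],
    },
}

public_entry, private_config = split_bakery_entry("org.example.aws-bakery", combined)
```

Defaulting unknown keys to private is a deliberate choice in this sketch: a new internal setting added by an operator would never leak into the public database unless someone explicitly lists it as public.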