pangeo-forge / roadmap

Pangeo Forge public roadmap
Creative Commons Attribution 4.0 International

ADR 8: Orchestrator Repo + CLI #37

Closed cisaacstern closed 6 months ago

cisaacstern commented 3 years ago

This ADR proposes a new pangeo-forge-orchestrator repo, which aims to address two challenges: the limited visibility of the relationships between Pangeo Forge's modular components, and the lack of a single entry point from which to invoke them.

A major aim of this ADR, which is perhaps not yet fully articulated in the PR itself, is to improve the maintainability and extensibility of our contribution workflow. As roughly documented in flow-charts/ci-flow-with-callstack.png, the automated components of our CI are spread out across a range of different GitHub Actions and other repos. The orchestrator would bring them all under one roof (from an interface standpoint; other repos/packages may still be called deeper in the stack).

From a design perspective, I imagine the implementation building on the patterns established by @andersy005 in https://github.com/pangeo-forge/pangeo-forge-recipes/pull/69 (including the use of typer and rich.table, etc.).

I'll start a draft of this repo today to experiment with some ideas.
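To make the "single entry point" idea concrete, here is a minimal sketch of one CLI that dispatches to several components. It is only an illustration under stated assumptions: the subcommand names (`catalog`, `run`) and the handler functions are hypothetical, and the standard library's argparse stands in for typer so the example has no third-party dependencies.

```python
import argparse


# Hypothetical handlers: in a real orchestrator these would call into
# other repos/packages deeper in the stack.
def cmd_catalog(args):
    return f"cataloging datasets for {args.feedstock}"


def cmd_run(args):
    return f"running recipe for {args.feedstock}"


def build_parser():
    # One top-level program with a subcommand per component.
    parser = argparse.ArgumentParser(prog="orchestrator")
    subparsers = parser.add_subparsers(dest="command", required=True)

    catalog = subparsers.add_parser("catalog", help="catalog built datasets")
    catalog.add_argument("feedstock")
    catalog.set_defaults(func=cmd_catalog)

    run = subparsers.add_parser("run", help="run a feedstock's recipe")
    run.add_argument("feedstock")
    run.set_defaults(func=cmd_run)

    return parser


def main(argv=None):
    # Parse the arguments and dispatch to the selected handler.
    args = build_parser().parse_args(argv)
    return args.func(args)


print(main(["catalog", "example-feedstock"]))
```

The point of the pattern is that each component gets one discoverable subcommand, so `--help` on the top-level program documents the relationships that are currently spread across separate GitHub Actions and repos.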

cisaacstern commented 3 years ago

https://github.com/pangeo-forge/pangeo-forge-orchestrator/pull/1 illustrates some initial ideas for what the orchestrator interface might look like.

Any high-level process orchestration (of, e.g., cataloging) must be able to introspect the relationships between various components of Pangeo Forge (e.g., tie feedstocks back to their resulting datasets). Theoretically, storage paths should/will encode the feedstock names, but this is an incomplete solution because:

  1. The particular encoding strategy will inevitably be adjusted over time
  2. Even an idealized unchanging encoding strategy may not encode information such as dataset minor versions and the name of the specific Python recipe object (within the recipe.py module) used to build a dataset

An API as imagined by https://github.com/pangeo-forge/roadmap/pull/31 may be the eventual solution to this, but a conversation yesterday with @rabernat persuaded me that a lightweight JSON object (or objects) housed at the storage location could get us up and running more quickly. As described in https://github.com/pangeo-forge/pangeo-forge-orchestrator/pull/1#issue-1016657975, I'm provisionally calling this "sidecar" object (Ryan's term) build-logs.json.

This will require its own ADR if we move forward with it, but for now I'm going to just continue experimenting with the idea to suss out if/how it might work for us.
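As a rough sketch of how such a sidecar could tie feedstocks back to their datasets without relying on the storage path encoding, consider the following. Everything about the schema here is an assumption made for illustration: the field names (`feedstock`, `recipe_object`, `dataset_version`, `storage_path`), the top-level `builds` key, and the example paths are hypothetical and are not taken from the linked PR.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BuildLogEntry:
    # Hypothetical fields covering the gaps listed above: the minor
    # version and the recipe object name are recorded explicitly.
    feedstock: str
    recipe_object: str
    dataset_version: str
    storage_path: str


def dump_build_logs(entries):
    # Serialize all entries into a single JSON document (the sidecar).
    return json.dumps({"builds": [asdict(e) for e in entries]}, indent=2)


def datasets_for_feedstock(build_logs_json, feedstock):
    # Look datasets up by feedstock name, ignoring how paths are laid out.
    builds = json.loads(build_logs_json)["builds"]
    return [b["storage_path"] for b in builds if b["feedstock"] == feedstock]


entries = [
    BuildLogEntry("example-feedstock", "recipe", "1.0",
                  "bucket/example-feedstock/v1.0.zarr"),
    # The second build uses a different path layout; the lookup still works.
    BuildLogEntry("example-feedstock", "recipe", "1.1",
                  "bucket/renamed-layout/abc123.zarr"),
    BuildLogEntry("other-feedstock", "recipe_daily", "1.0",
                  "bucket/other-feedstock/v1.0.zarr"),
]
serialized = dump_build_logs(entries)
print(datasets_for_feedstock(serialized, "example-feedstock"))
```

Because the lookup reads the recorded fields rather than parsing paths, a change to the path encoding strategy would not break it.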

abarciauskas-bgse commented 6 months ago

closing as stale