stitchfix / flotilla-os

Open source Flotilla
Apache License 2.0

Multi-Cluster for Non-EMR Jobs #470

Closed leejustin closed 11 months ago

leejustin commented 1 year ago

First pass. Not tested yet.

Notes:

  1. The TargetCluster field is stored only at the TaskDefinition level and is not persisted down to the Task level. The abstraction should be exposed only at the DB level.
  2. TargetCluster takes precedence over other ways of defining a cluster name. This gives definition-level flexibility in choosing clusters.
  3. This is not set up for EMR jobs, because virtual clusters already let us direct EMR jobs to another cluster. Managing virtual clusters is also partly coupled to the EKS cluster itself, so it can't be handled entirely at the code level, which makes multi-cluster support for EMR more complicated. We can split traffic between EMR and other jobs, but not within EMR itself.
  4. New enforcement: a cluster must be defined in the environment variables to be considered a valid cluster to run on. This means an override cluster must already be defined in the main list of clusters.
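
For reviewers, here is a rough sketch of how notes 2 and 4 could fit together: the definition-level TargetCluster wins over whatever cluster was otherwise requested, and the result is rejected unless it appears in the environment-configured list. This is not the code in this PR. The `FLOTILLA_CLUSTERS` variable name, the comma-separated format, the trimmed-down `TaskDefinition` struct, and the `parseClusters` / `resolveCluster` helpers are all assumptions made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// TaskDefinition is a hypothetical, trimmed-down stand-in for the real
// definition type; only the field discussed in the notes is shown.
type TaskDefinition struct {
	TargetCluster string
}

// ErrUnknownCluster is returned when the resolved cluster is not in the
// configured list (hypothetical error value for this sketch).
var ErrUnknownCluster = errors.New("cluster is not in the configured cluster list")

// parseClusters turns a comma-separated list into a lookup set,
// ignoring surrounding whitespace and empty entries.
func parseClusters(raw string) map[string]bool {
	valid := make(map[string]bool)
	for _, name := range strings.Split(raw, ",") {
		name = strings.TrimSpace(name)
		if name != "" {
			valid[name] = true
		}
	}
	return valid
}

// resolveCluster applies the precedence rule from note 2 (TargetCluster
// overrides the otherwise requested cluster) and the validation rule
// from note 4 (the result must be a configured cluster).
func resolveCluster(def TaskDefinition, requested string, valid map[string]bool) (string, error) {
	name := requested
	if def.TargetCluster != "" {
		name = def.TargetCluster
	}
	if !valid[name] {
		return "", fmt.Errorf("%w: %q", ErrUnknownCluster, name)
	}
	return name, nil
}

func main() {
	// FLOTILLA_CLUSTERS is an assumed variable name, not taken from the PR.
	raw := os.Getenv("FLOTILLA_CLUSTERS")
	if raw == "" {
		raw = "cluster-a,cluster-b" // fallback so the demo runs without setup
	}
	valid := parseClusters(raw)

	def := TaskDefinition{TargetCluster: "cluster-b"}
	name, err := resolveCluster(def, "cluster-a", valid)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("running on:", name)
}
```

Validating after applying the override, rather than before, means an override pointing at an unconfigured cluster fails loudly instead of silently falling back to the requested one.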