numaproj / numaflow

Kubernetes-native platform to run massively parallel data/streaming jobs
https://numaflow.numaproj.io/
Apache License 2.0
1.1k stars 112 forks source link

Support controller-wide HA Daemon configuration #331

Open dseapy opened 1 year ago

dseapy commented 1 year ago

Summary

305 simplifies Pipeline yaml for daemon deployments for resources, replicas, annotations, etc. However it's not particularly useful for anything requiring the pipeline-name (i.e. affinity's labelSelector or podDisruptionBudget's matchLabels). PodDisruptionBudget being a bit different from affinity, since it's a separate resource that currently the controller doesn't manage.

Would be helpful to support configuring these things on the controller, maybe one of the following?:

  1. Make templates go-templates that get parsed, and then in reconcile get execute'd with access to the Pipeline object, allowing user access to the pipeline resource (i.e. numaflow.numaproj.io/pipeline-name: {{ .Pipeline.Name}})
  2. Support a way to specify affinity and podDisruptionBudgets on the Pipeline that doesn't get directly passed through to the pod.

PodDisruptionBudgets would need to be owned by the Pipeline resource and managed by the controller.

Use Cases

Avoid needing to add something like the following to each pipeline to ensure the daemon is always available (replace example-pipeline with the pipeline name):

  templates:
    daemon:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                      - daemon
                  - key: numaflow.numaproj.io/pipeline-name
                    operator: In
                    values:
                      - example-pipeline
              topologyKey: kubernetes.io/hostname
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pipeline
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: daemon
      numaflow.numaproj.io/pipeline-name: example-pipeline

This isn't super important for me atm, but would be nice to support for a simpler production install. Otherwise in clusters that scale up/down a lot, k8s cluster autoscaler can terminate nodes containing all the daemon pods for a particular Pipeline at the same time & cause a disruption for anyone querying daemon.


Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

dseapy commented 1 year ago

A similar use-case would be wanting to avoid vertices with all it's pods on the same node and having it scaled down by cluster-autoscaler causing a temporary disruption in the data stream. Having to configure affinity for each vertex would be tedious and can't be done outside of the vertex as you would need both the pipeline and vertex names in the label match:

numaflow.numaproj.io/pipeline-name=example-pipeline
numaflow.numaproj.io/vertex-name=example-vertex
whynowy commented 1 year ago

I kind of think it's not good to have things like PDB incorporated into Pipeline spec, since ppl can configure this by themselves - of course we should provide documents on how to configure it (mainly for the labels). Reason for that are:

  1. That makes our spec redundant;
  2. Makes numaflow tided with those APIs - e.g. if PDB api changes, we have to change;

The affinity example is actually the reason I was thinking of if we should expose the controller level customization the other day - affinity spec itself is already very flexible, having all of these customization makes things even more flexible and complicated, I'm not quite sure it's good or not.

Thoughts?

dseapy commented 1 year ago

Yep, that makes sense to me.

What i'm struggling with is that on one cluster i want the ability to read/edit/diff/run any pipelines easily and then i want to promote them to another cluster ensuring they have certain PDB, affinity, annotations, replicas, resources, etc. exist on vertices, daemon, jobs.

FWIW this isn't really a numaflow-specific need, as it applies to web apps as well. I think the main difference is that a Deployment resource has a single pod template, whereas a single Pipeline resource can have a lot. Maybe in the future a Pipeline can reference other resources to populate some config, which might help break things up into multiple files and promote re-use making it more manageable to work with Pipelines that have a lot of yaml.

Anyway, for now maybe it's best to close https://github.com/numaproj/numaflow/pull/314, can always re-open or borrow from it in the future if it's useful. I think for now i'll write a migration script for my pipeline resources (in my personal github) that adds the bits i'm looking for. That gets me what I need for now since numaflow main already supports configuring all the things I want to configure.

dseapy commented 1 year ago

not sure what issues are best to have open. sounds like close this one and milestone v0.8's Production Readiness theme will maybe add docs for all the things someone might want to configure in production?

sounds like #305 is TBD, so don't know if that one makes sense to keep open or close?