grafana / k8s-monitoring-helm


Add an option for preventing scheduling of daemonsets on Fargate #350

Open nitishchandrapatil opened 10 months ago

nitishchandrapatil commented 10 months ago

Currently we are using EKS to run our workloads, with both Fargate profiles and EC2 instances. While trying to deploy the Helm chart, it tries to schedule on Fargate as well. We tried using nodeAffinity to prevent that but realized there isn't support for it. I'm not sure how to proceed from here. This issue is currently blocking the implementation of our monitoring solution in our environment.

petewall commented 10 months ago

So your cluster has a mix of EC2 nodes and Fargate nodes?

petewall commented 10 months ago

The easiest solution today would be to disable Node Exporter and use Kubernetes API log gathering (which removes the DaemonSet for the Grafana Agent for Logs): https://github.com/grafana/k8s-monitoring-helm/tree/main/examples/eks-fargate
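
For reference, a rough values sketch of that approach (the exact keys depend on your chart version; the metrics.node-exporter.enabled toggle is an assumption here, so check the linked example for the authoritative settings):

# Sketch only: drop the DaemonSets that Fargate can't run.
metrics:
  node-exporter:
    enabled: false   # assumed toggle for the Node Exporter DaemonSet

logs:
  pod_logs:
    gatherMethod: api   # gather pod logs via the API server instead of host volumes

grafana-agent-logs:
  controller:
    type: deployment   # run the log-collecting agent as a Deployment, not a DaemonSet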

I'll look into other solutions for your situation.

petewall commented 10 months ago

Can you try this? We might consider applying it by default for grafana-agent-logs:

grafana-agent-logs:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: eks.amazonaws.com/compute-type
            operator: NotIn
            values:
            - fargate
yurii-kryvosheia commented 9 months ago

Can you try this? We might consider applying it by default for grafana-agent-logs:

grafana-agent-logs:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: eks.amazonaws.com/compute-type
            operator: NotIn
            values:
            - fargate

The controller level is missing. It should be:


grafana-agent-logs:
  controller:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: eks.amazonaws.com/compute-type
                  operator: NotIn
                  values:
                    - fargate
petewall commented 9 months ago

Reopening. The PR I just merged sets the affinity rules for Node Exporter only. The reason I didn't do the same for grafana-agent-logs is that it would make things harder for pure Fargate clusters: they would need to undo the default affinity in order to get a single pod scheduled, even as a deployment.

nitishchandrapatil commented 9 months ago

Cool. Thanks for reopening the ticket. I will test it once a resolution is found for this :)

petewall commented 9 months ago

Here's the trick, and why this remains unresolved.

Node Exporter is simple. It does not go on Fargate nodes. You don't get node metrics for those nodes, but you likely don't care about them. That's AWS' problem. The PR that I merged last week sets the affinity rule to avoid fargate nodes. Done.

The other DaemonSet is the Grafana Agent for gathering logs. In DaemonSet mode, with logs.pod_logs.gatherMethod=volumes, the only way we gather logs is by being on the same node as the pods. That means if we apply the same affinity rule, we lose pod logs for pods on Fargate nodes.

The workaround, especially for Fargate-only clusters, was to run the agent as a Deployment and set logs.pod_logs.gatherMethod=api.

I was working on a "hybrid", where you could use volumes in a DaemonSet, but the instances would use the API to gather pod logs from pods on Fargate or Windows nodes. But there's a problem with memory and CPU consumption when trying to discover those pods, especially on very large clusters. It was a non-starter.

I think the ultimate solution for now will be to use logs.pod_logs.gatherMethod=api, set the affinity rule, and set the controller type to deployment.
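
A hedged sketch of those values together (note that the controller key sits beside agent in the grafana-agent subchart, as discussed further down in this thread):

logs:
  pod_logs:
    gatherMethod: api   # collect pod logs through the Kubernetes API

grafana-agent-logs:
  controller:
    type: deployment    # run as a Deployment instead of a DaemonSet
    replicas: 2         # illustrative replica count
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: eks.amazonaws.com/compute-type
                  operator: NotIn
                  values:
                    - fargate   # keep agent pods on EC2 nodes; Fargate pod logs still arrive via the API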

khaykingleb commented 7 months ago

@petewall, hey!

I think the ultimate solution for now will be to use logs.pod_logs.gatherMethod=api, set the affinity rule, and the controller type to deployment.

Is nodeAffinity supported, though? The following doesn't seem to work:

# Settings related to capturing and forwarding logs
logs:
  # -- Capture and forward logs
  enabled: true

  # Settings for Kubernetes pod logs
  pod_logs:
    # -- Capture and forward logs from Kubernetes pods
    enabled: true

    # -- Controls the behavior of gathering pod logs.
    # When set to "volumes", the Grafana Agent will use HostPath volume mounts on the cluster nodes to access the pod
    # log files directly.
    # When set to "api", the Grafana Agent will access pod logs via the API server. This method may be preferable if
    # your cluster prevents DaemonSets, HostPath volume mounts, or for other reasons.
    gatherMethod: "api" 

grafana-agent-logs:
  agent:
    # Enable clustering by default to make it simpler when using API-based log gathering.
    clustering: {enabled: true}

    mounts:
      # Mount /var/log from the host into the container for log collection.
      varlog: false

    controller:
      replicas: 2
      type: deployment

      # NB(khaykingleb): don't schedule the Grafana Agent on Fargate nodes
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: eks.amazonaws.com/compute-type
                    operator: NotIn
                    values: 
                      - fargate

Since the grafana-agent-logs pods remain Pending:

$ kubectl get pods -n monitoring
NAME                                                 READY   STATUS    RESTARTS   AGE
k8s-monitoring-grafana-agent-0                       2/2     Running   0          3m34s
k8s-monitoring-grafana-agent-logs-295mx              0/2     Pending   0          3m34s
k8s-monitoring-grafana-agent-logs-2cwlv              2/2     Running   0          3m34s
k8s-monitoring-grafana-agent-logs-729rs              0/2     Pending   0          3m33s
k8s-monitoring-grafana-agent-logs-894j7              0/2     Pending   0          3m34s
k8s-monitoring-grafana-agent-logs-c6psl              0/2     Pending   0          3m33s
k8s-monitoring-grafana-agent-logs-jpvq9              0/2     Pending   0          3m33s
k8s-monitoring-grafana-agent-logs-kgtfq              0/2     Pending   0          3m33s
k8s-monitoring-grafana-agent-logs-klqcg              0/2     Pending   0          3m34s
k8s-monitoring-grafana-agent-logs-lqkjg              0/2     Pending   0          3m33s
k8s-monitoring-grafana-agent-logs-lttq4              0/2     Pending   0          3m33s
k8s-monitoring-grafana-agent-logs-pqn89              2/2     Running   0          3m33s
k8s-monitoring-grafana-agent-logs-qmjll              0/2     Pending   0          3m33s
k8s-monitoring-grafana-agent-logs-rxjpb              0/2     Pending   0          3m33s
k8s-monitoring-grafana-agent-logs-vw75z              0/2     Pending   0          3m34s
k8s-monitoring-grafana-agent-logs-w8kzz              0/2     Pending   0          3m33s
k8s-monitoring-grafana-agent-logs-x4hrk              2/2     Running   0          3m33s
k8s-monitoring-grafana-agent-logs-x4l7t              0/2     Pending   0          3m34s
k8s-monitoring-grafana-agent-logs-xxlxs              0/2     Pending   0          3m34s
k8s-monitoring-kube-state-metrics-556fd97bdd-g8msd   1/1     Running   0          3m34s
k8s-monitoring-prometheus-node-exporter-cb5wh        1/1     Running   0          3m34s
k8s-monitoring-prometheus-node-exporter-cstmr        1/1     Running   0          3m34s
k8s-monitoring-prometheus-node-exporter-hlknf        1/1     Running   0          3m34s

$ kubectl get nodes
NAME                                  STATUS   ROLES    AGE     VERSION
fargate-ip-XXX.ec2.internal   Ready    <none>   10m     v1.28.5-eks-680e576
fargate-ip-XXX.ec2.internal   Ready    <none>   27h     v1.28.5-eks-680e576
fargate-ip-XXX.ec2.internal   Ready    <none>   17d     v1.28.5-eks-680e576
fargate-ip-XXX.ec2.internal      Ready    <none>   6d4h    v1.28.5-eks-680e576
fargate-ip-XXX.ec2.internal   Ready    <none>   10m     v1.28.5-eks-680e576
fargate-ip-XXX.ec2.internal     Ready    <none>   40m     v1.28.5-eks-680e576
fargate-ip-XXX.ec2.internal    Ready    <none>   13d     v1.28.5-eks-680e576
fargate-ip-XXX.ec2.internal   Ready    <none>   40m     v1.28.5-eks-680e576
fargate-ip-XXX.ec2.internal   Ready    <none>   27h     v1.28.5-eks-680e576
fargate-ip-XXX.ec2.internal   Ready    <none>   7d1h    v1.28.5-eks-680e576
fargate-ip-XXX.ec2.internal   Ready    <none>   40m     v1.28.5-eks-680e576
fargate-ip-XXX.ec2.internal    Ready    <none>   17d     v1.28.5-eks-680e576
fargate-ip-XXX.ec2.internal    Ready    <none>   4d21h   v1.28.5-eks-680e576
fargate-ip-XXX.ec2.internal   Ready    <none>   40m     v1.28.5-eks-680e576
fargate-ip-XXX.ec2.internal    Ready    <none>   14d     v1.28.5-eks-680e576
ip-XXX.ec2.internal            Ready    <none>   82d     v1.28.5-eks-5e0fdde
ip-XXX.ec2.internal           Ready    <none>   41d     v1.28.5-eks-5e0fdde
ip-XXX.ec2.internal           Ready    <none>   82d     v1.28.5-eks-5e0fdde

$ kubectl describe node fargate-ip-XXX.ec2.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    eks.amazonaws.com/compute-type=fargate
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1c
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=fargate-ip-XXX.ec2.internal
                    kubernetes.io/os=linux
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/zone=us-east-1c
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
Taints:             eks.amazonaws.com/compute-type=fargate:NoSchedule
Unschedulable:      false
ehddnko commented 7 months ago

@khaykingleb I think your indentation of controller is incorrect.

grafana-agent-logs:
  agent:
    controller:

should be:

grafana-agent-logs:
  agent:
  controller:

That's because the indentation of controller in the grafana-agent helm chart is:

agent:
  ...
controller:
  ...
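
With that fix, the grafana-agent-logs block from the values above would look like this (controller moved up to sit beside agent):

grafana-agent-logs:
  agent:
    clustering: {enabled: true}
    mounts:
      varlog: false
  controller:
    replicas: 2
    type: deployment
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: eks.amazonaws.com/compute-type
                  operator: NotIn
                  values:
                    - fargate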
khaykingleb commented 7 months ago

Oh, indeed. Thank you for pointing this out!

morganwalker commented 7 months ago

In the same vein, we need:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: eks.amazonaws.com/compute-type
                operator: NotIn
                values:
                - fargate

supported on the profiles daemonset as well. I don't see an option under https://github.com/grafana/k8s-monitoring-helm/blob/main/charts/k8s-monitoring/values.yaml#L832.

AlissonRS commented 3 months ago

@petewall it seems the chart has changed a bit since this was originally opened, as grafana-agent-logs: doesn't seem to exist in the chart anymore.

EDIT: I changed it to alloy-logs and it worked.

I tried adding it under alloy_logs: but it doesn't seem to work:

    cluster:
      name: ${var.cluster_name}
    alloy_logs:
      controller:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: eks.amazonaws.com/compute-type
                      operator: NotIn
                      values: 
                        - fargate
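
Per the edit above, the hyphenated alloy-logs key is what worked, e.g.:

cluster:
  name: ${var.cluster_name}
alloy-logs:
  controller:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: eks.amazonaws.com/compute-type
                  operator: NotIn
                  values:
                    - fargate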