kubernetes-sigs / descheduler

Descheduler for Kubernetes
https://sigs.k8s.io/descheduler
Apache License 2.0

Enable Service in Descheduler without ClusterIP as None - Helm Chart #1437

Open jmk47912204 opened 2 weeks ago

jmk47912204 commented 2 weeks ago

I would like to raise a concern. We want to enable metrics for descheduler, but we do not have Prometheus installed in our cluster at all. We use Datadog as our observability tool, and it needs to scrape metrics from the descheduler Service's metrics endpoint. That endpoint is not reachable because the Service's clusterIP is None, so Datadog cannot scrape it.

I have shared all the logs and configuration details here, along with references to other GitHub issues.

Descheduler version: 0.30.0
GKE version: 1.28.3

Thank you

a7i commented 2 weeks ago

Hi @jmk47912204, how are you defining it?

This is how I've defined it for Datadog using PodSpec annotations:

kind: Deployment
...
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/descheduler.checks: |
          {
            "openmetrics": {
              "instances": [
                {
                  "openmetrics_endpoint": "https://%%host%%:10258/metrics",
                  "namespace": "descheduler",
                  "metrics": [
                    "descheduler_pods_evicted",
                    { "descheduler_descheduler_loop_duration_seconds": "descheduler_loop_duration_seconds" },
                    { "descheduler_descheduler_strategy_duration_seconds": "descheduler_strategy_duration_seconds" }
                  ],
                  "collect_histogram_buckets": true,
                  "histogram_buckets_as_distributions": true,
                  "tls_ca_cert": false,
                  "tls_verify": false,
                  "tls_ignore_warning": true,
                  "tags": [
                    "service:descheduler"
                  ]
                }
              ]
            }
          }

This instructs Datadog to scrape the metrics directly from the Pod/container. You want to take this approach if you run Descheduler in high-availability mode (2 pods). In that scenario, a Service of type ClusterIP distributes requests across the pods round-robin, leading to incomplete results, given that one of the pods may not be the leader.

a7i commented 2 weeks ago

/kind support

jmk47912204 commented 2 weeks ago

Hey @a7i

Thanks for the response. We are already using the above configuration in our deployment; we left out a few of the settings because they are not required for our use case.

The Datadog agent runs in our cluster as a DaemonSet managed by the operator. For it to scrape the descheduler metrics into Datadog, the Service needs to have a cluster IP assigned; only then does it scrape the metrics.

Is there a specific reason why the descheduler Service shouldn't have a cluster IP? We run many products in our cluster whose Services do not set clusterIP: None, and the Datadog agent scrapes metrics from them without problems.
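To make the question concrete, here is a minimal sketch of the two Service shapes being compared. It is illustrative only and not the chart's actual template: the names, namespace, labels, selector, and port name are assumptions, and only port 10258 is taken from the annotation earlier in this thread.

```yaml
# Shape 1: headless Service (clusterIP: None), as described in this issue.
# No virtual IP is allocated; DNS for the Service resolves to the Pod IPs.
apiVersion: v1
kind: Service
metadata:
  name: descheduler                        # hypothetical name
  namespace: kube-system                   # hypothetical namespace
spec:
  clusterIP: None
  selector:
    app.kubernetes.io/name: descheduler    # assumed Pod label
  ports:
    - name: http-metrics                   # assumed port name
      port: 10258
      targetPort: 10258
      protocol: TCP
---
# Shape 2: regular ClusterIP Service, which is what is being requested.
# clusterIP is omitted so that the API server assigns a virtual IP.
apiVersion: v1
kind: Service
metadata:
  name: descheduler-metrics                # hypothetical name
  namespace: kube-system                   # hypothetical namespace
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: descheduler    # assumed Pod label
  ports:
    - name: http-metrics                   # assumed port name
      port: 10258
      targetPort: 10258
      protocol: TCP
```

With the second shape, requests to the virtual IP can land on any matching Pod, so the caveat raised above about running two replicas where only one is the leader would still apply.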

You can find the Datadog agent logs here as well.