kubernetes-sigs / descheduler

Descheduler for Kubernetes
https://sigs.k8s.io/descheduler
Apache License 2.0
4.5k stars 672 forks

Enable Service in Descheduler without ClusterIP as None - Helm Chart #1437

Closed jmk47912204 closed 2 weeks ago

jmk47912204 commented 5 months ago

I would like to raise a concern: we want to enable metrics for the descheduler without installing Prometheus in our cluster at all. Why? Because we use Datadog as our observability tool, and it needs to scrape metrics from the descheduler Service's metrics endpoint. That endpoint is not reachable because the Service's clusterIP is None, so Datadog is unable to scrape it.

I have shared all the logs and configuration details here, along with references to other GitHub issues.

Descheduler version: 0.30.0
GKE version: 1.28.3

Thank you

a7i commented 5 months ago

Hi @jmk47912204 how are you defining it?

This is how I've defined it for Datadog using PodSpec annotations

kind: Deployment
...
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/descheduler.checks: |
          {
            "openmetrics": {
              "instances": [
                {
                  "openmetrics_endpoint": "https://%%host%%:10258/metrics",
                  "namespace": "descheduler",
                  "metrics": [
                    "descheduler_pods_evicted",
                    { "descheduler_descheduler_loop_duration_seconds": "descheduler_loop_duration_seconds" },
                    { "descheduler_descheduler_strategy_duration_seconds": "descheduler_strategy_duration_seconds" }
                  ],
                  "collect_histogram_buckets": true,
                  "histogram_buckets_as_distributions": true,
                  "tls_ca_cert": false,
                  "tls_verify": false,
                  "tls_ignore_warning": true,
                  "tags": [
                    "service:descheduler"
                  ]
                }
              ]
            }
          }

This instructs Datadog to scrape the metrics directly from the Pod/Container. You want this approach in case you run the descheduler in high-availability mode (2 pods): in that scenario, a Service of type ClusterIP would round-robin requests across the pods, leading to incomplete results, since only one of them is the leader.
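For context, the reason the chart's Service cannot be scraped this way is that it is headless. A headless Service looks roughly like the sketch below (the name, namespace, and labels here are illustrative assumptions, not the chart's exact rendered output):

```yaml
# Illustrative sketch of a headless metrics Service (clusterIP: None).
# DNS for a headless Service resolves to the individual pod IPs rather
# than a single virtual IP, so a check that targets "the Service" has
# no stable ClusterIP endpoint to hit.
apiVersion: v1
kind: Service
metadata:
  name: descheduler-metrics          # assumed name
  namespace: kube-system             # assumed namespace
spec:
  clusterIP: None                    # headless: no virtual IP is allocated
  selector:
    app.kubernetes.io/name: descheduler
  ports:
    - name: metrics
      port: 10258
      targetPort: 10258
```

This is why the PodSpec annotations above target `%%host%%` (the pod IP) instead of a Service address.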

a7i commented 5 months ago

/kind support

jmk47912204 commented 5 months ago

Hey @a7i

Thanks for the response. We are actually already using the above configuration in our deployment; we omitted some of the other settings since they aren't required for our goal.

So basically the Datadog agent runs in our cluster as a DaemonSet managed by the Datadog operator, and in order for it to scrape the descheduler metrics into Datadog, the Service needs to have a ClusterIP assigned; only then does it scrape the metrics.

Is there a specific reason the descheduler Service shouldn't have a ClusterIP? Many other products running in our cluster do not set clusterIP: None on their Services, and the Datadog agent scrapes metrics from them without issue.
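If your scraper really does require a Service with a ClusterIP, one workaround (an untested sketch; the selector labels are assumptions and must match whatever labels your Helm release actually applies to the descheduler pods) is to deploy an additional, non-headless Service alongside the chart:

```yaml
# Hypothetical companion Service with a normal ClusterIP.
# Omitting `clusterIP: None` lets Kubernetes allocate a virtual IP,
# giving a Service-based check a stable endpoint to scrape.
# Caveat from the discussion above: with 2 replicas this virtual IP
# round-robins across pods, and only the leader reports full metrics.
apiVersion: v1
kind: Service
metadata:
  name: descheduler-metrics-clusterip   # illustrative name
  namespace: kube-system                # adjust to your release namespace
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: descheduler # must match your pods' labels
  ports:
    - name: metrics
      port: 10258
      targetPort: 10258
```

This leaves the chart-managed headless Service untouched and adds a separate scrape target, at the cost of the round-robin caveat noted earlier in the thread.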

You can find the datadog agent logs as well here

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 2 weeks ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 2 weeks ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/descheduler/issues/1437#issuecomment-2466160063):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.