Closed: jmk47912204 closed this issue 2 weeks ago
Hi @jmk47912204, how are you defining it?
This is how I've defined it for Datadog, using PodSpec annotations:
```yaml
kind: Deployment
...
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/descheduler.checks: |
          {
            "openmetrics": {
              "instances": [
                {
                  "openmetrics_endpoint": "https://%%host%%:10258/metrics",
                  "namespace": "descheduler",
                  "metrics": [
                    "descheduler_pods_evicted",
                    { "descheduler_descheduler_loop_duration_seconds": "descheduler_loop_duration_seconds" },
                    { "descheduler_descheduler_strategy_duration_seconds": "descheduler_strategy_duration_seconds" }
                  ],
                  "collect_histogram_buckets": true,
                  "histogram_buckets_as_distributions": true,
                  "tls_ca_cert": false,
                  "tls_verify": false,
                  "tls_ignore_warning": true,
                  "tags": [
                    "service:descheduler"
                  ]
                }
              ]
            }
          }
```
This instructs Datadog to scrape the metrics directly from the Pod/Container. You want to take this approach in case you run Descheduler in high-availability mode (2 pods): in that scenario, a Service of type ClusterIP will round-robin between the pods, leading to incomplete results, since the pod that is not the leader does not report the full metrics.
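For contrast, here is a minimal sketch of the headless Service that the descheduler manifests create by default. The name, namespace, labels, and port below are illustrative assumptions, not the chart's exact output:

```yaml
# Sketch of a headless Service (clusterIP: None); the selector labels,
# namespace, and port are assumptions for illustration.
apiVersion: v1
kind: Service
metadata:
  name: descheduler
  namespace: kube-system
spec:
  clusterIP: None            # headless: DNS resolves directly to pod IPs
  selector:
    app.kubernetes.io/name: descheduler
  ports:
    - name: metrics
      port: 10258
      targetPort: 10258
```

With `clusterIP: None` there is no virtual IP for kube-proxy to load-balance through, which is why per-pod annotation-based scraping sidesteps the round-robin problem described above.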
/kind support
Hey @a7i
Thanks for the response. We are actually already using the above configuration in our deployment; we left out some of the other options since they aren't required for our goal/requirement.
The Datadog agent runs in our cluster as a DaemonSet managed by the Datadog Operator, and in order to scrape the descheduler metrics into Datadog, the Service needs to have a ClusterIP assigned; only then does the agent scrape the metrics.
Is there a specific reason the descheduler Service shouldn't have a ClusterIP? We run a lot of products in our cluster whose Services do not set clusterIP: None, which means the Datadog agent scrapes their metrics without issue.
You can find the datadog agent logs as well here
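If you control the manifests, one option is to render an additional regular ClusterIP Service in front of the descheduler pods. This is a sketch; whether your Helm/Kustomize setup exposes a hook for it is an assumption, and the names, labels, and port are illustrative:

```yaml
# Illustrative extra Service with a regular ClusterIP for scraping;
# selector labels, namespace, and port are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: descheduler-metrics
  namespace: kube-system
spec:
  type: ClusterIP            # omitting clusterIP: None lets Kubernetes assign a VIP
  selector:
    app.kubernetes.io/name: descheduler
  ports:
    - name: metrics
      port: 10258
      targetPort: 10258
```

Keep in mind the caveat from earlier in the thread: with two replicas, a ClusterIP Service round-robins scrapes across both pods, so the non-leader may return incomplete metrics.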
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten

Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
I would like to raise this concern: we wanted to enable metrics for descheduler without installing Prometheus in our cluster at all. Why? Because we use Datadog as an observability tool, and it needs to scrape the metrics from the descheduler Service's metrics endpoint. That endpoint is not reachable because the Service's clusterIP is None, and as a result Datadog is not able to scrape it.
I have shared all the logs and configuration details here, along with references to other GitHub issues.
Descheduler version: 0.30.0
GKE version: 1.28.3
Thank you