volcano-sh / volcano

A Cloud Native Batch System (Project under CNCF)
https://volcano.sh
Apache License 2.0

[Question] No data for Grafana metrics #3674

Closed kinoxyz1 closed 1 week ago

kinoxyz1 commented 4 weeks ago

Please describe your problem in detail

Version information

  1. k8s version: 1.21.14
  2. volcano version: 1.7.0

Problem: most of the metrics in the Grafana dashboard show no data, as shown below:

[Screenshot: Grafana dashboard panels showing no data]

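After checking, I found that the corresponding metrics simply do not exist in Prometheus. Below are the deployment approaches I have tried; please help me check whether any of them is wrong.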

Attempt 1: After applying installer/volcano-monitoring-v1.7.0.yaml from the release-1.7 tag, take the following dashboard panel as an example:

[Screenshot: Grafana dashboard panel]

The kube_pod_volcano_container_resource_requests metric does not exist in Prometheus:

[Screenshot: the metric not found in Prometheus]

Attempt 2: After applying installer/volcano-monitoring-latest.yaml from the master branch, the same metrics are still missing.

Attempt 3: Following the steps in the kube-state-metrics project, I ran kubectl apply -f examples/standard, but the same metrics are still missing.
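After any of the attempts above, one direct way to see which metric names Prometheus actually holds is its HTTP API endpoint `/api/v1/label/__name__/values`, which returns every metric name as a JSON list. The sketch below is a minimal, hypothetical helper set: it assumes Prometheus has been port-forwarded to `localhost:9090`, and the function names are illustrative, not part of Volcano or Prometheus.

```shell
# Hypothetical helpers. Input on stdin is the JSON body returned by
#   GET /api/v1/label/__name__/values
# e.g. {"status":"success","data":["up","kube_pod_info"]}

# prom_has_metric NAME: exit 0 if NAME is one of the listed metric names.
prom_has_metric() {
  # Match the name together with its JSON quotes so that "kube_pod"
  # does not match "kube_pod_info"; -F treats NAME as a fixed string.
  grep -qF "\"$1\""
}

# prom_metrics_with_prefix PREFIX: print every listed metric name that starts
# with PREFIX. PREFIX is used inside an extended regex, so keep it to [a-zA-Z0-9_:].
prom_metrics_with_prefix() {
  grep -oE "\"$1[a-zA-Z0-9_:]*\"" | tr -d '"'
}

# Example (assumes Prometheus is reachable on localhost:9090):
#   curl -s http://localhost:9090/api/v1/label/__name__/values \
#     | prom_metrics_with_prefix kube_pod_volcano_
```

An empty result for the `kube_pod_volcano_` prefix would confirm that no exporter feeding this Prometheus provides those series, which points at the exporter rather than at Grafana.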

Any other relevant information

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.14", GitCommit:"0f77da5bd4809927e15d1658fb4aa8f13ad890a5", GitTreeState:"clean", BuildDate:"2022-06-15T14:17:29Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

$ uname -a
Linux jz-desktop-08 5.4.0-150-generic #167~18.04.1-Ubuntu SMP Wed May 24 00:51:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

kinoxyz1 commented 4 weeks ago

Here is the complete log of the kube-state-metrics pod:

I0815 06:18:30.462470 1 main.go:95] Using default resources
I0815 06:18:30.462728 1 main.go:107] Using all namespace
I0815 06:18:30.462739 1 main.go:128] metric allow-denylisting: Excluding the following lists that were on denylist:
W0815 06:18:30.462751 1 client_config.go:608] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0815 06:18:30.463335 1 main.go:234] Testing communication with server
I0815 06:18:30.467603 1 main.go:239] Running with Kubernetes cluster version: v1.21. git version: v1.21.14. git tree state: clean. commit: 0f77da5bd4809927e15d1658fb4aa8f13ad890a5. platform: linux/amd64
I0815 06:18:30.467613 1 main.go:241] Communication with server successful
I0815 06:18:30.467801 1 main.go:197] Starting metrics server: 0.0.0.0:8080
I0815 06:18:30.467803 1 metrics_handler.go:96] Autosharding disabled
I0815 06:18:30.467806 1 main.go:186] Starting kube-state-metrics self metrics server: 0.0.0.0:8081
I0815 06:18:30.468248 1 builder.go:164] Active resources: certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,leases,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
W0815 06:18:30.469676 1 warnings.go:67] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
W0815 06:18:30.470967 1 warnings.go:67] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
W0815 06:18:30.473605 1 warnings.go:67] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0815 06:18:30.474726 1 warnings.go:67] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0815 06:25:50.475517 1 warnings.go:67] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0815 06:26:56.471958 1 warnings.go:67] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
W0815 06:33:25.472651 1 warnings.go:67] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
W0815 06:33:29.476647 1 warnings.go:67] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0815 06:40:09.473628 1 warnings.go:67] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
W0815 06:42:02.477490 1 warnings.go:67] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget

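The log above shows kube-state-metrics serving metrics on `0.0.0.0:8080`, so one can also check the exporter itself, independently of Prometheus, for whether it exposes any `kube_pod_volcano_*` families at all. A minimal sketch, assuming the standard Prometheus text exposition format; the helper name is hypothetical, and the namespace and Deployment name in the example are placeholders to adjust for the actual install:

```shell
# list_metric_families: print the metric family names declared by
# "# TYPE <name> <type>" lines in Prometheus text exposition format (stdin).
list_metric_families() {
  awk '$1 == "#" && $2 == "TYPE" { print $3 }'
}

# Example (placeholders; the Deployment name is the one used later in this thread):
#   kubectl -n <namespace> port-forward deploy/prometheus-release-kube-state-metrics 8080:8080 &
#   curl -s http://localhost:8080/metrics | list_metric_families | grep '^kube_pod_volcano_' \
#     || echo "this kube-state-metrics build exposes no kube_pod_volcano_* metrics"
```

If the upstream image (as in attempt 3) prints nothing for that prefix, the dashboard is querying metrics that this exporter build does not produce.
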
Monokaix commented 4 weeks ago

Can you uninstall Volcano, reinstall Volcano v1.9.0, and then deploy monitoring with installer/volcano-monitoring-latest.yaml from the master branch to try once more?

kinoxyz1 commented 4 weeks ago

> Can you uninstall volcano and re-install volcano v1.9.0 and use installer/volcano-monitoring-latest.yaml to deploy monitoring in master branch to try once more?

Do I have to upgrade? An upgrade might disrupt the online services running on this cluster. Really, all I want is a working monitoring setup. Is there any other solution?

Monokaix commented 4 weeks ago

Please also paste the Prometheus pod logs.

kinoxyz1 commented 4 weeks ago

> Please also paste the prometheus pod logs.

OK, and thank you for your reply. Here are all the Prometheus logs from startup until now:

ts=2024-08-15T08:08:21.559Z caller=main.go:589 level=info msg="No time or size retention was set so using the default time retention" duration=15d
ts=2024-08-15T08:08:21.559Z caller=main.go:633 level=info msg="Starting Prometheus Server" mode=server version="(version=2.53.2, branch=HEAD, revision=6e971a7dc905696d4bc4ffa150bf282fcfac5fa9)"
ts=2024-08-15T08:08:21.559Z caller=main.go:638 level=info build_context="(go=go1.22.6, platform=linux/amd64, user=root@363b0aa99939, date=20240809-14:55:04, tags=netgo,builtinassets,stringlabels)"
ts=2024-08-15T08:08:21.559Z caller=main.go:639 level=info host_details="(Linux 5.4.0-150-generic #167~18.04.1-Ubuntu SMP Wed May 24 00:51:42 UTC 2023 x86_64 prometheus-deployment-6468db6ddc-gdpgz (none))"
ts=2024-08-15T08:08:21.559Z caller=main.go:640 level=info fd_limits="(soft=1048576, hard=1048576)"
ts=2024-08-15T08:08:21.559Z caller=main.go:641 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2024-08-15T08:08:21.561Z caller=web.go:568 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2024-08-15T08:08:21.561Z caller=main.go:1148 level=info msg="Starting TSDB ..."
ts=2024-08-15T08:08:21.562Z caller=tls_config.go:313 level=info component=web msg="Listening on" address=[::]:9090
ts=2024-08-15T08:08:21.562Z caller=tls_config.go:316 level=info component=web msg="TLS is disabled." http2=false address=[::]:9090
ts=2024-08-15T08:08:21.563Z caller=head.go:626 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2024-08-15T08:08:21.563Z caller=head.go:713 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=803ns
ts=2024-08-15T08:08:21.563Z caller=head.go:721 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2024-08-15T08:08:21.563Z caller=head.go:793 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
ts=2024-08-15T08:08:21.563Z caller=head.go:830 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=16.607µs wal_replay_duration=500.625µs wbl_replay_duration=101ns chunk_snapshot_load_duration=0s mmap_chunk_replay_duration=803ns total_replay_duration=530.767µs
ts=2024-08-15T08:08:21.565Z caller=main.go:1169 level=info fs_type=EXT4_SUPER_MAGIC
ts=2024-08-15T08:08:21.565Z caller=main.go:1172 level=info msg="TSDB started"
ts=2024-08-15T08:08:21.565Z caller=main.go:1354 level=info msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
ts=2024-08-15T08:08:21.566Z caller=kubernetes.go:331 level=info component="discovery manager scrape" discovery=kubernetes config=kubernetes-cadvisor msg="Using pod service account via in-cluster config"
ts=2024-08-15T08:08:21.566Z caller=kubernetes.go:331 level=info component="discovery manager scrape" discovery=kubernetes config=kubernetes-service-endpoints msg="Using pod service account via in-cluster config"
ts=2024-08-15T08:08:21.566Z caller=kubernetes.go:331 level=info component="discovery manager scrape" discovery=kubernetes config=kubernetes-pods msg="Using pod service account via in-cluster config"
ts=2024-08-15T08:08:21.566Z caller=main.go:1391 level=info msg="updated GOGC" old=100 new=75
ts=2024-08-15T08:08:21.566Z caller=main.go:1402 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml totalDuration=1.294952ms db_storage=617ns remote_storage=798ns web_handler=174ns query_engine=387ns scrape=151.926µs scrape_sd=426.727µs notify=9.488µs notify_sd=3.756µs rules=237.831µs tracing=2.791µs
ts=2024-08-15T08:08:21.566Z caller=main.go:1133 level=info msg="Server is ready to receive web requests."
ts=2024-08-15T08:08:21.566Z caller=manager.go:164 level=info component="rule manager" msg="Starting rule manager..."
ts=2024-08-15T08:09:32.499Z caller=notifier.go:549 level=error component=notifier alertmanager=http://alertmanager.monitoring.svc:9093/api/v2/alerts count=1 msg="Error sending alert" err="Post \"http://alertmanager.monitoring.svc:9093/api/v2/alerts\": dial tcp: lookup alertmanager.monitoring.svc on 10.64.0.10:53: no such host"
... (the same "Error sending alert" line repeats about every 65 seconds; 60 further occurrences omitted) ...
ts=2024-08-15T09:15:38.544Z caller=notifier.go:549 level=error component=notifier alertmanager=http://alertmanager.monitoring.svc:9093/api/v2/alerts count=1 msg="Error sending alert" err="Post \"http://alertmanager.monitoring.svc:9093/api/v2/alerts\": dial tcp: lookup alertmanager.monitoring.svc on 10.64.0.10:53: no such host"

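The only errors in this log come from `component=notifier`: Prometheus cannot resolve `alertmanager.monitoring.svc`, which affects alert delivery, not metric collection. A small filter helps separate that noise from anything else at warn or error level. This is a sketch for the logfmt format shown above (Prometheus 2.53); the function name is hypothetical, and the Deployment name `prometheus-deployment` is inferred from the pod name in `host_details`. As far as I know, Prometheus logs failed scrapes only at debug level, so an empty result does not prove the kube-state-metrics target is being scraped; the Status → Targets page in the Prometheus UI is the more direct check.

```shell
# prom_log_problems: print warn/error lines from Prometheus logfmt logs on stdin,
# dropping the alert-delivery errors logged by component=notifier.
prom_log_problems() {
  grep -E 'level=(warn|error)' | grep -v 'component=notifier'
}

# Example (namespace is a placeholder):
#   kubectl -n <namespace> logs deploy/prometheus-deployment | prom_log_problems
```
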
Monokaix commented 3 weeks ago

Maybe we should also take a look at Grafana's logs. :)

kinoxyz1 commented 2 weeks ago

The problem has been solved. Here are the steps I followed:

  1. Install Volcano monitoring:
    kubectl apply -f installer/volcano-monitoring-latest.yaml

  2. Change the kube-state-metrics image version (for more detailed documentation, see https://github.com/volcano-sh/kube-state-metrics):
    kubectl edit deploy prometheus-release-kube-state-metrics
    ...
    image: volcanosh/kube-state-metrics:v2.0.0-beta
    ...

  3. Modify the dashboard: mainly, check whether the metric names used in the Grafana dashboard match the metrics that actually exist in Prometheus.
    [Screenshot: metrics in Prometheus]
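Step 3 can be partly automated by comparing the metric names a dashboard queries against the names Prometheus reports. The sketch below is hypothetical: it assumes the dashboard has been exported as pretty-printed JSON (each `"expr"` on its own line), that the metrics of interest share a known prefix such as `kube_`, and that the available names have been saved one per line, for example from `/api/v1/label/__name__/values`.

```shell
# dashboard_metrics DASHBOARD_JSON PREFIX
#   Print the unique metric-like tokens starting with PREFIX that appear on
#   "expr" lines of a pretty-printed Grafana dashboard JSON file.
dashboard_metrics() {
  grep '"expr"' "$1" | grep -oE "$2[a-zA-Z0-9_:]*" | sort -u
}

# missing_metrics DASHBOARD_JSON PREFIX AVAILABLE_TXT
#   Print the dashboard metrics that are not listed (one name per line,
#   exact whole-line match) in AVAILABLE_TXT.
missing_metrics() {
  dashboard_metrics "$1" "$2" | grep -vxF -f "$3"
}

# Example (assumes Prometheus on localhost:9090; the regex also picks up the
# JSON keys "status" and "data", which is harmless for this comparison):
#   curl -s http://localhost:9090/api/v1/label/__name__/values \
#     | grep -oE '"[a-zA-Z_:][a-zA-Z0-9_:]*"' | tr -d '"' > available.txt
#   missing_metrics dashboard.json kube_ available.txt
```

Every name printed by `missing_metrics` is a panel that will show no data until either the exporter provides that metric or the panel query is changed.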

Here is the complete dashboard:

[Screenshot: the complete Grafana dashboard]

Monokaix commented 1 week ago

The kube-state-metrics image version in installer/volcano-monitoring-latest.yaml is already docker.io/volcanosh/kube-state-metrics:v2.0.0-beta. :)
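To confirm which image the running Deployment actually uses, as opposed to what the manifest specifies, one can read it back with a JSONPath query. A minimal sketch: the default Deployment name comes from the resolution comment above, while the function name, the namespace argument, and the `KUBECTL` override (useful for a dry run) are illustrative assumptions.

```shell
# ksm_image NAMESPACE [DEPLOYMENT]
#   Print the container image(s) of the kube-state-metrics Deployment.
#   DEPLOYMENT defaults to the name used in this thread.
#   KUBECTL may be overridden (e.g. KUBECTL=echo) to print the command instead.
ksm_image() {
  "${KUBECTL:-kubectl}" get deployment "${2:-prometheus-release-kube-state-metrics}" \
    -n "$1" -o jsonpath='{.spec.template.spec.containers[*].image}'
}

# Example:
#   ksm_image <namespace>
```

According to the comment above, an unmodified install from installer/volcano-monitoring-latest.yaml should print docker.io/volcanosh/kube-state-metrics:v2.0.0-beta; any other image (for example one left over from the upstream examples/standard manifests) would explain the missing `kube_pod_volcano_*` metrics.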

Monokaix commented 1 week ago

/close

volcano-sh-bot commented 1 week ago

@Monokaix: Closing this issue.

In response to [this](https://github.com/volcano-sh/volcano/issues/3674#issuecomment-2324508469):

>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.