prometheus-operator / kube-prometheus

Use Prometheus to monitor Kubernetes and applications running on Kubernetes
https://prometheus-operator.dev/
Apache License 2.0
6.66k stars 1.92k forks

prometheus-k8s-0 keeps terminating and restarting #1660

Open wwdz opened 2 years ago

wwdz commented 2 years ago

k8s cluster version: 1.22
kube-prometheus version: 0.10.0
OS: CentOS 7

The log error is as follows:

kubectl logs -f prometheus-k8s-0 -n monitoring --all-containers

level=info ts=2022-02-25T16:36:34.191323313Z caller=main.go:147 msg="Starting prometheus-config-reloader" version="(version=0.43.2, branch=refs/tags/v0.43.2, revision=b86ab77239f2a11ee69ad05b24122958d8b2df5b)"
ts=2022-02-25T16:36:34.054Z caller=main.go:434 level=error msg="Error loading config (--config.file=/etc/prometheus/config_out/prometheus.env.yaml)" err="open /etc/prometheus/config_out/prometheus.env.yaml: no such file or directory"
level=info ts=2022-02-25T16:36:34.191441854Z caller=main.go:148 build_context="(go=go1.14.10, user=simonpasquier, date=20201109-10:56:57)"
level=info ts=2022-02-25T16:36:34.191697484Z caller=main.go:182 msg="Starting web server for metrics" listen=:8080
level=error ts=2022-02-25T16:36:34.194758389Z caller=runutil.go:98 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://localhost:9090/-/reload\": dial tcp [::1]:9090: connect: connection refused"
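Because --all-containers interleaves the output of every container in the pod, the two error lines are easy to miss among the info lines. A minimal sketch of a filter that keeps only error-level lines is shown here; the function name errors_only is made up for illustration and is not part of kubectl or kube-prometheus.

```shell
# Hypothetical helper (not part of any tool): read logfmt lines on stdin
# and print only those whose level field is "error".
errors_only() {
  # Match level=error as a whole field, wherever it appears in the line.
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -E '(^| )level=error( |$)' || true
}
```

It could be used as: kubectl logs prometheus-k8s-0 -n monitoring --all-containers | errors_only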

ssista commented 2 years ago

Seeing the same error on:

k8s cluster version: 1.23.5
kube-prometheus version: 0.10.0
OS: Ubuntu 20

jackyliusohu commented 2 years ago

K8s Server Version: v1.18.6
kube-prometheus version: kube-prometheus-0.8.0
OS: CentOS 7.3

jackyliusohu commented 2 years ago

Port: 9090/TCP
Host Port: 0/TCP
Args:
  --web.console.templates=/etc/prometheus/consoles
  --web.console.libraries=/etc/prometheus/console_libraries
  --config.file=/etc/prometheus/config_out/prometheus.env.yaml
  --storage.tsdb.path=/prometheus
  --storage.tsdb.retention.time=15d
  --web.enable-lifecycle
  --storage.tsdb.no-lockfile
  --web.route-prefix=/
State: Running
  Started: Mon, 16 May 2022 10:44:44 +0800
Last State: Terminated
  Reason: Error
  Message: "Start listening for connections" address=0.0.0.0:9090
  level=info ts=2022-05-16T02:44:42.984Z caller=head.go:575 component=tsdb msg="replaying WAL, this may take awhile"
  level=info ts=2022-05-16T02:44:42.987Z caller=head.go:624 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
  level=info ts=2022-05-16T02:44:42.987Z caller=head.go:627 component=tsdb msg="WAL replay completed" duration=2.474151ms
  level=info ts=2022-05-16T02:44:42.987Z caller=main.go:683 fs_type=XFS_SUPER_MAGIC
  level=info ts=2022-05-16T02:44:42.987Z caller=main.go:684 msg="TSDB started"
  level=info ts=2022-05-16T02:44:42.987Z caller=main.go:788 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
  level=info ts=2022-05-16T02:44:42.987Z caller=main.go:535 msg="Stopping scrape discovery manager..."
  level=info ts=2022-05-16T02:44:42.987Z caller=main.go:549 msg="Stopping notify discovery manager..."
  level=info ts=2022-05-16T02:44:42.987Z caller=main.go:571 msg="Stopping scrape manager..."
  level=info ts=2022-05-16T02:44:42.987Z caller=manager.go:875 component="rule manager" msg="Stopping rule manager..."
  level=info ts=2022-05-16T02:44:42.987Z caller=manager.go:885 component="rule manager" msg="Rule manager stopped"
  level=info ts=2022-05-16T02:44:42.987Z caller=main.go:531 msg="Scrape discovery manager stopped"
  level=info ts=2022-05-16T02:44:42.987Z caller=main.go:545 msg="Notify discovery manager stopped"
  level=info ts=2022-05-16T02:44:42.988Z caller=main.go:565 msg="Scrape manager stopped"
  level=info ts=2022-05-16T02:44:42.988Z caller=notifier.go:598 component=notifier msg="Stopping notification manager..."
  level=info ts=2022-05-16T02:44:42.988Z caller=main.go:738 msg="Notifier manager stopped"
  level=error ts=2022-05-16T02:44:42.988Z caller=main.go:747 err="error loading config from \"/etc/prometheus/config_out/prometheus.env.yaml\": couldn't load configuration (--config.file=\"/etc/prometheus/config_out/prometheus.env.yaml\"): open /etc/prometheus/config_out/prometheus.env.yaml: no such file or directory"

Barceloniak1 commented 1 year ago

Seeing the same error on k8s version 1.20 with Prometheus v2.36.2.

justinmans commented 1 year ago

It looks like the maintainers are not looking after this part; I have not seen any support from the official team.

tomsherrod commented 1 year ago

I encountered this error today. Following the README, including the wait, corrected the issue for me.

Create the namespace and CRDs, and then wait for them to be available before creating the remaining resources

Note that due to some CRD size we are using kubectl server-side apply feature which is generally available since kubernetes 1.22.

If you are using previous kubernetes versions this feature may not be available and you would need to use kubectl create instead.

kubectl apply --server-side -f manifests/setup
kubectl wait \
	--for condition=Established \
	--all CustomResourceDefinition \
	--namespace=monitoring
kubectl apply -f manifests/
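If the three README commands are pasted in one go, the last one still runs even when the wait fails. A rough sketch of a wrapper that stops at the first failing step follows. The function name install_in_order and the 120s timeout are my own assumptions, not something from the README; the kubectl commands themselves are the ones quoted above, with only --timeout added.

```shell
# Sketch: run the README's three steps in order and stop at the first failure.
# install_in_order and the 120s timeout are illustrative assumptions.
install_in_order() {
  # Step 1: create the namespace and CRDs.
  kubectl apply --server-side -f manifests/setup || return 1
  # Step 2: block until every CRD reports the Established condition.
  kubectl wait \
    --for condition=Established \
    --all CustomResourceDefinition \
    --namespace=monitoring \
    --timeout=120s || return 1
  # Step 3: create the remaining resources only after the CRDs are ready.
  kubectl apply -f manifests/
}
```

Run install_in_order from the root of the kube-prometheus checkout. On clusters without server-side apply, the README says to use kubectl create for the first step instead.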