[Open] abhinavDhulipala opened this issue 1 year ago
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
I ran into a similar problem, with CPU usage as shown in the screenshot. I then found that my node-exporter was working fine, and so were all the pods, except that prometheus-adapter was not receiving metrics from node-exporter.
In my case, the problem occurred after the HA cluster changed its master node. Node-exporter did not update its CoreDNS IP setting (it kept pointing at the old master), so I simply deleted the node-exporter pods and let them restart, as sketched below.
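Roughly what I did (the namespace and label selector here are assumptions; adjust them to match your deployment):

```shell
# Delete the node-exporter pods; the DaemonSet recreates them automatically,
# and the new pods pick up the current CoreDNS IP.
kubectl -n monitoring delete pods -l app.kubernetes.io/name=prometheus-node-exporter

# Verify the replacement pods are running.
kubectl -n monitoring get pods -l app.kubernetes.io/name=prometheus-node-exporter -o wide
```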
Describe the bug
Prometheus keeps intermittently failing and coming back online. Here is an example:
I also have a dev Prometheus instance, which gives me the following data.
This seems to fail only for certain metrics. It's also worth noting that it started failing suddenly, with no apparent cause. Prometheus here is a subchart of a tobs chart that I have deployed. The pod appears to be running, and its logs suddenly show the following:
I don't see why this is suddenly happening. I've also checked that my PVCs/PVs are in good health, as I have this deployed on a kubeadm-administered local cluster.
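Roughly how I checked the volumes (the namespace is an assumption; adjust to your deployment):

```shell
# Confirm every claim is Bound and the backing volumes are healthy.
kubectl -n monitoring get pvc
kubectl get pv
```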
What's your helm version?
version.BuildInfo{Version:"v3.10.1", GitCommit:"9f88ccb6aee40b9a0535fcc7efea6055e1ef72c9", GitTreeState:"clean", GoVersion:"go1.18.7"}
What's your kubectl version?
Client Version: v1.24.13
Kustomize Version: v4.5.4
Server Version: v1.24.0
Which chart?
kube-prometheus-stack
What's the chart version?
39.9.0
What happened?
Operation was normal until this suddenly started happening. We have been adding more scrape targets, but I don't think that should cause this behavior; a quick sanity check is sketched below.
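One way to sanity-check the growing target list via the standard Prometheus HTTP API (the service name and namespace are assumptions; adjust to your release):

```shell
# Port-forward to the Prometheus service created by the chart.
kubectl -n monitoring port-forward svc/kube-prometheus-stack-prometheus 9090 &

# Count active scrape targets.
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets | length'

# List any targets that are not healthy.
curl -s http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[] | select(.health != "up") | .scrapeUrl'
```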
What you expected to happen?
I expected operation to continue as normal.
How to reproduce it?
Not completely sure
Enter the changed values of values.yaml?
These are close to the defaults for tobs 14.3.0
Enter the command that you execute that is failing/misfunctioning.
Anything else we need to know?
This remote-writes to Promscale, which has been deprecated. We are in the process of migrating away from it, but the problem seems to lie with Prometheus within this chart. Any attempt to upgrade individual images has resulted in the service becoming non-functional for a variety of reasons. I'm more interested in figuring out the source of this many-to-many mismatch and why it suddenly started happening. Please let me know what other information I can provide.
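For anyone debugging a similar many-to-many matching error, one hedged starting point (assumes an active port-forward to Prometheus on localhost:9090; the query is illustrative, not specific to this setup):

```shell
# Check whether any endpoint is being scraped by more than one job; duplicate
# series like this are a common cause of many-to-many matching errors in
# PromQL joins. Swap in the metric and grouping labels from your error message.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=count by (instance) (up) > 1'
```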