jmcgrath207 / k8s-ephemeral-storage-metrics

Prometheus ephemeral storage metrics exporter
https://jmcgrath207.github.io/k8s-ephemeral-storage-metrics/
MIT License

Removed Nodes still emitting metrics #64

Closed marquesj2-ppb closed 5 months ago

marquesj2-ppb commented 5 months ago

Hey 👋

I'm seeing some strange behaviour when deploying this application.

A node is removed, but the exporter keeps emitting metrics for it, and they get picked up by Prometheus.

{"level":"info","time":1710357138,"message":"Starting server listening on :9100"}
{"level":"info","time":1710357378,"message":"Node ip-10-159-137-250.eu-west-1.compute.internal does not exist. Removing from monitoring"}
{"level":"warn","time":1710357408,"message":"Failed fetched proxy stats from node : ip-10-159-137-250.eu-west-1.compute.internal"}
{"level":"warn","time":1710357408,"message":"Could not query node: ip-10-159-137-250.eu-west-1.compute.internal. Skipping.."}
{"level":"warn","time":1710357423,"message":"Failed fetched proxy stats from node : ip-10-159-137-250.eu-west-1.compute.internal"}
{"level":"warn","time":1710357423,"message":"Could not query node: ip-10-159-137-250.eu-west-1.compute.internal. Skipping.."}
{"level":"warn","time":1710357438,"message":"Failed fetched proxy stats from node : ip-10-159-137-250.eu-west-1.compute.internal"}
{"level":"warn","time":1710357438,"message":"Could not query node: ip-10-159-137-250.eu-west-1.compute.internal. Skipping.."}
[image]

jmcgrath207 commented 5 months ago

Well, that's not good. To confirm, are you on the latest version?

Also, which metric are you querying for ip-10-159-137-250.eu-west-1.compute.internal in the first picture?

There seems to be an issue around this part of the code. I will try to replicate it with Kind.

https://github.com/jmcgrath207/k8s-ephemeral-storage-metrics/blob/master/main.go#L321
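To illustrate the general class of bug discussed here (not the exporter's actual code), here is a minimal sketch of pruning per-node series once a node disappears. The `NodeGauges` type and its methods are hypothetical stand-ins for real metric vectors, and the sketch assumes the caller already has the list of live node names.

```go
package main

import (
	"sort"
	"sync"
)

// NodeGauges is a hypothetical in-memory stand-in for per-node gauge series
// such as ephemeral_storage_node_percentage. It is an illustration only and
// does not reflect how the exporter stores its metrics.
type NodeGauges struct {
	mu     sync.Mutex
	series map[string]float64 // node name -> last observed value
}

// NewNodeGauges returns an empty set of per-node series.
func NewNodeGauges() *NodeGauges {
	return &NodeGauges{series: make(map[string]float64)}
}

// Set records the latest value observed for a node.
func (g *NodeGauges) Set(node string, value float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.series[node] = value
}

// Prune deletes every series whose node is absent from liveNodes and returns
// the removed node names in sorted order. Without a step like this, the last
// value recorded for a removed node would keep being exposed on each scrape.
func (g *NodeGauges) Prune(liveNodes []string) []string {
	live := make(map[string]struct{}, len(liveNodes))
	for _, n := range liveNodes {
		live[n] = struct{}{}
	}

	g.mu.Lock()
	defer g.mu.Unlock()

	var removed []string
	for node := range g.series {
		if _, ok := live[node]; !ok {
			delete(g.series, node) // deleting during range is safe in Go
			removed = append(removed, node)
		}
	}
	sort.Strings(removed)
	return removed
}

// Nodes returns the sorted names of nodes that still have a series.
func (g *NodeGauges) Nodes() []string {
	g.mu.Lock()
	defer g.mu.Unlock()

	nodes := make([]string, 0, len(g.series))
	for node := range g.series {
		nodes = append(nodes, node)
	}
	sort.Strings(nodes)
	return nodes
}
```

Calling `Prune` with the current node list after each node refresh would drop the stale entries. With the Prometheus Go client, the analogous step would presumably be deleting the node's label values from each gauge vector (for example with `GaugeVec.DeleteLabelValues`), though whether the fix takes that form is not stated in this thread.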

marquesj2-ppb commented 5 months ago

Yes, installed yesterday in deployment mode.

ephemeral_storage_node_percentage in the first picture. The node capacity and node available metrics show the same behaviour, but I didn't include them in the screenshot.

I tried running it as a DaemonSet but couldn't get the metric scraping to work; I'll take a closer look at that today.

jmcgrath207 commented 5 months ago

So I can't replicate this issue with Kind since I can't scale nodes. However, I can scale up and down in Minikube.

Once I get CI testing working for Minikube, I will investigate this issue again.

jmcgrath207 commented 5 months ago

@marquesj2-ppb I have fixed this bug and added some e2e testing around it in the new release.

https://github.com/jmcgrath207/k8s-ephemeral-storage-metrics/releases/tag/1.6.0

Let me know how it goes

marquesj2-ppb commented 5 months ago

I'll try it out next week and let you know! Thanks a lot for the quick fix!

lozbrown commented 5 months ago

Can confirm, 1.6 fixes the issue

jmcgrath207 commented 5 months ago

Thanks for testing @lozbrown

Closing this issue @marquesj2-ppb, but feel free to re-ping me if you still see this issue.