lensapp / lens

Lens - The way the world runs Kubernetes
https://k8slens.dev/
MIT License

Fix pod metrics in Lens for Prometheus: displayed values are 2x the actual values #7679

Open dragoangel opened 1 year ago

dragoangel commented 1 year ago

What would you like to be added: Could you add an option that will allow passing custom query parameters for metrics requests?

Why is this needed: This is required for configuring aspects of monitoring, for example passing a timeout parameter to Prometheus. Also, without an option to set query parameters, pointing Lens at solutions like Thanos or Prometheus HA leads to metrics being displayed incorrectly. Lens will show duplicated data from HA, e.g. pod CPU and RAM usage will be multiplied by the number of replicas in HA. To display data correctly and not fail on partial responses, dedup=1&partial_response=1 would help, but unfortunately Lens does not accept query parameters in PROMETHEUS SERVICE ADDRESS and does not have a separate field for them.
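A minimal sketch of what the requested option could do, assuming the extra parameters are simply appended to the metrics request URL. The helper name appendQueryParams and the example address are hypothetical and are not part of Lens; only dedup and partial_response come from the request above.

```typescript
// Hypothetical helper: Lens does not expose this; it only illustrates the request.
function appendQueryParams(url: string, params: Record<string, string>): string {
  const pairs = Object.keys(params).map(
    (key) => encodeURIComponent(key) + "=" + encodeURIComponent(String(params[key])),
  );
  if (pairs.length === 0) {
    return url;
  }
  // Use "&" when the URL already carries a query string, "?" otherwise.
  const separator = url.indexOf("?") === -1 ? "?" : "&";
  return url + separator + pairs.join("&");
}

// Assumed example address; replace it with the real Prometheus/Thanos query endpoint.
const queryUrl = appendQueryParams(
  "http://thanos-query.monitoring:9090/api/v1/query?query=up",
  { dedup: "1", partial_response: "1" },
);
```

With a separate settings field for such parameters, the PROMETHEUS SERVICE ADDRESS value itself would not need to change.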

Environment you are running Lens application on:

dragoangel commented 1 year ago

Hm, after testing I found that the multiplied CPU and RAM usage on the pod, compared to the sum of its containers, is not related to Thanos or query params, because from what I tested:

  1. Thanos deduplicates data by default
  2. on another setup without sharding and Thanos, the issue reproduces in the same way.

So this is a bug. I will try to debug tomorrow, query by query, what is wrong: https://github.com/lensapp/lens/blob/2b721ab1c9073f3876758e0a345026590f957198/packages/core/src/main/prometheus/operator-provider.injectable.ts.ts#L6

dragoangel commented 1 year ago

I found the issue at https://github.com/lensapp/lens/blob/2b721ab1c9073f3876758e0a345026590f957198/packages/core/src/main/prometheus/operator-provider.injectable.ts.ts#L71-L95

We need to add container!="" to the queries.

@Nokel81 sorry for bothering you, but can you please check this? Thank you in advance.
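To illustrate one way the 2x can arise, here is a small simulation. It assumes that the metrics source exports one series per container plus a pod-level series with an empty container label, so an unfiltered sum counts the pod usage twice. The Sample type, the pod name, and the values are made up for illustration.

```typescript
interface Sample {
  pod: string;
  container: string; // "" stands for the assumed pod-level (total) series
  value: number;
}

// Made-up samples: two containers plus a pod-level total that repeats their sum.
const samples: Sample[] = [
  { pod: "web-0", container: "app", value: 300 },
  { pod: "web-0", container: "sidecar", value: 100 },
  { pod: "web-0", container: "", value: 400 },
];

// Rough equivalent of sum(metric{...}) by (pod), optionally with container!="".
function sumByPod(input: Sample[], dropEmptyContainer: boolean): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const sample of input) {
    if (dropEmptyContainer && sample.container === "") {
      continue;
    }
    totals[sample.pod] = (totals[sample.pod] || 0) + sample.value;
  }
  return totals;
}

const unfiltered = sumByPod(samples, false); // { "web-0": 800 }, twice the real usage
const filtered = sumByPod(samples, true); // { "web-0": 400 }
```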

Nokel81 commented 1 year ago

I think we need to add a toggle for that, because we have added and then removed such a filter several times.

dragoangel commented 1 year ago

@Nokel81 do you mean that there previously was a filter like sum(kube_pod_container_resource_limits{container!="", pod=~"${opts.pods}", resource="cpu", namespace="${opts.namespace}"}) by (${opts.selector}), but it was removed due to some issue?

P.S. In general, an option to add custom query params would be a nice feature, even if it is not as critical as I thought when I created this issue 😊
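A rough sketch of how the toggle mentioned by Nokel81 could look for the query quoted above. The PodQueryOptions interface, the function name cpuLimitsQuery, and the filterEmptyContainer flag are hypothetical; only the query text and the opts fields are taken from this thread.

```typescript
// Hypothetical option shape, modelled on the opts.* fields in the quoted query.
interface PodQueryOptions {
  pods: string;
  namespace: string;
  selector: string;
}

// Builds the CPU limits query; filterEmptyContainer is the hypothetical toggle.
function cpuLimitsQuery(opts: PodQueryOptions, filterEmptyContainer: boolean): string {
  const matchers = [
    `pod=~"${opts.pods}"`,
    `resource="cpu"`,
    `namespace="${opts.namespace}"`,
  ];
  if (filterEmptyContainer) {
    // Put the filter first so the output matches the query quoted in the thread.
    matchers.unshift(`container!=""`);
  }
  return `sum(kube_pod_container_resource_limits{${matchers.join(", ")}}) by (${opts.selector})`;
}

// Example values are made up.
const exampleOpts: PodQueryOptions = { pods: "web-.*", namespace: "default", selector: "pod" };
const withFilter = cpuLimitsQuery(exampleOpts, true);
const withoutFilter = cpuLimitsQuery(exampleOpts, false);
```

Keeping the filter behind a flag would let users on setups where it caused problems switch it off without another code change.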

oleksandr-selezniov commented 1 year ago

Hi. I'm observing doubled metric plots for pods too. I found that container!="" won't help in my case, as my cause of duplication is two datasets with a different service in the Prometheus output. Those services are "kubelet" and "prometeus-kube-prometheus-kubelet", and both datasets satisfy the condition container!="".

dragoangel commented 1 year ago

Those services are "kubelet" and "prometeus-kube-prometheus-kubelet"

I think this is an issue with your jobs in Prometheus: you are collecting the same metrics twice. You need to set up your Prometheus stack properly.
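For anyone who wants to check for this kind of double scraping, here is a minimal sketch. It assumes the label sets of the returned series have already been collected into plain objects; the function names and the sample labels other than the two service names are made up. It reports series that are identical except for the service label.

```typescript
type Labels = Record<string, string>;

// Stable key for a label set with one label (here "service") ignored.
function keyWithout(labels: Labels, ignored: string): string {
  return Object.keys(labels)
    .filter((name) => name !== ignored)
    .sort()
    .map((name) => name + "=" + labels[name])
    .join(",");
}

// For each group of otherwise-identical series, returns the distinct values of the
// ignored label when there is more than one of them.
function findDuplicateSources(series: Labels[], ignored: string): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const labels of series) {
    const key = keyWithout(labels, ignored);
    const value = labels[ignored] || "";
    if (!groups[key]) {
      groups[key] = [];
    }
    if (groups[key].indexOf(value) === -1) {
      groups[key].push(value);
    }
  }
  const duplicates: Record<string, string[]> = {};
  for (const key of Object.keys(groups)) {
    if (groups[key].length > 1) {
      duplicates[key] = groups[key];
    }
  }
  return duplicates;
}

// Made-up series using the two service names reported above.
const duplicateSources = findDuplicateSources(
  [
    { pod: "web-0", container: "app", service: "kubelet" },
    { pod: "web-0", container: "app", service: "prometeus-kube-prometheus-kubelet" },
    { pod: "web-1", container: "app", service: "kubelet" },
  ],
  "service",
);
```

A non-empty result suggests the same target is scraped through more than one service, which no container filter in the query can correct.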

vitaliyf commented 1 year ago

Same as https://github.com/lensapp/lens/issues/7679

Tantino commented 1 year ago

Same as https://github.com/lensapp/lens/issues/7679

jkroepke commented 1 year ago

Container metrics are fine for me, while pod metrics are 2x. It may depend on the installed CRI. On an AKS installation, container!="" would help.

image

Running from AKS with kube-prometheus-stack.

dragoangel commented 1 year ago

Container metrics are fine for me, while pod metrics are 2x. It may depend on the installed CRI. On an AKS installation, container!="" would help.

image

Running from AKS with kube-prometheus-stack.

Yes, this is exactly what I mentioned.

jkroepke commented 1 year ago

Looking forward to #7777

jkroepke commented 1 year ago

I can confirm that if I run minikube with the docker driver, the metric container_memory_working_set_bytes does not have a container label.

Whether the metric container_memory_working_set_bytes has the container label or not depends on the CRI runtime.

@Nokel81 That could be the reason why users had trouble with this in the past.

If container_memory_working_set_bytes has a container label, then container!="" should be used. If not, then container!="" should not be present.

#7777 would be the best solution for all.
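The rule above could be sketched like this, assuming the label sets of the metric's series are already available as plain objects (how they are fetched is not shown). The function names are hypothetical and this is not how Lens or #7777 is implemented.

```typescript
type SeriesLabels = Record<string, string>;

// True when at least one series carries a non-empty container label.
function hasContainerLabel(series: SeriesLabels[]): boolean {
  return series.some((labels) => (labels["container"] || "") !== "");
}

// Returns the matcher to add to the selector, or "" when it should be left out.
function containerMatcher(series: SeriesLabels[]): string {
  return hasContainerLabel(series) ? 'container!=""' : "";
}

// Made-up label sets: the first setup exposes a container label, the second does not.
const matcherWithLabel = containerMatcher([
  { pod: "web-0", container: "app" },
  { pod: "web-0", container: "" },
]);
const matcherWithoutLabel = containerMatcher([{ pod: "web-0" }]);
```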

dragoangel commented 1 year ago

I can confirm that if I run minikube with the docker driver, the metric container_memory_working_set_bytes does not have a container label.

Whether the metric container_memory_working_set_bytes has the container label or not depends on the CRI runtime.

@Nokel81 That could be the reason why users had trouble with this in the past.

If container_memory_working_set_bytes has a container label, then container!="" should be used. If not, then container!="" should not be present.

#7777 would be the best solution for all.

If you do not have a container label, having a PromQL expression with {container!=""} will not break your query...

dragoangel commented 1 year ago

After the latest Lens update (v2023.5.310801), per-container metrics are completely broken 99% of the time :( All of CPU\RAM\Filesystem say: Metrics not available at the moment. Another problem is that downgrading Lens is not an easy thing to do...

dragoangel commented 1 year ago

Just for other people who struggle with issues in the latest version: I downgraded to OpenLens 6.4.15 to get stable monitoring tabs in Node view, Pod view, etc. None of the newer versions function properly. It still has the issue described here, but at least other things are not broken.

After the latest Lens update (v2023.5.310801), per-container metrics are completely broken 99% of the time :( All of CPU\RAM\Filesystem say: Metrics not available at the moment. Another problem is that downgrading Lens is not an easy thing to do...

dragoangel commented 1 year ago

Any updates on this issue?

jkroepke commented 1 year ago

@dragoangel I have the feeling that Lens is now being developed as closed source. I would not expect anything here.

dragoangel commented 1 year ago

@dragoangel I have the feeling that Lens is now being developed as closed source. I would not expect anything here.

Yeah, you are totally right: the latest version of Lens is 2023.10.181418 and there are no releases here, which is bad :(. This version still has the same issues as reported:

  1. x2 of metrics
  2. metrics for containers not available 99% of the time

dark-brains commented 3 months ago

Try checking your Kubernetes services; I think some services are duplicated. Namespace: kube-system, services like kubelet.