kubecost / cost-analyzer-helm-chart

Kubecost helm chart
http://kubecost.com/install
Apache License 2.0

Metrics not getting populated for most of the Kubecost dashboards #2375

Closed 219980 closed 6 months ago

219980 commented 1 year ago

Describe the bug

I set up Kubecost using the Helm chart (CHART: cost-analyzer-1.104.0). After the UI came up, I see that most of the Grafana dashboards show no data. Can you please help me debug this?

Expected behavior
I can't see data populated for the dashboards below:

- kubecost-networkcosts-metrics
- some charts of kubecost-cluster-metrics
- some charts of cluster-cost-and-utilization-metrics
- node-utilization-metrics
- namespace-utilization-metrics

Screenshots
I will attach screenshots here to illustrate the problem.

What impact will this have on your ability to get value out of Kubecost?
I am not able to see the data, so I cannot use Kubecost with all its charts and features.

219980 commented 1 year ago

Attaching the screen shots here

node-utilization-metrics namespace-utilization-metrics cluster-cost-and-utilization-metrics cluster-cost-and-utilization-metrics1 kubecost-cluster-metrics
thomasvn commented 1 year ago

@219980 I've moved the logs that were in the comments into this file: kubecost-cost-analyzer.log

thomasvn commented 1 year ago

@219980 Thanks for reporting. Is this your first time installing Kubecost, or are you upgrading from a previous version? If you are upgrading, could you specify which version?

Additionally, is the data in Kubecost populating correctly? Is it just Grafana that is the issue?

219980 commented 1 year ago

@thomasvn We have not done any upgrade. We installed Kubecost for the first time using the Helm chart, and the version is:

NAME      NAMESPACE  REVISION  UPDATED                               STATUS    CHART                  APP VERSION
kubecost  kubecost   1         2023-06-09 20:04:44.905652 +0530 IST  deployed  cost-analyzer-1.104.0  1.104.0

We can see some charts populating, but some are not. There is only one pod doing this work, "kubecost-cost-analyzer", and it has 2 containers: 1. cost-model and 2. cost-analyzer-frontend. Of those, the data from cost-model is useful, but it is difficult to find the errors causing this, as there is only one pod (cost-model) doing this work and the logs are huge. If you have any insights on this, please share. Grafana itself is working fine because the Prometheus data source is working well. Please let me know if you need any more info.

thomasvn commented 1 year ago

@219980 Thanks for that context. To debug the Grafana dashboards, I would recommend inspecting the query that the dashboard is making (screenshot attached).

Screenshot 2023-06-23 at 9 59 09 AM

Then, port-forward into your Prometheus server (docs ref) and try running the same query to validate that the data exists.
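A minimal sketch of that validation step in Python, assuming the Prometheus server has been port-forwarded to `localhost:9090` (an assumed address, not taken from this thread). It builds an instant-query URL for Prometheus's `/api/v1/query` HTTP endpoint and checks whether a decoded response contains any series. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed address of a port-forwarded Prometheus server; adjust to your setup.
PROM_URL = "http://localhost:9090"


def instant_query_url(promql, base_url=PROM_URL):
    """Build the URL for an instant query against Prometheus's HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def has_series(response):
    """Return True if a decoded /api/v1/query response holds at least one series."""
    return response.get("status") == "success" and bool(response["data"]["result"])


def metric_exists(promql, base_url=PROM_URL):
    """Run the query against a live server and report whether any data came back."""
    with urlopen(instant_query_url(promql, base_url)) as resp:
        return has_series(json.load(resp))
```

If `metric_exists("container_fs_limit_bytes")` returns `False` against the live server, the metric is not currently present in Prometheus, which points at scraping rather than at Grafana.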

219980 commented 1 year ago

@thomasvn I checked a couple of dashboards, for example "Attached Disk Metrics", but I can't see any data/rows in the Prometheus server either. Attaching the screenshots here.

attached_disk_metrics

Query from the Prometheus server for the "Disk Size" chart:
sum(container_fs_limit_bytes{instance=~'$disk', device!="tmpfs", id="/", cluster_id=~'$cluster'}) by (cluster_id, instance)

Disk_Size

Query from the Prometheus server for the "Disk Utilization" chart:
sum(container_fs_usage_bytes{instance=~'$disk',id="/", cluster_id=~'$cluster'}) by (cluster_id, instance) / sum(container_fs_limit_bytes{instance=~'$disk',device!="tmpfs", id="/", cluster_id=~'$cluster'}) by (cluster_id,instance)

Disk_utilzation

This means data is not getting populated in the Prometheus TSDB. What should the next course of action be?

thomasvn commented 1 year ago

@219980 Note that the $ tokens in those queries reference Grafana variables, which won't be present when you query Prometheus directly. Try removing those variables and querying again. Example below.
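As a rough illustration of that rewrite, the Python sketch below replaces each Grafana variable written as `$name` or `${name}` with `.*`. This only makes sense where the variable sits inside a regex matcher (`=~`), as it does in the queries quoted above; the function name is hypothetical.

```python
import re

# Matches Grafana template variables written as $name or ${name}.
GRAFANA_VAR = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def widen_grafana_vars(query):
    """Replace every Grafana variable with '.*' so regex matchers accept any value."""
    return GRAFANA_VAR.sub(".*", query)


dashboard_query = (
    "sum(container_fs_limit_bytes{instance=~'$disk', device!=\"tmpfs\", "
    "id=\"/\", cluster_id=~'$cluster'}) by (cluster_id, instance)"
)
print(widen_grafana_vars(dashboard_query))
```

The printed query can be pasted into the Prometheus expression browser. Removing the matchers entirely is an equally valid way to get a variable-free query.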

Screenshot 2023-06-26 at 3 44 46 PM
219980 commented 1 year ago

@thomasvn I tried your suggestion, but the result is the same, i.e. there are no results.

Screenshot 2023-06-27 at 5 34 21 PM Screenshot 2023-06-27 at 5 34 39 PM

I am not sure why this is happening. Is any further troubleshooting required?

thomasvn commented 1 year ago

@219980 Can you click "Graph" instead of "Table" to see if you have any historical data for container_fs_limit_bytes? Can you also try removing all query parameters so that you are just querying container_fs_usage_bytes{} and container_fs_limit_bytes{}.

Assuming it doesn't exist, this may mean that your Prometheus instance is not scraping a cAdvisor metric. First place to double check this is by going to Prometheus's Status > Targets and looking for the "kubernetes-nodes-cadvisor" target. (screenshot attached)
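The same check can be scripted against Prometheus's `/api/v1/targets` endpoint, which returns the active targets as JSON. The sketch below works on an already decoded response; the sample payload is invented for illustration, and the job name to look for depends on how your Prometheus was configured.

```python
def scrape_jobs(targets_response):
    """Collect the job names of all active targets in a /api/v1/targets response."""
    active = targets_response["data"]["activeTargets"]
    return {target["labels"]["job"] for target in active}


def missing_jobs(targets_response, required):
    """Return the required job names that have no active target, sorted."""
    return sorted(set(required) - scrape_jobs(targets_response))


# Illustrative payload only: a server that scrapes node-exporter but not cAdvisor.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "node-exporter", "instance": "10.0.0.4:9100"}, "health": "up"},
        ],
        "droppedTargets": [],
    },
}

print(missing_jobs(sample, ["kubernetes-nodes-cadvisor"]))
```

An empty list means every required job has at least one active target. It does not confirm that those targets are healthy, so the `health` field is still worth checking.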

Screenshot 2023-06-27 at 8 47 12 AM
219980 commented 1 year ago

@thomasvn I checked both "container_fs_usage_bytes" and "container_fs_limit_bytes" and found that the Prometheus server URL shows no data for either. As you suggested, I looked at the Targets section of my Prometheus server but couldn't find the cAdvisor target. Attaching the targets I can see at my end:

Screenshot 2023-06-28 at 11 49 26 AM

Is anything missing on my side? Please suggest.

kwombach12 commented 1 year ago

@kirbsauce Is there anyone on support who can help look into this issue with @thomasvn out?

jcharcalla commented 1 year ago

@219980 Is the screen shot you shared from the Prometheus server that comes bundled with Kubecost? It does not appear to include the expected kubecost* targets. If you are using an existing Prometheus server please review this documentation for configuring the required scrapes: https://docs.kubecost.com/install-and-configure/install/custom-prom

It also looks as if you are missing cAdvisor metrics. What version/flavor of Kubernetes are you running on?

219980 commented 1 year ago

@jcharcalla and @thomasvn I have a separate Prometheus solution installed on my cluster using "kube-prometheus-stack-44.3.0":

NAME        NAMESPACE   REVISION  UPDATED                                  STATUS    CHART                         APP VERSION
prometheus  prometheus  72        2023-07-12 12:40:21.249259388 +0000 UTC  deployed  kube-prometheus-stack-44.3.0  v0.62.0

So I am using the current Kubecost Helm chart with no Prometheus installed with it. As per point #3 of the documentation at https://docs.kubecost.com/install-and-configure/install/custom-prom, I added the parameters below to the values.yaml file of my current Prometheus Helm chart installation:

additionalScrapeConfigs:

Screenshot 2023-07-13 at 11 29 18 AM Screenshot 2023-07-13 at 11 29 41 AM

Can you please help me understand what is causing this error?

219980 commented 1 year ago

@thomasvn and @jcharcalla I am able to see all the charts now except "/kubecost-networkcosts-metrics". I followed the documentation from here "https://docs.kubecost.com/install-and-configure/advanced-configuration/network-costs-configuration" particularly this section:

Prometheus: If using Kubecost-bundled Prometheus instance, the scrape is automatically configured. If you are integrating with an existing Prometheus, you can set networkCosts.prometheusScrape=true and the network costs service should be auto-discovered. Alternatively, a serviceMonitor is also available.

I have "kube-prometheus-stack" installed separately, so I added "networkCosts.prometheusScrape=true" under the prometheus section of the values.yaml file for the "kube-prometheus-stack" Helm chart and upgraded the chart. I also enabled the service monitors for "networkCosts" in the "kubecost" Helm chart by setting the values below:

networkCosts:
  enabled: true
  podSecurityPolicy:
    enabled: false
  image: gcr.io/kubecost1/kubecost-network-costs:v0.16.6
  imagePullPolicy: Always
  updateStrategy:
    type: RollingUpdate

serviceMonitor: # the kubecost included prometheus uses scrapeConfigs and does not support service monitors. The following options assume an existing prometheus that supports serviceMonitors.
  enabled: true
  additionalLabels: {}
  metricRelabelings: []
  relabelings: []
  networkCosts:
    enabled: true
    scrapeTimeout: 10s
    additionalLabels: {}
    metricRelabelings: []
    relabelings: []

After adding these, I successfully upgraded the Kubecost chart, but I can't see the "kubecost-networking" target discovered on my Prometheus, and hence no data is getting populated in Grafana. Attaching screenshots of both here.

Screenshot 2023-07-14 at 5 59 37 PM Screenshot 2023-07-14 at 6 00 09 PM

Please let me know if I am configuring anything wrong here.

thomasvn commented 1 year ago

@219980 Glad to see that you've got most of the scrapeconfigs working! For network costs, I'd recommend adding the following scrapeconfig: https://github.com/kubecost/cost-analyzer-helm-chart/blob/f35dafab266994df123e6735ae0d7aadcfe8711f/cost-analyzer/values.yaml#L592-L599

219980 commented 1 year ago

@thomasvn and @jcharcalla Thanks for your help.

Now I can see the network costs dashboards showing data, except for 2 dashboards that still have no data:

1) Cross Region Data: In the Kubecost chart's "values.yaml" file, there is a section for cross-region [Cross Region contains a list of address/range that will be classified as non-internet egress from one region to another]. My Kubecost chart is installed in EastUS, so I am not sure which address range I should enter here.

2) Cross Zone Data: In which section do I enter this, and what are the guidelines for it?

Another thing is that 2 charts from the "label-costs-and-utilization" dashboard are giving errors:

1) CPU Usage vs Requests vs Limits
Error: execution: found duplicate series for the match group {pod="cloud-node-manager-lxp6j"} on the right hand-side of the operation: [{pod="cloud-node-manager-lxp6j"}, {container="kube-state-metrics", pod="cloud-node-manager-lxp6j"}];many-to-many matching not allowed: matching labels must be unique on one side

2) Memory Usage vs Requests vs Limits
Error: execution: found duplicate series for the match group {pod="calico-kube-controllers-866fc9cccd-p2qfx"} on the right hand-side of the operation: [{pod="calico-kube-controllers-866fc9cccd-p2qfx"}, {container="kube-state-metrics", pod="calico-kube-controllers-866fc9cccd-p2qfx"}];many-to-many matching not allowed: matching labels must be unique on one side
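To make the error concrete, here is a simplified Python model of the uniqueness check behind "many-to-many matching not allowed". It is not Prometheus's actual implementation, only a sketch of the rule that each match group may contain at most one series on a given side. The series below are copied from the first error message; the function names are hypothetical.

```python
from collections import defaultdict


def match_key(labels, on):
    """Reduce a series' label set to the labels used for matching."""
    return tuple((name, labels.get(name, "")) for name in on)


def duplicate_groups(series, on):
    """Return match groups that contain more than one series on this side."""
    groups = defaultdict(list)
    for labels in series:
        groups[match_key(labels, on)].append(labels)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Right-hand side as reported in the error: two series share the same pod label.
right_side = [
    {"pod": "cloud-node-manager-lxp6j"},
    {"container": "kube-state-metrics", "pod": "cloud-node-manager-lxp6j"},
]

print(duplicate_groups(right_side, on=["pod"]))
```

Matching on `pod` alone puts both series in one group, which is what the error reports. In PromQL the usual remedies are to aggregate the right-hand side down to the matching labels (for example with `sum by (pod)`) or to add a selector that excludes one of the duplicates; which one suits this dashboard is not established in the thread.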

Attaching snapshots of all the above errors. Kindly advise; once these are fixed, all the "kubecost" dashboards will be operational.

Screenshot 2023-07-20 at 5 19 49 PM Screenshot 2023-07-20 at 5 31 11 PM Screenshot 2023-07-20 at 5 34 03 PM
thomasvn commented 1 year ago

@219980 For network costs metrics, it is preferred that you use https://kubecost.my.com/network to view your data instead of the Grafana dashboards. Please disregard those remaining Grafana dashboards for now.

219980 commented 1 year ago

@thomasvn I am not able to see anything at the URL "https://kubecost.my.com/network" that you suggested. Can you elaborate on how I can access the network cost metrics?

thomasvn commented 1 year ago

@219980 Can you try appending /network or /network-cost.html to your Kubecost URL?

219980 commented 1 year ago

Thanks @thomasvn, I tried appending /network and /network-cost.html to my Kubecost URL, but both give a "404 Page not found" error.

Screenshot 2023-07-26 at 8 59 13 AM Screenshot 2023-07-26 at 8 59 37 AM

Is there any workaround?

thomasvn commented 1 year ago

@219980 Those screenshots show the Grafana dashboard, not the Kubecost UI. You can visit the Kubecost UI by following these instructions: https://www.kubecost.com/install

rpriyanshu9 commented 7 months ago

@219980 Where did you set up these dashboards?

chipzoller commented 6 months ago

Stale, also looks like this is a non-issue. Closing.