Closed 219980 closed 6 months ago
Attaching the screenshots here
@219980 I've gone ahead and transferred the logs that were in the comments into this file: kubecost-cost-analyzer.log
@219980 Thanks for reporting. Is this your first time installing Kubecost, or are you upgrading from a previous version? If so, could you specify which version?
Additionally, is the data in Kubecost populating correctly? Is it just Grafana that is the issue?
@thomasvn We have not done any upgrade; this is our first time installing Kubecost, using the Helm chart. The version is:
NAME      NAMESPACE  REVISION  UPDATED                               STATUS    CHART                  APP VERSION
kubecost  kubecost   1         2023-06-09 20:04:44.905652 +0530 IST  deployed  cost-analyzer-1.104.0  1.104.0
We can see some of the charts populating, but some are not. There is only one pod doing this work, "kubecost-cost-analyzer", and it contains 2 containers: 1. cost-model and 2. cost-analyzer-frontend. The cost-model logs are the useful ones, but it's difficult to find the errors causing this since there is only one container (cost-model) doing the work and the logs are huge. If you have some insights on this, please share. Grafana itself is working fine because the Prometheus data source is working well. Please let me know if you need any more info.
@219980 Thanks for that context. To debug the Grafana dashboards, I would recommend inspecting the query that the dashboard is making (screenshot attached).
Then, port-forward into your Prometheus server (docs ref) and try running the same query to validate that the data exists.
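For example, something along these lines (the namespace and service names here are placeholders; adjust them, and the port, to match your Prometheus install):
kubectl port-forward -n &lt;prometheus-namespace&gt; svc/&lt;prometheus-server-service&gt; 9090:9090
Then open http://localhost:9090 and run the same query in the expression browser.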
@thomasvn I checked a couple of dashboards, for example "Attached Disk Metrics", but can't see any data/rows in the Prometheus server either. Attaching the screenshot here.
Query from the Prometheus server for the "Disk Size" chart: sum(container_fs_limit_bytes{instance=~'$disk', device!="tmpfs", id="/", cluster_id=~'$cluster'}) by (cluster_id, instance)
Query from the Prometheus server for the "Disk Utilization" chart: sum(container_fs_usage_bytes{instance=~'$disk',id="/", cluster_id=~'$cluster'}) by (cluster_id, instance) / sum(container_fs_limit_bytes{instance=~'$disk',device!="tmpfs", id="/", cluster_id=~'$cluster'}) by (cluster_id,instance)
It means the data is not getting populated in the Prometheus TSDB. What should the next course of action be?
@219980 Note that the $-prefixed values in those queries ($disk, $cluster) are Grafana template variables, which won't be present when you're querying Prometheus directly. Try removing those variables and querying again. Example below.
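For instance, the "Disk Size" query with the Grafana variables stripped out would be roughly:
sum(container_fs_limit_bytes{device!="tmpfs", id="/"}) by (cluster_id, instance)
and, for the usage side:
sum(container_fs_usage_bytes{id="/"}) by (cluster_id, instance)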
@thomasvn I tried as per your suggestion, but the result is the same, i.e. there are no results.
Not sure why it's happening. Is any further troubleshooting required?
@219980 Can you click "Graph" instead of "Table" to see if you have any historical data for container_fs_limit_bytes? Can you also try removing all query parameters so that you are just querying container_fs_usage_bytes{} and container_fs_limit_bytes{}?
Assuming it doesn't exist, this may mean that your Prometheus instance is not scraping cAdvisor metrics. The first place to double-check this is Prometheus's Status > Targets page, looking for the "kubernetes-nodes-cadvisor" target. (screenshot attached)
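If it's easier, you can also list the active scrape jobs from the command line via the Prometheus HTTP API (assuming a port-forward to the server as above; the grep is just a quick-and-dirty filter on the JSON output):
curl -s http://localhost:9090/api/v1/targets | grep -o '"job":"[^"]*"' | sort -u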
@thomasvn I checked both "container_fs_usage_bytes" and "container_fs_limit_bytes" and found that no data is shown from the Prometheus server URL. As per your suggestion I looked at the Targets section of my Prometheus server but couldn't find a cAdvisor target. Attaching the targets I can see at my end:
Is anything missing on my side? Please suggest.
@kirbsauce Is there anyone on support who can help @thomasvn look into this issue?
@219980 Is the screenshot you shared from the Prometheus server that comes bundled with Kubecost? It does not appear to include the expected kubecost* targets. If you are using an existing Prometheus server, please review this documentation for configuring the required scrapes: https://docs.kubecost.com/install-and-configure/install/custom-prom
It also looks as if you are missing cAdvisor metrics. What version/flavor of Kubernetes are you running on?
@jcharcalla and @thomasvn I have a separate Prometheus solution installed on my cluster using kube-prometheus-stack-44.3.0:
NAME        NAMESPACE   REVISION  UPDATED                                  STATUS    CHART                         APP VERSION
prometheus  prometheus  72        2023-07-12 12:40:21.249259388 +0000 UTC  deployed  kube-prometheus-stack-44.3.0  v0.62.0
So the current kubecost helm chart is installed with no Prometheus bundled with it. As per point #3 of the documentation https://docs.kubecost.com/install-and-configure/install/custom-prom, I added the below parameters to the values.yaml file of my current Prometheus helm chart installation:
+++++++++++++++++++++++++++++++++++
additionalScrapeConfigs:
- job_name: kubecost
  honor_labels: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  dns_sd_configs:
- job_name: prometheus
  static_configs:
# ... labelmap relabeling action.
- job_name: 'kubernetes-nodes-cadvisor'
# Default to scraping over https. If required, just disable this or change to
# `http`.
scheme: https
# This TLS & bearer token file config is used to connect to the actual scrape
# endpoints for cluster components. This is separate to discovery auth
# configuration because discovery & scraping are two separate concerns in
# Prometheus. The discovery auth config is automatic if Prometheus runs inside
# the cluster. Otherwise, more config options have to be provided within the
# <kubernetes_sd_config>.
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
# If your node certificates are self-signed or use a different CA to the
# master CA, then disable certificate verification below. Note that
# certificate verification is an integral part of a secure infrastructure
# so this should only be disabled in a controlled environment. You can
# disable certificate verification by uncommenting the line below.
#
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
# This configuration will work only on kubelet 1.7.3+
# As the scrape endpoints for cAdvisor have changed
# if you are using older version you need to change the replacement to
# replacement: /api/v1/nodes/$1:4194/proxy/metrics
# more info here https://github.com/coreos/prometheus-operator/issues/633
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
metric_relabel_configs:
- source_labels: [ __name__ ]
regex: (container_cpu_usage_seconds_total|container_memory_working_set_bytes|container_network_receive_errors_total|container_network_transmit_errors_total|container_network_receive_packets_dropped_total|container_network_transmit_packets_dropped_total|container_memory_usage_bytes|container_cpu_cfs_throttled_periods_total|container_cpu_cfs_periods_total|container_fs_usage_bytes|container_fs_limit_bytes|container_cpu_cfs_periods_total|container_fs_inodes_free|container_fs_inodes_total|container_fs_usage_bytes|container_fs_limit_bytes|container_cpu_cfs_throttled_periods_total|container_cpu_cfs_periods_total|container_network_receive_bytes_total|container_network_transmit_bytes_total|container_fs_inodes_free|container_fs_inodes_total|container_fs_usage_bytes|container_fs_limit_bytes|container_spec_cpu_shares|container_spec_memory_limit_bytes|container_network_receive_bytes_total|container_network_transmit_bytes_total|container_fs_reads_bytes_total|container_network_receive_bytes_total|container_fs_writes_bytes_total|container_fs_reads_bytes_total|cadvisor_version_info|kubecost_pv_info)
action: keep
- source_labels: [ container ]
target_label: container_name
regex: (.+)
action: replace
- source_labels: [ pod ]
target_label: pod_name
regex: (.+)
action: replace
# ... labelmap relabeling action.
- job_name: 'kubernetes-nodes'
# Default to scraping over https. If required, just disable this or change to
# `http`.
scheme: https
# This TLS & bearer token file config is used to connect to the actual scrape
# endpoints for cluster components. This is separate to discovery auth
# configuration because discovery & scraping are two separate concerns in
# Prometheus. The discovery auth config is automatic if Prometheus runs inside
# the cluster. Otherwise, more config options have to be provided within the
# <kubernetes_sd_config>.
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
# If your node certificates are self-signed or use a different CA to the
# master CA, then disable certificate verification below. Note that
# certificate verification is an integral part of a secure infrastructure
# so this should only be disabled in a controlled environment. You can
# disable certificate verification by uncommenting the line below.
#
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/$1/proxy/metrics
metric_relabel_configs:
- source_labels: [ __name__ ]
regex: (kubelet_volume_stats_used_bytes) # this metric is in alpha
action: keep
#
# * `prometheus.io/scrape`: Only scrape services that have a value of `true`
# * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
#   `https` & most likely set the `tls_config` of the scrape config.
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: If the metrics are exposed on a different port to the
#   service then set this appropriately.
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_endpoints_name]
action: keep
regex: (.*kube-state-metrics|.*prometheus-node-exporter|kubecost-network-costs)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
- source_labels: [__meta_kubernetes_pod_node_name]
action: replace
target_label: kubernetes_node
metric_relabel_configs:
- source_labels: [ __name__ ]
regex: (container_cpu_allocation|container_cpu_usage_seconds_total|container_fs_limit_bytes|container_fs_writes_bytes_total|container_gpu_allocation|container_memory_allocation_bytes|container_memory_usage_bytes|container_memory_working_set_bytes|container_network_receive_bytes_total|container_network_transmit_bytes_total|DCGM_FI_DEV_GPU_UTIL|deployment_match_labels|kube_daemonset_status_desired_number_scheduled|kube_daemonset_status_number_ready|kube_deployment_spec_replicas|kube_deployment_status_replicas|kube_deployment_status_replicas_available|kube_job_status_failed|kube_namespace_annotations|kube_namespace_labels|kube_node_info|kube_node_labels|kube_node_status_allocatable|kube_node_status_allocatable_cpu_cores|kube_node_status_allocatable_memory_bytes|kube_node_status_capacity|kube_node_status_capacity_cpu_cores|kube_node_status_capacity_memory_bytes|kube_node_status_condition|kube_persistentvolume_capacity_bytes|kube_persistentvolume_status_phase|kube_persistentvolumeclaim_info|kube_persistentvolumeclaim_resource_requests_storage_bytes|kube_pod_container_info|kube_pod_container_resource_limits|kube_pod_container_resource_limits_cpu_cores|kube_pod_container_resource_limits_memory_bytes|kube_pod_container_resource_requests|kube_pod_container_resource_requests_cpu_cores|kube_pod_container_resource_requests_memory_bytes|kube_pod_container_status_restarts_total|kube_pod_container_status_running|kube_pod_container_status_terminated_reason|kube_pod_labels|kube_pod_owner|kube_pod_status_phase|kube_replicaset_owner|kube_statefulset_replicas|kube_statefulset_status_replicas|kubecost_cluster_info|kubecost_cluster_management_cost|kubecost_cluster_memory_working_set_bytes|kubecost_load_balancer_cost|kubecost_network_internet_egress_cost|kubecost_network_region_egress_cost|kubecost_network_zone_egress_cost|kubecost_node_is_spot|kubecost_pod_network_egress_bytes_total|node_cpu_hourly_cost|node_cpu_seconds_total|node_disk_reads_completed|node_disk_reads_completed_total|node_disk_writes_completed|node_disk_writes_completed_total|node_filesystem_device_error|node_gpu_count|node_gpu_hourly_cost|node_memory_Buffers_bytes|node_memory_Cached_bytes|node_memory_MemAvailable_bytes|node_memory_MemFree_bytes|node_memory_MemTotal_bytes|node_network_transmit_bytes_total|node_ram_hourly_cost|node_total_hourly_cost|pod_pvc_allocation|pv_hourly_cost|service_selector_labels|statefulSet_match_labels|kubecost_pv_info|up)
action: keep
#
# * `prometheus.io/scrape-slow`: Only scrape services that have a value of `true`
# * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
#   `https` & most likely set the `tls_config` of the scrape config.
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: If the metrics are exposed on a different port to the
#   service then set this appropriately.
- job_name: 'kubernetes-service-endpoints-slow'
scrape_interval: 5m
scrape_timeout: 30s
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape_slow]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
- source_labels: [__meta_kubernetes_pod_node_name]
action: replace
target_label: kubernetes_node
#
# * `prometheus.io/probe`: Only probe services that have a value of `true`
- job_name: 'kubernetes-services'
metrics_path: /probe
params:
module: [http_2xx]
kubernetes_sd_configs:
- role: service
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__address__]
target_label: __param_target
- target_label: __address__
replacement: blackbox
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
target_label: kubernetes_name
+++++++++++++++++++++++
After doing the helm upgrade for my kube-prometheus-stack-44.3.0 helm chart, I am getting errors for the "kubernetes-nodes" and "kubernetes-nodes-cadvisor" targets as below.
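For reference, the values were applied with a command roughly like the following (release name and namespace taken from the helm list output above; your chart reference and values file path may differ):
helm upgrade prometheus prometheus-community/kube-prometheus-stack -n prometheus -f values.yaml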
Can you please guide me on what's causing this error?
@thomasvn and @jcharcalla I am able to see all the charts now except "/kubecost-networkcosts-metrics". I followed the documentation at "https://docs.kubecost.com/install-and-configure/advanced-configuration/network-costs-configuration", particularly this section:
Prometheus: If using Kubecost-bundled Prometheus instance, the scrape is automatically configured. If you are integrating with an existing Prometheus, you can set networkCosts.prometheusScrape=true and the network costs service should be auto-discovered. Alternatively, a serviceMonitor is also available.
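In the Kubecost chart's values.yaml that setting would look roughly like this (a sketch based only on the doc excerpt above; exact placement may differ):
networkCosts:
  enabled: true
  prometheusScrape: true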
I have installed "kube-prometheus-stack" separately, so I added "networkCosts.prometheusScrape=true" under the prometheus section of the values.yaml file for the kube-prometheus-stack helm chart and upgraded the chart. I also enabled the networkCosts serviceMonitor on the "kubecost" helm chart by enabling the below values...
networkCosts:
  enabled: true
  podSecurityPolicy:
    enabled: false
  image: gcr.io/kubecost1/kubecost-network-costs:v0.16.6
  imagePullPolicy: Always
  updateStrategy:
    type: RollingUpdate
serviceMonitor:
  # the kubecost included prometheus uses scrapeConfigs and does not support service monitors.
  # The following options assume an existing prometheus that supports serviceMonitors.
  enabled: true
  additionalLabels: {}
  metricRelabelings: []
  relabelings: []
  networkCosts:
    enabled: true
    scrapeTimeout: 10s
    additionalLabels: {}
    metricRelabelings: []
    relabelings: []
After adding these, I successfully upgraded the kubecost chart, but I can't see the "kubecost-networking" target discovered in my Prometheus, and hence no data is getting populated in Grafana. Attaching the screenshots of both here.
Please help me if I am configuring anything wrong here.
@219980 Glad to see that you've got most of the scrapeconfigs working! For network costs, I'd recommend adding the following scrapeconfig: https://github.com/kubecost/cost-analyzer-helm-chart/blob/f35dafab266994df123e6735ae0d7aadcfe8711f/cost-analyzer/values.yaml#L592-L599
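For reference, that section of the chart's values.yaml defines roughly the following scrape config (paraphrased here; please double-check the linked lines for the exact content):
- job_name: kubecost-networking
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # scrape only the pods matching the network-costs label
    - source_labels: [__meta_kubernetes_pod_label_app]
      action: keep
      regex: 'kubecost-network-costs'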
@thomasvn and @jcharcalla Thanks for your help ..
Now I can see the network costs showing data, except for 2 dashboards that still have no data: 1) Cross Region Data. In the kubecost chart's values.yaml file there is a section for cross-region [Cross Region contains a list of address/ranges that will be classified as non-internet egress from one region to another]. My kubecost chart is installed in EastUS, so I am not sure which address range I should enter here.
2) Cross Zone Data: for this, in which section do I have to enter values, and what are the guidelines for that?
Another thing is that 2 charts from the "label-costs-and-utilization" dashboard are giving errors:
1) CPU Usage vs Requests vs Limits
Error: execution: found duplicate series for the match group {pod="cloud-node-manager-lxp6j"} on the right hand-side of the operation: [{pod="cloud-node-manager-lxp6j"}, {container="kube-state-metrics", pod="cloud-node-manager-lxp6j"}];many-to-many matching not allowed: matching labels must be unique on one side
2) Memory Usage vs Requests vs Limits
Error: execution: found duplicate series for the match group {pod="calico-kube-controllers-866fc9cccd-p2qfx"} on the right hand-side of the operation: [{pod="calico-kube-controllers-866fc9cccd-p2qfx"}, {container="kube-state-metrics", pod="calico-kube-controllers-866fc9cccd-p2qfx"}];many-to-many matching not allowed: matching labels must be unique on one side
Attaching the snapshot of all the above errors. Kindly guide; after fixing those, all the "kubecost" dashboards will be operational.
@219980 For network costs metrics, it is preferred that you use https://kubecost.my.com/network to view your data instead of the Grafana dashboards. Please disregard those remaining Grafana dashboards for now.
@thomasvn I am not able to see anything at the URL "https://kubecost.my.com/network" you suggested. Can you elaborate on how I can access the network cost metrics?
@219980 Can you try appending /network or /network-cost.html to your Kubecost URL?
Thanks @thomasvn, I tried appending /network and /network-cost.html to my Kubecost URL, but it gives a "404 Page not found" error.
Is there any workaround?
@219980 Those screenshots show the Grafana dashboard, not the Kubecost UI. You can visit the Kubecost UI by following these instructions: https://www.kubecost.com/install
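Those install instructions boil down to roughly the following (namespace and deployment name assume a default Helm install into the kubecost namespace):
kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090
Then open http://localhost:9090 in your browser to reach the Kubecost UI.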
@219980 Where did you set up these dashboards?
Stale, also looks like this is a non-issue. Closing.
Describe the bug
I have set up Kubecost using the helm chart (CHART: cost-analyzer-1.104.0). After the UI is up, most of the Grafana dashboards show empty data. Can you please help with how to debug these?
Expected behavior
I can't see data populated for the below dashboards:
- kubecost-networkcosts-metrics
- some charts of kubecost-cluster-metrics
- some charts of cluster-cost-and-utilization-metrics
- node-utilization-metric
- namespace-utilization-metrics
Screenshots
I will attach the screenshots here to illustrate the problem.
What impact will this have on your ability to get value out of Kubecost? Not able to see the data, so not able to use Kubecost with all its charts and features.