kubecost / features-bugs

A public repository for filing of Kubecost feature requests and bugs. Please read the issue guidelines before filing an issue here.

[Bug] costs are showing billions #60

Closed aeggerd closed 7 months ago

aeggerd commented 9 months ago

Kubecost Version

2.1.0 + 2.1.1

Kubernetes Version

v1.26.7+rke2r1

Kubernetes Platform

Other (specify in description)

Description

We are running on-premises Kubernetes based on Rancher.

Since upgrading to version 2.x, the total cost figures are far too high. We are getting values like: Kubernetes Costs: 7.42 Bio. $, Possible Monthly Savings: 3,08 Bio. $/mo

Steps to reproduce

These are the Helm values (YAML) that we use to deploy Kubecost:

global:
  prometheus:
    enabled: false
    fqdn: http://rancher-monitoring-prometheus.cattle-monitoring-system.svc:9090
  grafana:
    enabled: false
    domainName: rancher-grafana.xxxx
    scheme: https
cost-analyzer:
  global:
    prometheus:
      enabled: false
      fqdn: http://rancher-monitoring-prometheus.cattle-monitoring-system.svc:9090
    grafana:
      enabled: false
      domainName: rancher-grafana.xxxx
      scheme: https
  prometheus:
    kube-state-metrics:
      disabled: true
    nodeExporter:
      enabled: false
    serviceAccounts:
      nodeExporter:
        create: false
    kubeStateMetrics:
      enabled: false
  prometheusRule:
    enabled: true
  serviceMonitor:
    enabled: true
    networkCosts:
      enabled: false
  networkCosts:
    enabled: false

The resulting Kubecost ConfigMap is:

apiVersion: v1
data:
  kubecost-token: not-applied
  prometheus-alertmanager-endpoint: http://cost-analyzer-prometheus-server.default.svc
  prometheus-server-endpoint: http://rancher-monitoring-prometheus.cattle-monitoring-system.svc:9090
kind: ConfigMap
metadata:
  labels:
    app: cost-analyzer
    app.kubernetes.io/instance: kubecost
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: cost-analyzer
    argocd.argoproj.io/instance: kubecost-staging-mucre1
    helm.sh/chart: cost-analyzer-2.1.0
  name: kubecost-cost-analyzer
  namespace: kubecost

These are some of the values recorded by the Prometheus rule: image

image

Expected behavior

The percentages seem to be correct, but the total cost figures are far too high.

Impact

No response

Screenshots

Kubecost with wrong cost metrics image

prometheus recording rules: image

Logs

No response

Slack discussion

No response

Troubleshooting

AjayTripathy commented 8 months ago

Hi @aeggerd could you possibly share the results of the metric node_total_hourly_cost?

As well as a screenshot of the Assets page to see what is driving this cost?

aeggerd commented 8 months ago

Those numbers definitely look high: image

# HELP node_total_hourly_cost node_total_hourly_cost Total node cost per hour
# TYPE node_total_hourly_cost gauge
node_total_hourly_cost{arch="amd64",instance="kubectrl01a",instance_type="rke2",node="kubectrl01a",provider_id="rke2://kubectrl01a",region=""} 2.717651809769061e+08
node_total_hourly_cost{arch="amd64",instance="kubectrl01b",instance_type="rke2",node="kubectrl01b",provider_id="rke2://kubectrl01b",region=""} 2.71763839646976e+08
node_total_hourly_cost{arch="amd64",instance="kubectrl01c",instance_type="rke2",node="kubectrl01c",provider_id="rke2://kubectrl01c",region=""} 2.717649127106552e+08
node_total_hourly_cost{arch="amd64",instance="kubegateway01a",instance_type="rke2",node="kubegateway01a",provider_id="rke2://kubegateway01a",region=""} 1.6326061544289812e+07
node_total_hourly_cost{arch="amd64",instance="kubeworker01a",instance_type="rke2",node="kubeworker01a",provider_id="rke2://kubeworker01a",region=""} 2.717651809769061e+08
node_total_hourly_cost{arch="amd64",instance="kubeworker01b",instance_type="rke2",node="kubeworker01b",provider_id="rke2://kubeworker01b",region=""} 2.717630348506068e+08
node_total_hourly_cost{arch="amd64",instance="kubeworker01c",instance_type="rke2",node="kubeworker01c",provider_id="rke2://kubeworker01c",region=""} 2.717654492432893e+08
node_total_hourly_cost{arch="amd64",instance="kubeworker01d",instance_type="rke2",node="kubeworker01d",provider_id="rke2://kubeworker01d",region=""} 1.788777011721537e+10
node_total_hourly_cost{arch="amd64",instance="kubeworker01e",instance_type="rke2",node="kubeworker01e",provider_id="rke2://kubeworker01e",region=""} 1.7941222081961483e+10
node_total_hourly_cost{arch="amd64",instance="kubeworker01f",instance_type="rke2",node="kubeworker01f",provider_id="rke2://kubeworker01f",region=""} 1.788778317587711e+10
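To put the magnitude of these values in perspective, here is a minimal Python sketch that parses `node_total_hourly_cost` lines and converts them to a rough monthly figure. The sample uses three values copied from the output above with the labels trimmed to `node` for brevity. The 730 hours per month and the plausibility threshold of 100 $/h are assumptions chosen for illustration, not Kubecost settings.

```python
import re

# Three values copied from the output above; labels trimmed to `node` for brevity.
SAMPLE = """
# TYPE node_total_hourly_cost gauge
node_total_hourly_cost{node="kubectrl01a"} 2.717651809769061e+08
node_total_hourly_cost{node="kubegateway01a"} 1.6326061544289812e+07
node_total_hourly_cost{node="kubeworker01d"} 1.788777011721537e+10
"""

LINE_RE = re.compile(r'^node_total_hourly_cost\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
NODE_RE = re.compile(r'node="([^"]*)"')

HOURS_PER_MONTH = 730          # assumption: average hours in a month
MAX_PLAUSIBLE_HOURLY = 100.0   # hypothetical sanity threshold in $/h


def parse_hourly_costs(text):
    """Return {node name: hourly cost} from Prometheus text-format lines."""
    costs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if not match:
            continue
        node = NODE_RE.search(match.group("labels"))
        if node:
            costs[node.group(1)] = float(match.group("value"))
    return costs


def implausible_nodes(costs, limit=MAX_PLAUSIBLE_HOURLY):
    """Return the sorted node names whose hourly cost exceeds the limit."""
    return sorted(name for name, cost in costs.items() if cost > limit)


costs = parse_hourly_costs(SAMPLE)
for name, hourly in sorted(costs.items()):
    print(f"{name}: {hourly:.3e} $/h, about {hourly * HOURS_PER_MONTH:.3e} $/month")
print("implausible:", implausible_nodes(costs))
```

Every node in the sample exceeds the threshold by several orders of magnitude, which suggests a pricing input problem rather than a rounding or aggregation issue.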

For the individual containers, the CPU costs look more or less plausible to me, but the RAM costs are far too high: image

But if I click on "view right sizing", the numbers look fine again: image

Let me know if you need any further information.

AjayTripathy commented 8 months ago

Hi @aeggerd I am guessing something is wrong with the node total RAM reporting. Can you share the results of kube_node_status_capacity_memory_bytes?

aeggerd commented 8 months ago

These are the results from the metric: image

In case it helps, here are the metrics from k9s: image

aeggerd commented 8 months ago

It feels like the metric keeps increasing: image

I still assume that it must be a bug in this version or in the settings we applied during deployment, because we see this issue consistently in all of our deployed instances.

Here are the Prometheus metrics: kube_node_status_capacity_memory_bytes image image

node_total_hourly_cost image image

AjayTripathy commented 8 months ago

Hmm, those byte counts look reasonable. How about node_ram_hourly_cost? Are you supplying any custom prices? If so, how?
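One way to relate the two metrics is to divide a node's RAM hourly cost by its memory capacity, which gives the implied price per GiB-hour. The following is only a sketch: the function name and the example inputs are hypothetical (the real values in this thread are only available as screenshots), and it assumes that the cost metric covers the node's full capacity.

```python
BYTES_PER_GIB = 1024 ** 3


def implied_ram_price_per_gib_hour(ram_hourly_cost, capacity_bytes):
    """Implied RAM price in $/GiB-hour from a node's RAM hourly cost and capacity.

    ram_hourly_cost: value of a RAM hourly cost metric for one node
    capacity_bytes:  value of kube_node_status_capacity_memory_bytes for that node
    """
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return ram_hourly_cost / (capacity_bytes / BYTES_PER_GIB)


# Hypothetical inputs for illustration only: a 64 GiB node.
capacity = 64 * BYTES_PER_GIB
print(implied_ram_price_per_gib_hour(0.5, capacity))      # a cost in a sane range
print(implied_ram_price_per_gib_hour(2.7e8, capacity))    # a cost of the magnitude reported here
```

If the byte counts are reasonable but the implied price per GiB-hour is in the millions, the pricing input is the more likely culprit than the capacity metric.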

aeggerd commented 8 months ago

These are the node_ram_hourly_cost values: image

These are my price settings: image

These are the settings from the ConfigMap kubecost/kubecost-cost-analyzer: image

aeggerd commented 8 months ago

I have now also deleted the deployment and all of its configuration, but the issue still occurs. It occurs in all 7 of our clusters / installations. The version we are using is 2.1.1.

Maybe this screenshot can give some clue: image

AjayTripathy commented 8 months ago

@aeggerd can you share a little bit more about how kubecost is being installed as well as any logs in the cost-model container?

AjayTripathy commented 7 months ago

@aeggerd I have another report of this on rancher kubernetes. Can you try to set custom prices enabled to true here? https://github.com/kubecost/cost-analyzer-helm-chart/blob/develop/cost-analyzer/values.yaml#L3159

aeggerd commented 7 months ago

@AjayTripathy I have now updated my values file according to your recommendation. So far it looks good, but I would like to wait a few more days to see whether it stays correct or whether the costs add up again.

If you are interested, these are the Helm values that we are using:

  global:
    prometheus:
      enabled: false
      fqdn: http://rancher-monitoring-prometheus.cattle-monitoring-system.svc:9090
    grafana:
      enabled: false
      domainName: rancher-grafana.kubectrl01.XXXXXXX
      scheme: https
  prometheus:
    kube-state-metrics:
      disabled: true
    nodeExporter:
      enabled: false
    serviceAccounts:
      nodeExporter:
        create: false
    kubeStateMetrics:
      enabled: false
  prometheusRule:
    enabled: true
  serviceMonitor:
    enabled: true
    networkCosts:
      enabled: false
  networkCosts:
    enabled: false
  kubecostProductConfigs:
    customPricesEnabled: true
    defaultModelPricing:
      enabled: true
      CPU: "28.0"
      spotCPU: "4.86"
      RAM: "3.09"
      spotRAM: "0.65"
      GPU: "693.50"
      spotGPU: "225.0"
      storage: "0.04"
      zoneNetworkEgress: "0.01"
      regionNetworkEgress: "0.01"
      internetNetworkEgress: "0.12"
  ingress:
    enabled: true

aeggerd commented 7 months ago

It is still working with the custom pricing :)
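For comparison with the hourly values reported earlier in the thread, the custom prices from the values file can be turned into an expected hourly node cost. This is a rough sketch under stated assumptions: it treats the `CPU` and `RAM` values under `defaultModelPricing` as monthly rates per core and per GB, uses 730 hours per month, and the node size (16 cores, 64 GB) is hypothetical. Check the chart's values.yaml for the authoritative units.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month

# CPU and RAM prices copied from the values file above
# (assumed to be monthly rates per core and per GB).
MONTHLY_PRICES = {"CPU": "28.0", "RAM": "3.09"}


def to_hourly(monthly_prices, hours=HOURS_PER_MONTH):
    """Convert monthly price strings to hourly float rates."""
    return {name: float(price) / hours for name, price in monthly_prices.items()}


def expected_node_hourly_cost(cores, ram_gb, monthly_prices=MONTHLY_PRICES):
    """Expected hourly cost of a node from its core count and memory size."""
    hourly = to_hourly(monthly_prices)
    return cores * hourly["CPU"] + ram_gb * hourly["RAM"]


# Hypothetical node size for illustration: 16 cores, 64 GB RAM.
print(f"{expected_node_hourly_cost(16, 64):.2f} $/h")
```

Under these assumptions a node of that size costs on the order of one dollar per hour, many orders of magnitude below the `node_total_hourly_cost` values seen before custom pricing was enabled.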