Hi @aeggerd, could you possibly share the results of the metric node_total_hourly_cost? As well as a screenshot of the Assets page, to see what is driving this cost?
Those numbers definitely look high:
# HELP node_total_hourly_cost node_total_hourly_cost Total node cost per hour
# TYPE node_total_hourly_cost gauge
node_total_hourly_cost{arch="amd64",instance="kubectrl01a",instance_type="rke2",node="kubectrl01a",provider_id="rke2://kubectrl01a",region=""} 2.717651809769061e+08
node_total_hourly_cost{arch="amd64",instance="kubectrl01b",instance_type="rke2",node="kubectrl01b",provider_id="rke2://kubectrl01b",region=""} 2.71763839646976e+08
node_total_hourly_cost{arch="amd64",instance="kubectrl01c",instance_type="rke2",node="kubectrl01c",provider_id="rke2://kubectrl01c",region=""} 2.717649127106552e+08
node_total_hourly_cost{arch="amd64",instance="kubegateway01a",instance_type="rke2",node="kubegateway01a",provider_id="rke2://kubegateway01a",region=""} 1.6326061544289812e+07
node_total_hourly_cost{arch="amd64",instance="kubeworker01a",instance_type="rke2",node="kubeworker01a",provider_id="rke2://kubeworker01a",region=""} 2.717651809769061e+08
node_total_hourly_cost{arch="amd64",instance="kubeworker01b",instance_type="rke2",node="kubeworker01b",provider_id="rke2://kubeworker01b",region=""} 2.717630348506068e+08
node_total_hourly_cost{arch="amd64",instance="kubeworker01c",instance_type="rke2",node="kubeworker01c",provider_id="rke2://kubeworker01c",region=""} 2.717654492432893e+08
node_total_hourly_cost{arch="amd64",instance="kubeworker01d",instance_type="rke2",node="kubeworker01d",provider_id="rke2://kubeworker01d",region=""} 1.788777011721537e+10
node_total_hourly_cost{arch="amd64",instance="kubeworker01e",instance_type="rke2",node="kubeworker01e",provider_id="rke2://kubeworker01e",region=""} 1.7941222081961483e+10
node_total_hourly_cost{arch="amd64",instance="kubeworker01f",instance_type="rke2",node="kubeworker01f",provider_id="rke2://kubeworker01f",region=""} 1.788778317587711e+10
For the individual containers, the CPU costs look more or less plausible to me, but the RAM costs are way too high:
But if I click on "view right sizing", the numbers look fine again:
Let me know if you need further information.
Hi @aeggerd, I am guessing something is wrong with the node total RAM reporting. Can you share the results of kube_node_status_capacity_memory_bytes?
Those are the results from the metric:
In case it helps, here are the metrics from k9s:
It feels like the metric keeps increasing:
I still assume it must be a bug in this version or in the settings we applied during deployment, because the issue occurs consistently in all of our deployed instances.
Here are the Prometheus metrics:
kube_node_status_capacity_memory_bytes
node_total_hourly_cost
Hmm, those byte counts look reasonable. How about node_ram_hourly_cost? Are you supplying any custom prices? If so, how?
Those are the node_ram_hourly_cost values:
Those are my price settings:
Those are the settings from the configmap kubecost/kubecost-cost-analyzer:
I have now also deleted the deployment and all of its configuration, but the issue still occurs. It occurs in all 7 of our clusters / installations. The version we are using is 2.1.1.
Maybe this screenshot can give some clue:
@aeggerd can you share a little bit more about how Kubecost is being installed, as well as any logs from the cost-model container?
@aeggerd I have another report of this on Rancher Kubernetes. Can you try setting custom prices enabled to true here? https://github.com/kubecost/cost-analyzer-helm-chart/blob/develop/cost-analyzer/values.yaml#L3159
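For anyone following along, a minimal values override for that suggestion might look roughly like the sketch below. The key names mirror the kubecostProductConfigs block in the linked values.yaml, and the example prices are simply the chart defaults; the full values file the reporter ended up using is posted further down in this thread.

# Sketch only: enable statically configured pricing so the cost-model does not
# try to infer cloud provider prices for the on-prem rke2 nodes.
kubecostProductConfigs:
  customPricesEnabled: true
  defaultModelPricing:
    enabled: true
    CPU: "28.0"      # example CPU price, taken from the chart defaults
    RAM: "3.09"      # example RAM price, taken from the chart defaults
    storage: "0.04"  # example storage price, taken from the chart defaults

This can be applied with a normal Helm upgrade of the cost-analyzer release; the reporter confirms below that the totals came back to sane values afterwards.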
@AjayTripathy I have now updated my values file according to your recommendation. So far it looks good, but I would wait a few more days to see whether it stays correct or adds up again.
If you are interested, these are the Helm values that we are using:
global:
  prometheus:
    enabled: false
    fqdn: http://rancher-monitoring-prometheus.cattle-monitoring-system.svc:9090
  grafana:
    enabled: false
    domainName: rancher-grafana.kubectrl01.XXXXXXX
    scheme: https
prometheus:
  kube-state-metrics:
    disabled: true
  nodeExporter:
    enabled: false
  serviceAccounts:
    nodeExporter:
      create: false
  kubeStateMetrics:
    enabled: false
prometheusRule:
  enabled: true
serviceMonitor:
  enabled: true
  networkCosts:
    enabled: false
networkCosts:
  enabled: false
kubecostProductConfigs:
  customPricesEnabled: true
  defaultModelPricing:
    enabled: true
    CPU: "28.0"
    spotCPU: "4.86"
    RAM: "3.09"
    spotRAM: "0.65"
    GPU: "693.50"
    spotGPU: "225.0"
    storage: "0.04"
    zoneNetworkEgress: "0.01"
    regionNetworkEgress: "0.01"
    internetNetworkEgress: "0.12"
ingress:
  enabled: true
It is still working with the custom pricing :)
Kubecost Version
2.1.0 + 2.1.1
Kubernetes Version
v1.26.7+rke2r1
Kubernetes Platform
Other (specify in description)
Description
Using on-premise Kubernetes based on Rancher
Since the upgrade to version 2.X.X we are seeing total cost numbers that are way too high. We are getting values like: Kubernetes Costs: 7.42 billion $, Possible Monthly Savings: 3.08 billion $/mo.
Steps to reproduce
This is the Helm values YAML that we are using to deploy Kubecost:
The Kubecost config that we end up with is:
These are some of the numbers that are being recorded by the Prometheus recording rule:
Expected behavior
The percentage numbers seem to be correct, but the total cost numbers are way too high.
Impact
No response
Screenshots
Kubecost with wrong cost metrics
Prometheus recording rules:
Logs
No response
Slack discussion
No response
Troubleshooting