Hi @aeggerd, could you possibly share the results of the metric node_total_hourly_cost? As well as a screenshot of the Assets page, to see what is driving this cost?
Those numbers definitely look high:
# HELP node_total_hourly_cost node_total_hourly_cost Total node cost per hour
# TYPE node_total_hourly_cost gauge
node_total_hourly_cost{arch="amd64",instance="kubectrl01a",instance_type="rke2",node="kubectrl01a",provider_id="rke2://kubectrl01a",region=""} 2.717651809769061e+08
node_total_hourly_cost{arch="amd64",instance="kubectrl01b",instance_type="rke2",node="kubectrl01b",provider_id="rke2://kubectrl01b",region=""} 2.71763839646976e+08
node_total_hourly_cost{arch="amd64",instance="kubectrl01c",instance_type="rke2",node="kubectrl01c",provider_id="rke2://kubectrl01c",region=""} 2.717649127106552e+08
node_total_hourly_cost{arch="amd64",instance="kubegateway01a",instance_type="rke2",node="kubegateway01a",provider_id="rke2://kubegateway01a",region=""} 1.6326061544289812e+07
node_total_hourly_cost{arch="amd64",instance="kubeworker01a",instance_type="rke2",node="kubeworker01a",provider_id="rke2://kubeworker01a",region=""} 2.717651809769061e+08
node_total_hourly_cost{arch="amd64",instance="kubeworker01b",instance_type="rke2",node="kubeworker01b",provider_id="rke2://kubeworker01b",region=""} 2.717630348506068e+08
node_total_hourly_cost{arch="amd64",instance="kubeworker01c",instance_type="rke2",node="kubeworker01c",provider_id="rke2://kubeworker01c",region=""} 2.717654492432893e+08
node_total_hourly_cost{arch="amd64",instance="kubeworker01d",instance_type="rke2",node="kubeworker01d",provider_id="rke2://kubeworker01d",region=""} 1.788777011721537e+10
node_total_hourly_cost{arch="amd64",instance="kubeworker01e",instance_type="rke2",node="kubeworker01e",provider_id="rke2://kubeworker01e",region=""} 1.7941222081961483e+10
node_total_hourly_cost{arch="amd64",instance="kubeworker01f",instance_type="rke2",node="kubeworker01f",provider_id="rke2://kubeworker01f",region=""} 1.788778317587711e+10
For the individual containers, the CPU costs look more or less plausible to me, but the RAM costs are way too high:
But if I click on "view right sizing", the numbers look fine again:
Let me know if you need further information.
Hi @aeggerd, I am guessing something is wrong with the node total RAM reporting. Can you share the results of kube_node_status_capacity_memory_bytes?
Those are the results from the metric:
In case it helps, here are the metrics from k9s:
It feels like the metric keeps increasing:
I still assume it must be a bug in this version or in the settings we applied during deployment, because the issue occurs consistently in all of our deployed instances.
Here are the Prometheus metrics:
kube_node_status_capacity_memory_bytes
node_total_hourly_cost
Hmm, those byte counts look reasonable. How about node_ram_hourly_cost? Are you supplying any custom prices? If so, how?
Those are the node_ram_hourly_cost values:
Those are my price settings:
Those are the settings from the configmap kubecost/kubecost-cost-analyzer:
I have now also deleted the deployment and all of its configuration, but the issue still occurs. It occurs in all 7 of our clusters / installations. The version we are using is 2.1.1.
Maybe this screenshot can give some clue:
@aeggerd can you share a little bit more about how Kubecost is being installed, as well as any logs from the cost-model container?
@aeggerd I have another report of this on Rancher Kubernetes. Can you try setting custom prices enabled to true here? https://github.com/kubecost/cost-analyzer-helm-chart/blob/develop/cost-analyzer/values.yaml#L3159
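For anyone following along, a minimal values override for that suggestion might look roughly like the sketch below. The key names mirror the kubecostProductConfigs block in the linked values.yaml, and the example prices are simply the chart defaults; the full values file the reporter ended up using is posted further down in this thread.

# Sketch only: enable statically configured pricing so the cost-model does not
# try to infer cloud provider prices for the on-prem rke2 nodes.
kubecostProductConfigs:
  customPricesEnabled: true
  defaultModelPricing:
    enabled: true
    CPU: "28.0"      # example CPU price, taken from the chart defaults
    RAM: "3.09"      # example RAM price, taken from the chart defaults
    storage: "0.04"  # example storage price, taken from the chart defaults

This can be applied with a normal Helm upgrade of the cost-analyzer release; the reporter confirms below that the totals came back to sane values afterwards.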
@AjayTripathy I have now updated my values file according to your recommendation. So far it looks good, but I would wait a few more days to see whether it stays correct or adds up again.
If you are interested, these are the Helm values that we are using:
global:
  prometheus:
    enabled: false
    fqdn: http://rancher-monitoring-prometheus.cattle-monitoring-system.svc:9090
  grafana:
    enabled: false
    domainName: rancher-grafana.kubectrl01.XXXXXXX
    scheme: https
prometheus:
  kube-state-metrics:
    disabled: true
  nodeExporter:
    enabled: false
  serviceAccounts:
    nodeExporter:
      create: false
  kubeStateMetrics:
    enabled: false
prometheusRule:
  enabled: true
serviceMonitor:
  enabled: true
  networkCosts:
    enabled: false
networkCosts:
  enabled: false
kubecostProductConfigs:
  customPricesEnabled: true
  defaultModelPricing:
    enabled: true
    CPU: "28.0"
    spotCPU: "4.86"
    RAM: "3.09"
    spotRAM: "0.65"
    GPU: "693.50"
    spotGPU: "225.0"
    storage: "0.04"
    zoneNetworkEgress: "0.01"
    regionNetworkEgress: "0.01"
    internetNetworkEgress: "0.12"
ingress:
  enabled: true
It is still working with the custom pricing :)
Kubecost Version
2.1.0 + 2.1.1
Kubernetes Version
v1.26.7+rke2r1
Kubernetes Platform
Other (specify in description)
Description
Using on-premise Kubernetes based on Rancher
Since the upgrade to version 2.X.X we are seeing total cost numbers that are way too high. We are getting values like: Kubernetes Costs: 7.42 billion $, Possible Monthly Savings: 3.08 billion $/mo.
Steps to reproduce
This is the Helm values YAML that we are using to deploy Kubecost:
The Kubecost config that we end up with is:
These are some of the numbers that are being recorded by the Prometheus recording rule:
Expected behavior
The percentage numbers seem to be correct, but the total cost numbers are way too high.
Impact
No response
Screenshots
Kubecost with wrong cost metrics
Prometheus recording rules:
Logs
No response
Slack discussion
No response
Troubleshooting