kubecost / features-bugs

A public repository for filing Kubecost feature requests and bugs. Please read the issue guidelines before filing an issue here.

[Bug] Kubecost not showing the old metrics after some updates to allow the upgrade to EKS 1.25 #42

Open diegoduarte-gb opened 5 months ago

diegoduarte-gb commented 5 months ago

Kubecost Helm Chart Version

1.98

Kubernetes Version

1.25

Kubernetes Platform

EKS

Description

Hello folks!

After upgrading our EKS cluster to 1.25, we noticed that our Kubecost stopped working due to some deprecations in the PSP features (as stated here: https://github.com/kubecost/cost-analyzer-helm-chart/issues/1773#issuecomment-1299327504).

After a few days of research, we found that workaround and decided to apply it. It worked as expected, and Kubecost is now back online.

However, we noticed that we lost our old metrics and now only have new metrics from today onward. Checking the Kubecost logs, I found the output pasted in the Logs section.

Given this, is our data possibly still there and somehow not being read, or did we hit another issue and lose it after pushing the Helm chart with the updates to Argo?

thanks

Steps to reproduce

  1. Update cluster to 1.25
  2. Apply the last workaround stated here https://github.com/kubecost/cost-analyzer-helm-chart/issues/1773#issuecomment-1299327504
  3. Check that Kubecost is back online but the old metrics are missing.

Expected behavior

The old metrics should still be available in the console.

Impact

Big: FinOps can't get any value from it.

Screenshots

image

Logs

2024-01-10T18:38:52.833535279Z WRN CostModel.ComputeAllocation: Node spot  query result for missing node: cluster-one/fargate-ip-10-149-241-43.ec2.internal
2024-01-10T18:38:52.836968532Z INF ETL: Allocation[1h]: AggregatedStore[UDejW]: run: aggregated [2024-01-10T18:00:00+0000, 2024-01-10T19:00:00+0000) from 132 to 46 in 733.849µs
2024-01-10T18:38:52.840605258Z INF ETL: Asset[1h]: AggregatedStore.Run[sipig]: run: aggregated [2024-01-10T14:00:00+0000, 2024-01-10T15:00:00+0000) from 0 to 0 in 460ns
2024-01-10T18:38:52.878688631Z INF ETL: Asset[1h]: AggregatedStore.Run[sipig]: run: aggregated [2024-01-10T15:00:00+0000, 2024-01-10T16:00:00+0000) from 0 to 0 in 380ns
2024-01-10T18:38:52.918977102Z INF ETL: Asset[1h]: AggregatedStore.Run[sipig]: run: aggregated [2024-01-10T16:00:00+0000, 2024-01-10T17:00:00+0000) from 0 to 0 in 370ns
2024-01-10T18:38:52.94252342Z INF ETL: Asset[1h]: AggregatedStore.Run[sipig]: run: aggregated [2024-01-10T17:00:00+0000, 2024-01-10T18:00:00+0000) from 0 to 0 in 350ns
2024-01-10T18:38:52.994287847Z INF ETL: Asset[1h]: AggregatedStore.Run[sipig]: run: aggregated [2024-01-10T18:00:00+0000, 2024-01-10T19:00:00+0000) from 17 to 2 in 1.534649ms
2024-01-10T18:39:01.650451796Z INF Error getting node pricing. Error: Invalid Pricing Key "us-east-1,,linux"
2024-01-10T18:39:01.650540317Z INF Error getting node pricing. Error: Invalid Pricing Key "us-east-1,,linux"
2024-01-10T18:39:01.65316764Z INF Error getting node pricing. Error: Invalid Pricing Key "us-east-1,,linux"
2024-01-10T18:39:01.653273281Z INF Error getting node pricing. Error: Invalid Pricing Key "us-east-1,,linux"
2024-01-10T18:40:01.694239625Z INF Error getting node pricing. Error: Invalid Pricing Key "us-east-1,,linux"
2024-01-10T18:40:01.694317886Z INF Error getting node pricing. Error: Invalid Pricing Key "us-east-1,,linux"
2024-01-10T18:40:01.696444753Z INF Error getting node pricing. Error: Invalid Pricing Key "us-east-1,,linux"
2024-01-10T18:40:01.696531594Z INF Error getting node pricing. Error: Invalid Pricing Key "us-east-1,,linux"
2024-01-10T18:40:07.488728344Z INF http: named cookie not present
2024-01-10T18:40:07.488836665Z INF [JWT Groups] No Cookie set
2024-01-10T18:40:07.489532994Z INF ETL: Allocation: QueryAllocation([2024-01-03T18:40:07+0000, 2024-01-10T18:40:07+0000), [cluster]) from AggregatedStore[1d] 602.178µs [query 263.583µs] [idle/tenancy 480ns] [external 330ns] [aggregate 337.105µs] [accumulate 440ns] [stop 240ns]
2024-01-10T18:40:07.490493286Z INF http: named cookie not present
2024-01-10T18:40:07.490555817Z INF [JWT Groups] No Cookie set
2024-01-10T18:40:07.490567657Z INF http: named cookie not present
2024-01-10T18:40:07.490665469Z INF [JWT Groups] No Cookie set
2024-01-10T18:40:07.491098084Z INF ETL: QuerySummaryAllocation([2024-01-10T14:40:07+0000, 2024-01-10T18:40:07+0000), [namespace]) from AggregatedStore[1h] 464.366µs [query 322.865µs] [idle/tenancy 25.52µs] [external 380ns] [aggregate 114.971µs] [accumulate 400ns] [stop 230ns]
2024-01-10T18:40:07.49158431Z INF ETL: QuerySummaryAllocation([2024-01-07T18:40:07+0000, 2024-01-10T18:40:07+0000), [cluster]) from AggregatedStore[1d] 372.185µs [query 251.633µs] [idle/tenancy 18.641µs] [external 520ns] [aggregate 100.361µs] [accumulate 810ns] [stop 220ns]
2024-01-10T18:40:08.063251158Z INF http: named cookie not present
2024-01-10T18:40:08.06332345Z INF [JWT Groups] No Cookie set
2024-01-10T18:40:08.064087869Z INF ETL: QuerySummaryAllocation([2024-01-09T18:40:08+0000, 2024-01-10T18:40:08+0000), [namespace]) from AggregatedStore[1h] 658.648µs [query 362.115µs] [idle/tenancy 180.052µs] [external 640ns] [aggregate 115.231µs] [accumulate 390ns] [stop 220ns]
2024-01-10T18:40:08.06967862Z ERR ETL: failed to merge cloud usage: error merging cloud usage: MergeAssetSetRanges failed: expected range length 24, but got 1
2024-01-10T18:40:08.069887802Z INF ETL: Asset: QueryAsset([2024-01-09T18:40:08+0000, 2024-01-10T18:40:08+0000), [type]) from ETLStore[1h] 648.518µs [query 439.196µs] [cloud 121.551µs] [aggregate 86.951µs] [accumulate 510ns] [stop 310ns]
2024-01-10T18:40:08.079095549Z INF http: named cookie not present
2024-01-10T18:40:08.079168321Z INF [JWT Groups] No Cookie set
2024-01-10T18:40:08.079718088Z INF ETL: QuerySummaryAllocation([2024-01-09T18:40:08+0000, 2024-01-10T18:40:08+0000), [cluster]) from AggregatedStore[1h] 471.256µs [query 338.014µs] [idle/tenancy 54.551µs] [external 230ns] [aggregate 78.071µs] [accumulate 220ns] [stop 170ns]
2024-01-10T18:40:08.080747351Z INF http: named cookie not present
2024-01-10T18:40:08.080843862Z INF [JWT Groups] No Cookie set
2024-01-10T18:40:08.08073867Z INF ETL: Asset: QueryAsset([2024-01-03T18:40:08+0000, 2024-01-10T18:40:08+0000), [service]) from ETLStore[1d] 439.355µs [query 374.264µs] [cloud 11.6µs] [aggregate 52.931µs] [accumulate 310ns] [stop 250ns]
2024-01-10T18:40:08.264739847Z INF ETL: Allocation: QueryAllocation([2024-01-09T18:40:08+0000, 2024-01-10T18:40:08+0000), [cluster node namespace pod container]) from ETLStore[1d] 1.658131ms [query 1.348698ms] [idle/tenancy 1.51µs] [external 230ns] [aggregate 306.733µs] [accumulate 400ns] [stop 560ns]
2024-01-10T18:40:08.26576271Z INF ETL: Allocation: QueryAllocation([2024-01-08T18:40:08+0000, 2024-01-10T18:40:08+0000), [cluster node namespace pod controller]) from ETLStore[1d] 2.792447ms [query 1.865884ms] [idle/tenancy 820ns] [external 270ns] [aggregate 809.321µs] [accumulate 115.902µs] [stop 250ns]
2024-01-10T18:40:08.265991922Z INF [Profiler] 3.090089ms: Savings: abandonedWorkloads
2024-01-10T18:40:08.266806192Z INF ETL: Allocation: QueryAllocation([2024-01-08T18:40:08+0000, 2024-01-10T18:40:08+0000), [cluster node namespace pod container]) from ETLStore[1d] 3.738708ms [query 3.418894ms] [idle/tenancy 660ns] [external 260ns] [aggregate 318.234µs] [accumulate 410ns] [stop 250ns]
2024-01-10T18:40:08.26737143Z INF [Profiler] 4.343265ms: Savings: requestSizing
2024-01-10T18:40:08.267471681Z INF ETL: Asset: QueryAsset([2024-01-08T18:40:08+0000, 2024-01-10T18:40:08+0000), [cluster]) from ETLStore[1d] 297.653µs [query 239.423µs] [cloud 4.65µs] [aggregate 48.65µs] [accumulate 3.33µs] [stop 1.6µs]
2024-01-10T18:40:08.268544885Z INF [Profiler] 5.433419ms: Savings: clusterSizing
2024-01-10T18:40:08.273082982Z INF http: named cookie not present
2024-01-10T18:40:08.273221034Z INF [JWT Groups] No Cookie set
2024-01-10T18:40:08.27366333Z INF http: named cookie not present
2024-01-10T18:40:08.27369793Z INF [JWT Groups] No Cookie set
2024-01-10T18:40:08.27446743Z INF ETL: QuerySummaryAllocation([2024-01-03T18:40:08+0000, 2024-01-10T18:40:08+0000), [cluster]) from AggregatedStore[1d] 1.150625ms [query 308.914µs] [idle/tenancy 68.001µs] [external 440ns] [aggregate 772.32µs] [accumulate 700ns] [stop 250ns]
2024-01-10T18:40:08.276718678Z INF ETL: QuerySummaryAllocation([2024-01-03T18:40:08+0000, 2024-01-10T18:40:08+0000), [cluster]) from AggregatedStore[1d] 2.172538ms [query 1.683962ms] [idle/tenancy 104.601µs] [external 690ns] [aggregate 382.355µs] [accumulate 600ns] [stop 330ns]
2024-01-10T18:40:08.427697895Z INF Found Discount for InstanceType: t3a.xlarge of 0.00
2024-01-10T18:40:08.427784386Z INF [Turndown Savings] Failed to locate 'instance_type' on node pricing metric.
2024-01-10T18:40:08.427811097Z INF Found Discount for InstanceType:  of 0.00
2024-01-10T18:40:08.427837787Z INF [Turndown Savings] Failed to locate 'instance_type' on node pricing metric.
2024-01-10T18:40:08.427859757Z INF Found Discount for InstanceType:  of 0.00
2024-01-10T18:40:08.427907348Z INF Found Discount for InstanceType: t3a.large of 0.00
2024-01-10T18:40:08.427956138Z INF Found Discount for InstanceType: t3a.medium of 0.00
2024-01-10T18:40:08.428006209Z INF [Profiler] 165.524482ms: Savings: nodeTurndown
2024-01-10T18:41:01.739217306Z INF Error getting node pricing. Error: Invalid Pricing Key "us-east-1,,linux"
2024-01-10T18:41:01.739297098Z INF Error getting node pricing. Error: Invalid Pricing Key "us-east-1,,linux"
2024-01-10T18:41:01.741587906Z INF Error getting node pricing. Error: Invalid Pricing Key "us-east-1,,linux"
2024-01-10T18:41:01.741670107Z INF Error getting node pricing. Error: Invalid Pricing Key "us-east-1,,linux"
2024-01-10T18:42:01.770136424Z INF Error getting node pricing. Error: Invalid Pricing Key "us-east-1,,linux"
2024-01-10T18:42:01.770219475Z INF Error getting node pricing. Error: Invalid Pricing Key "us-east-1,,linux"
2024-01-10T18:42:01.772953Z INF Error getting node pricing. Error: Invalid Pricing Key "us-east-1,,linux"
2024-01-10T18:42:01.77304115Z INF Error getting node pricing. Error: Invalid Pricing Key "us-east-1,,linux"
======= snip=========
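
In the pasted log, the hourly Asset windows before 18:00 are aggregated "from 0 to 0", while the 18:00 window has items. A quick way to list such empty windows from a larger log dump might look like the sketch below. The regular expression is inferred only from the lines pasted in this issue and is not an official Kubecost log format; `empty_windows` is a hypothetical helper name.

```python
import re

# Pattern inferred from the log lines in this issue only (an assumption,
# not a documented Kubecost format).
AGG_RE = re.compile(
    r"ETL: (?P<store>Asset|Allocation)\[1h\]: .*?aggregated "
    r"\[(?P<start>[^,]+), (?P<end>[^)]+)\) from (?P<before>\d+) to (?P<after>\d+)"
)


def empty_windows(log_text):
    """Return (store, start, end) for each hourly window aggregated from 0 items."""
    empty = []
    for line in log_text.splitlines():
        match = AGG_RE.search(line)
        if match and int(match["before"]) == 0:
            empty.append((match["store"], match["start"], match["end"]))
    return empty


# Two lines copied from the log in this issue.
sample = """\
2024-01-10T18:38:52.840605258Z INF ETL: Asset[1h]: AggregatedStore.Run[sipig]: run: aggregated [2024-01-10T14:00:00+0000, 2024-01-10T15:00:00+0000) from 0 to 0 in 460ns
2024-01-10T18:38:52.994287847Z INF ETL: Asset[1h]: AggregatedStore.Run[sipig]: run: aggregated [2024-01-10T18:00:00+0000, 2024-01-10T19:00:00+0000) from 17 to 2 in 1.534649ms
"""

for store, start, end in empty_windows(sample):
    print(f"{store}: no data for [{start}, {end})")
```

An empty window only shows that the ETL store had nothing to aggregate for that hour; it does not by itself say whether the underlying data was lost or is simply not being read.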

Slack discussion

No response

Troubleshooting

chipzoller commented 5 months ago

Not a Helm chart issue. Transferred to the features-bugs repository per the guidelines here.

AjayTripathy commented 5 months ago

Hi @diegoduarte-gb. Kubecost single-cluster retention depends on the persistent volume of the Prometheus instance in the kubecost namespace and/or the backing files on the cost-model pod's persistent volume. Were either of these recreated during the upgrade process?
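
One way to check whether a volume claim was recreated is to compare each PVC's `metadata.creationTimestamp` with the time of the upgrade. The following is a minimal sketch, assuming the JSON comes from `kubectl get pvc -n kubecost -o json` (the namespace may differ in your install). The PVC names and the upgrade time in the sample are hypothetical, and `recreated_pvcs` is an illustrative helper, not part of Kubecost.

```python
import json
from datetime import datetime, timezone


def recreated_pvcs(pvc_list_json, upgrade_time):
    """Return names of PVCs created at or after upgrade_time."""
    recreated = []
    for item in json.loads(pvc_list_json)["items"]:
        # Kubernetes reports creationTimestamp in UTC, e.g. 2024-01-10T13:55:00Z.
        created = datetime.strptime(
            item["metadata"]["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if created >= upgrade_time:
            recreated.append(item["metadata"]["name"])
    return recreated


# Hypothetical sample data; real input would be the kubectl JSON output.
sample = json.dumps({"items": [
    {"metadata": {"name": "example-prometheus-pvc",
                  "creationTimestamp": "2023-06-01T09:00:00Z"}},
    {"metadata": {"name": "example-cost-model-pvc",
                  "creationTimestamp": "2024-01-10T13:55:00Z"}},
]})

# Hypothetical upgrade time, used only for illustration.
upgrade = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(recreated_pvcs(sample, upgrade))
```

A PVC that appears in the result was created during or after the upgrade window, which would be consistent with the older data no longer being on that volume.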