Closed DerekTBrown closed 1 month ago
This looks similar to the following issues, but those are fixed:
Hi @DerekTBrown we're working on a smoother way to handle this, but for now you should be able to delete and recreate the persistent volume and restart the pod to get out of this state.
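For anyone following along, the workaround above could be sketched roughly as below. This is a minimal sketch rather than an official procedure: the namespace, PVC name, and pod name are hypothetical placeholders (none of them come from this thread), so look up the real ones with `kubectl get pvc,pods -n <namespace>` first. The helper only prints the commands unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch: reset the aggregator's database volume and restart its pod.
# All three defaults are hypothetical placeholders; replace them with the
# names that `kubectl get pvc,pods -n <namespace>` shows in your cluster.
NAMESPACE="${NAMESPACE:-kubecost}"
PVC_NAME="${PVC_NAME:-aggregator-db-pvc-placeholder}"
POD_NAME="${POD_NAME:-aggregator-pod-placeholder}"

# Print the command; execute it only when APPLY=1.
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

reset_aggregator_db() {
  # --wait=false: a PVC that is still mounted stays in Terminating until
  # the pod using it is gone, so do not block on it here.
  run kubectl delete pvc "$PVC_NAME" -n "$NAMESPACE" --wait=false
  # Deleting the pod releases the PVC; the owning controller restarts the pod.
  run kubectl delete pod "$POD_NAME" -n "$NAMESPACE"
}
```

Calling `reset_aggregator_db` prints the two commands for review, and `APPLY=1 reset_aggregator_db` runs them. Whether the volume is recreated automatically depends on how it was provisioned (an assumption to verify): a claim that comes from a StatefulSet volume claim template should be recreated along with the pod, while a standalone PVC would need the chart or manifest re-applied.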
Done, and that did seem to resolve the issue.
Is the plan to just have something that deletes the cache if it becomes corrupted?
@cliffcolvin we're pretty sure this is getting addressed in 2.3+, right?
I had the same issue, but it started working after I recreated the aggregator-db persistent volume.
Confirmed same issue with v2.2.4.
Hey there, this should be resolved in our 2.3 release. We're planning on releasing 2.3.2 sometime today or tomorrow and recommend upgrading to that when it's ready!
Not sure if this is the same exact issue, but we've hit something similar with 2.3.5:
panic: failed to create ingestor: Ingestor: error creating db: setting up migrations: opening '/var/configs/waterfowl/duckdb/v0_10_3/kubecost.duckdb.write': database/sql/driver: could not open database: duckdb error: IO Error: Corrupt database file: computed checksum 4178360413824115490 does not match stored checksum 16005271743778032503 in block at location 34877440
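The checksum failure above has a recognizable signature, so one way to confirm that a crash-looping aggregator is hitting this particular error before touching its volume is to search the logs for it. A minimal sketch, assuming the log text is piped in on stdin; the function name is made up for illustration and is not part of Kubecost:

```shell
# Sketch: exit 0 if the log text on stdin contains the DuckDB corruption
# error quoted above, exit 1 otherwise.
# `has_duckdb_corruption` is a hypothetical helper name.
has_duckdb_corruption() {
  grep -q 'Corrupt database file'
}
```

For example, `kubectl logs <pod> -n <namespace> -c aggregator --previous | has_duckdb_corruption && echo "corrupt DuckDB file"` checks the logs of the previous (crashed) container instance. The pod and namespace are placeholders, and the container name `aggregator` is taken from the issue description, so adjust it if yours differs. Panics with a different message will not match this pattern.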
Could you try recreating the aggregator db volume and let me know if that works, @timchenko-a?
Bit different message, but recreating the aggregator db PVC solves the issue (added some logs before the error for context):
2024-10-04T12:07:32.611615117Z INF Copy starting
2024-10-04T12:07:32.622755633Z ERR Failed to get migrate version: no migration
2024-10-04T12:08:37.152210689Z INF Copy finished
2024-10-04T12:08:37.381023796Z INF Ingestion starting
2024-10-04T12:08:37.38826687Z INF Using default file store as data source
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x3099b71]
goroutine 2052 [running]:
github.com/kubecost/kubecost-cost-model/pkg/duckdb/internal/bun.(*Writer).InsertCloudCosts(0xc0000b4790, {0x6690990, 0xc004a41e90}, 0xc0040ad110, 0xc001064800)
/app/kubecost-cost-model/pkg/duckdb/internal/bun/writer.go:704 +0x1d1
github.com/kubecost/kubecost-cost-model/pkg/duckdb/internal/cloudcost.(*Ingestor).run.func1(0xc0040ad110)
/app/kubecost-cost-model/pkg/duckdb/internal/cloudcost/investor.go:201 +0x10d6
github.com/opencost/opencost/core/pkg/util/worker.(*queuedWorkerPool[...]).worker(0x0)
/app/opencost/core/pkg/util/worker/worker.go:117 +0x42
created by github.com/opencost/opencost/core/pkg/util/worker.NewWorkerPool[...] in goroutine 1648
/app/opencost/core/pkg/util/worker/worker.go:72 +0x13d
It's not ideal to delete the PVC every time duckdb crashes, though I have no idea why it crashes in the first place.
Using v2.3.4.
@igorbrites can you try the latest version and let me know if this persists?
Hello, in an effort to consolidate our bug and feature request tracking, we are deprecating using GitHub to track tickets. If this issue is still outstanding and you have not done so already, please raise a request at https://support.kubecost.com/.
Kubecost Version
2.2.5
Kubernetes Version
1.25
Kubernetes Platform
Other (specify in description)
Description
After a 2.2.5 upgrade, I see the aggregator container failing to start with the following message:
Steps to reproduce
kubecost and wait.
Expected behavior
Impact
No response
Screenshots
No response
Logs
No response
Slack discussion
No response
Troubleshooting