tabossert closed this issue 8 months ago.
cc @cliffcolvin can you take a look here? Has this been fixed in the upcoming 2.1 RCs?
Transferred.
We're taking a look right now.
@tabossert do you have any further log context from this crash? About 5 lines after and 15-20 lines preceding would help me here.
```
goroutine 2513173 [runnable]:
runtime.cgocall(0x3225fb0, 0xc0330cb230)
	/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc0330cb208 sp=0xc0330cb1d0 pc=0x61adcb
github.com/marcboeker/go-duckdb._Cfunc_duckdb_execute_pending(0x7f641502d250, 0xc073047f80)
	_cgo_gotypes.go:1180 +0x4b fp=0xc0330cb230 sp=0xc0330cb208 pc=0x2d51f4b
github.com/marcboeker/go-duckdb.(*stmt).execute.func7(0x0?, 0x80cab01?)
	/go/pkg/mod/github.com/marcboeker/go-duckdb@v1.5.5/statement.go:225 +0x65 fp=0xc0330cb270 sp=0xc0330cb230 pc=0x2d5d085
github.com/marcboeker/go-duckdb.(*stmt).execute(0xc05877ffb0, {0x5b5dcd8, 0xc088c8d830}, {0x80cab80?, 0x8?, 0x7f64e8088060?})
	/go/pkg/mod/github.com/marcboeker/go-duckdb@v1.5.5/statement.go:225 +0x248 fp=0xc0330cb320 sp=0xc0330cb270 pc=0x2d5cca8
github.com/marcboeker/go-duckdb.(*stmt).QueryContext(0xc05877ffb0, {0x5b5dcd8?, 0xc088c8d830?}, {0x80cab80?, 0x0?, 0x176?})
	/go/pkg/mod/github.com/marcboeker/go-duckdb@v1.5.5/statement.go:175 +0x34 fp=0xc0330cb398 sp=0xc0330cb320 pc=0x2d5c994
github.com/marcboeker/go-duckdb.(*conn).QueryContext(0xc059b77860, {0x5b5dcd8, 0xc088c8d830}, {0xc084129000, 0x18a}, {0x80cab80, 0x0, 0x0})
	/go/pkg/mod/github.com/marcboeker/go-duckdb@v1.5.5/connection.go:96 +0x30a fp=0xc0330cb468 sp=0xc0330cb398 pc=0x2d53cca
database/sql.ctxDriverQuery({0x5b5dcd8?, 0xc088c8d830?}, {0x7f64ec70f130?, 0xc059b77860?}, {0x0?, 0x0?}, {0xc084129000?, 0xc084129000?}, {0x80cab80, 0x0, ...})
	/usr/local/go/src/database/sql/ctxutil.go:48 +0xd7 fp=0xc0330cb4f0 sp=0xc0330cb468 pc=0x16eb8d7
database/sql.(*DB).queryDC.func1()
	/usr/local/go/src/database/sql/sql.go:1748 +0x165 fp=0xc0330cb5b0 sp=0xc0330cb4f0 pc=0x16f3b65
database/sql.withLock({0x5b44ec8, 0xc0723075f0}, 0xc0330cb708)
	/usr/local/go/src/database/sql/sql.go:3502 +0x82 fp=0xc0330cb5f0 sp=0xc0330cb5b0 pc=0x16fb6c2
database/sql.(*DB).queryDC(0x1?, {0x5b5dcd8?, 0xc088c8d830}, {0x0, 0x0}, 0xc0723075f0, 0xc05589dd50, {0xc084129000, 0x18a}, {0x0, ...})
	/usr/local/go/src/database/sql/sql.go:1743 +0x209 fp=0xc0330cb798 sp=0xc0330cb5f0 pc=0x16f34e9
database/sql.(*DB).query(0x0?, {0x5b5dcd8, 0xc088c8d830}, {0xc084129000, 0x18a}, {0x0, 0x0, 0x0}, 0x80?)
	/usr/local/go/src/database/sql/sql.go:1726 +0xfc fp=0xc0330cb818 sp=0xc0330cb798 pc=0x16f325c
database/sql.(*DB).QueryContext.func1(0x80?)
	/usr/local/go/src/database/sql/sql.go:1704 +0x4f fp=0xc0330cb880 sp=0xc0330cb818 pc=0x16f304f
database/sql.(*DB).retry(0x62bdc8?, 0xc0330cb8f0)
	/usr/local/go/src/database/sql/sql.go:1538 +0x42 fp=0xc0330cb8c8 sp=0xc0330cb880 pc=0x16f1842
database/sql.(*DB).QueryContext(0x0?, {0x5b5dcd8?, 0xc088c8d830?}, {0xc084129000?, 0x0?}, {0x0?, 0x5b5dcd8?, 0xc088c8d830?})
	/usr/local/go/src/database/sql/sql.go:1703 +0xc5 fp=0xc0330cb958 sp=0xc0330cb8c8 pc=0x16f2f65
github.com/uptrace/bun.(*SelectQuery).Rows(0xc084a30000, {0x5b5dcd8, 0xc088c8d830})
	/go/pkg/mod/github.com/uptrace/bun@v1.1.16/query_select.go:818 +0x1a8 fp=0xc0330cba18 sp=0xc0330cb958 pc=0x2ce6668
github.com/kubecost/kubecost-cost-model/pkg/duckdb/internal/db.GetLabelsAnnotations({0x5b2a9a0, 0xc000d7cd40}, {0xc0502a0f80, 0x7, 0x8}, {0x80cab80, 0x0, 0x0}, {0xc055890f40, 0x1d}, ...)
	/app/kubecost-cost-model/pkg/duckdb/internal/db/common.go:124 +0x7db fp=0xc0330cbe20 sp=0xc0330cba18 pc=0x2d69fbb
github.com/kubecost/kubecost-cost-model/pkg/duckdb/allocation/db.(*AllocationDBQueryService).QueryAllocations.func1.1(0xc022a89ce0, {0xc02c037000, 0xf, 0x10}, {0x80cab80, 0x0, 0x0}, {0xc0502a0f80, 0x7, 0x8}, ...)
	/app/kubecost-cost-model/pkg/duckdb/allocation/db/allocationqueryservice.go:1358 +0x4bd fp=0xc0330cbf58 sp=0xc0330cbe20 pc=0x2d9b13d
github.com/kubecost/kubecost-cost-model/pkg/duckdb/allocation/db.(*AllocationDBQueryService).QueryAllocations.func1.5()
	/app/kubecost-cost-model/pkg/duckdb/allocation/db/allocationqueryservice.go:1380 +0x91 fp=0xc0330cbfe0 sp=0xc0330cbf58 pc=0x2d9ac31
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0330cbfe8 sp=0xc0330cbfe0 pc=0x684b81
created by github.com/kubecost/kubecost-cost-model/pkg/duckdb/allocation/db.(*AllocationDBQueryService).QueryAllocations.func1 in goroutine 2165198
	/app/kubecost-cost-model/pkg/duckdb/allocation/db/allocationqueryservice.go:1338 +0x377

goroutine 2513177 [sync.Mutex.Lock]:
runtime.gopark(0x2d51d7f?, 0x32253b0?, 0x98?, 0x96?, 0xc021ef9698?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc021ef9668 sp=0xc021ef9648 pc=0x651cee
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:404
runtime.semacquire1(0xc004bda6a4, 0x0?, 0x3, 0x1, 0x60?)
	/usr/local/go/src/runtime/sema.go:160 +0x218 fp=0xc021ef96d0 sp=0xc021ef9668 pc=0x6631b8
sync.runtime_SemacquireMutex(0xc021ef9748?, 0xd1?, 0x18?)
	/usr/local/go/src/runtime/sema.go:77 +0x25 fp=0xc021ef9708 sp=0xc021ef96d0 pc=0x680b25
sync.(*Mutex).lockSlow(0xc004bda6a0)
	/usr/local/go/src/sync/mutex.go:171 +0x15d fp=0xc021ef9758 sp=0xc021ef9708 pc=0x68fd1d
sync.(*Mutex).Lock(...)
	/usr/local/go/src/sync/mutex.go:90
database/sql.(*driverConn).finalClose(0xc0512c0ab0)
	/usr/local/go/src/database/sql/sql.go:648 +0x133 fp=0xc021ef9800 sp=0xc021ef9758 pc=0x16ed4d3
database/sql.finalCloser.finalClose-fm()
```
@tabossert That's helpful, thank you for the quick response. I'm looking for the first instance of the `goroutine ...` string in the logs, the 15-20 lines preceding it, and the stack trace attached to that specific goroutine. In Go, the stack trace of every goroutine is printed on a panic like this, but the offending goroutine's trace is printed first, which is why I'm asking for that, plus the log context that led us to that trace.
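For anyone gathering this from a saved log, a minimal Go sketch that prints the lines preceding the first goroutine header and then that goroutine's stack. It assumes each `goroutine N [...]:` header starts at the beginning of a line (log viewers that add prefixes such as `│ │` would need them stripped first) and that goroutine traces are separated by blank lines; `firstGoroutine` is a hypothetical helper name. The log is read from stdin, for example piped from `kubectl logs --previous <pod>`:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// firstGoroutine returns up to `before` log lines that precede the first
// "goroutine N [...]:" header, plus that goroutine's stack trace. The stack
// is taken to end at the first blank line or at the next goroutine header.
func firstGoroutine(lines []string, before int) (ctxLines, stack []string) {
	for i, line := range lines {
		if !strings.HasPrefix(line, "goroutine ") {
			continue
		}
		start := i - before
		if start < 0 {
			start = 0
		}
		ctxLines = lines[start:i]
		stack = append(stack, line)
		for _, l := range lines[i+1:] {
			if strings.TrimSpace(l) == "" || strings.HasPrefix(l, "goroutine ") {
				break
			}
			stack = append(stack, l)
		}
		return ctxLines, stack
	}
	return nil, nil // no goroutine header found
}

func main() {
	// Read the saved log from stdin, allowing lines up to 4 MiB.
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading log:", err)
		os.Exit(1)
	}

	ctxLines, stack := firstGoroutine(lines, 20)
	for _, l := range ctxLines {
		fmt.Println(l)
	}
	fmt.Println("---- first goroutine ----")
	for _, l := range stack {
		fmt.Println(l)
	}
}
```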
If you'd like, I can make it easier on you -- you can share the log file with me privately via email: michael@kubecost.com
To clarify: I need more log context to understand what's going wrong here. Please either share a full log file or share the requested first trace + surrounding context I mentioned above.
Email sent with the full log, @michaelmdresser.
Thank you @tabossert. I have a pretty strong theory about what's going wrong here -- there are a few different resolution paths if this is what I think it is.
If you are willing to try a pre-production release, please upgrade to Kubecost `v2.1.0-rc.6`, or to `v2.1.0` when it is released, which is imminent. I am fairly certain that you are experiencing an issue which has been fixed in v2.1.
Otherwise, if you would like to stay on `v2.0.2`:

- Set `forecasting.enabled=false`.
- For queries to `/model/allocation`, please set the query parameter `includeAggregatedMetadata=false`. Also, if these queries have no `aggregate` parameter (or a high-cardinality one like `aggregate=pod`), I recommend using the `limit` and `offset` query parameters to paginate the response, e.g. `limit=100&offset=0` -> `limit=100&offset=100` -> `limit=100&offset=200`.

We'll try those workarounds until v2.1.0 is released. Thanks for the quick response!
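A minimal Go sketch of the `limit`/`offset` pagination suggested above for `/model/allocation`. The host `kubecost.example:9090`, the helper names `pageURL` and `fetchAll`, and the stopping rule (a page shorter than `limit` is treated as the last one) are assumptions, not documented Kubecost behavior. The fake fetcher in `main` stands in for real HTTP requests, and any other parameters your queries already use can be placed on `baseURL`'s query string:

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// pageURL builds one /model/allocation request for a single page.
// aggregate may be "" to omit the parameter.
func pageURL(baseURL, aggregate string, limit, offset int) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	u.Path = "/model/allocation"
	q := u.Query() // keeps any parameters already present on baseURL
	q.Set("includeAggregatedMetadata", "false")
	if aggregate != "" {
		q.Set("aggregate", aggregate)
	}
	q.Set("limit", strconv.Itoa(limit))
	q.Set("offset", strconv.Itoa(offset))
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// fetchAll walks offset = 0, limit, 2*limit, ... and stops once a page
// returns fewer than limit entries. fetch is a caller-supplied
// (hypothetical) function that performs the HTTP request and reports how
// many entries the page contained.
func fetchAll(baseURL, aggregate string, limit int, fetch func(string) (int, error)) (int, error) {
	if limit <= 0 {
		return 0, fmt.Errorf("limit must be positive, got %d", limit)
	}
	total := 0
	for offset := 0; ; offset += limit {
		reqURL, err := pageURL(baseURL, aggregate, limit, offset)
		if err != nil {
			return total, err
		}
		n, err := fetch(reqURL)
		if err != nil {
			return total, err
		}
		total += n
		if n < limit {
			return total, nil
		}
	}
}

func main() {
	// Fake fetcher in place of a real HTTP call: pretends 250 entries exist.
	remaining := 250
	fake := func(u string) (int, error) {
		fmt.Println("GET", u)
		n := 100
		if remaining < n {
			n = remaining
		}
		remaining -= n
		return n, nil
	}
	total, err := fetchAll("http://kubecost.example:9090", "pod", 100, fake)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("total entries:", total)
}
```

Stopping on a short page avoids needing a total count up front; if the API returned a total, a fixed page count would work just as well.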
I tried upgrading to 2.1.0-rc6, but it didn't seem to load the data, so I'm not sure if I missed something. I went back to 2.0.2, but now it gives me this error:

```
2024-02-22T01:16:58.356391917Z ERR error doing initial open of DB: error opening db at path /var/configs/waterfowl/duckdb/v0_9_2/kubecost.duckdb.write: migrating up: no migration found for version 20240212233831: read down for version 20240212233831 migrations: file does not exist
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x16ee895]

goroutine 27 [running]:
database/sql.(*DB).Close(0x0)
	/usr/local/go/src/database/sql/sql.go:877 +0x35
github.com/kubecost/kubecost-cost-model/pkg/duckdb/write.startIngestor(0xc0009ccba0, 0xc000f0f4b0?)
	/app/kubecost-cost-model/pkg/duckdb/write/writer.go:234 +0x28
github.com/kubecost/kubecost-cost-model/pkg/duckdb/write.NewWriter.func5({0x47a00a0?, 0xc0001feba0?}, 0x1?)
	/app/kubecost-cost-model/pkg/duckdb/write/writer.go:125 +0x1b
github.com/looplab/fsm.(*FSM).enterStateCallbacks(0xc000f12000, {0x5b5dd10, 0xc0000da5f0}, 0xc0001feba0?)
	/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:470 +0x82
github.com/looplab/fsm.(*FSM).Event.(*FSM).Event.func2.func3()
	/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:363 +0x150
github.com/looplab/fsm.transitionerStruct.transition(...)
	/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:422
github.com/looplab/fsm.(*FSM).doTransition(...)
	/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:407
github.com/looplab/fsm.(*FSM).Event(0xc000f12000, {0x5b5d8e8, 0x80cab80}, {0x4e18562, 0xd}, {0x0, 0x0, 0x0})
	/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:390 +0x884
github.com/kubecost/kubecost-cost-model/pkg/duckdb/write.NewWriter(0xc0012b70a0, {0xc00128de40, 0x3a}, {0xc00128df40, 0x39})
	/app/kubecost-cost-model/pkg/duckdb/write/writer.go:180 +0x6ef
github.com/kubecost/kubecost-cost-model/pkg/duckdb/orchestrator.createWriter(0xc0012b7040)
	/app/kubecost-cost-model/pkg/duckdb/orchestrator/orchestrator.go:398 +0x33
github.com/kubecost/kubecost-cost-model/pkg/duckdb/orchestrator.NewOrchestrator.func7({0x47a00a0?, 0xc0013ba3f0?}, 0xc000f08000)
	/app/kubecost-cost-model/pkg/duckdb/orchestrator/orchestrator.go:212 +0x25
github.com/looplab/fsm.(*FSM).enterStateCallbacks(0xc0013bc500, {0x5b5dd10, 0xc0000da500}, 0xc0013ba3f0?)
	/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:470 +0x82
github.com/looplab/fsm.(*FSM).Event.(*FSM).Event.func2.func3()
	/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:363 +0x150
github.com/looplab/fsm.transitionerStruct.transition(...)
	/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:422
github.com/looplab/fsm.(*FSM).doTransition(...)
	/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:407
github.com/looplab/fsm.(*FSM).Event(0xc0013bc500, {0x5b5d8e8, 0x80cab80}, {0x4e4a44e, 0x1b}, {0x0, 0x0, 0x0})
	/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:390 +0x884
github.com/kubecost/kubecost-cost-model/pkg/duckdb/orchestrator.NewOrchestrator.func6.1()
	/app/kubecost-cost-model/pkg/duckdb/orchestrator/orchestrator.go:204 +0x3e
created by github.com/kubecost/kubecost-cost-model/pkg/duckdb/orchestrator.NewOrchestrator.func6 in goroutine 1
	/app/kubecost-cost-model/pkg/duckdb/orchestrator/orchestrator.go:203 +0x505
```
Actually, I just upgraded to the newly released 2.1.0 and it seems to be loading. I'll report back on whether the crashes stop.
Thanks for the update and sorry for the confusion about the back-and-forth upgrade. Please let us know if you run into trouble with 2.1.0.
Issue seems to be resolved, thanks!
Hello everyone! @michaelmdresser I'm experiencing a similar issue on a GKE cluster running version 2.2.2, so it seems to be back. It was working and then suddenly stopped.
Here's the full Go trace:
```
INF Starting Kubecost Aggregator version kcm-c630c42588_core-c3cb2218df_oc-088f891d8e (c630c425)
INF NAMESPACE: kubecost
ERR error doing initial open of DB: error opening db at path /var/configs/waterfowl/duckdb/v0_9_2/kubecost.duckdb.write: setting up migrations: opening '/var/configs/waterfowl/d
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x17774f5]
goroutine 22 [running]:
database/sql.(*DB).Close(0x0)
	/usr/local/go/src/database/sql/sql.go:910 +0x35
github.com/kubecost/kubecost-cost-model/pkg/duckdb/write.startIngestor(0xc001f93d40, 0xc000afe060)
	/app/kubecost-cost-model/pkg/duckdb/write/writer.go:342 +0x28
github.com/kubecost/kubecost-cost-model/pkg/duckdb/write.NewWriter.func5({0x461e2c0?, 0xc0014a0a20?}, 0xc0016d1208?)
	/app/kubecost-cost-model/pkg/duckdb/write/writer.go:188 +0x1b
github.com/looplab/fsm.(*FSM).enterStateCallbacks(0xc0014a7c00, {0x63c1568, 0xc003e16190}, 0xc000be00e0)
	/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:470 +0x82
github.com/looplab/fsm.(*FSM).Event.(*FSM).Event.func2.func3()
	/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:363 +0x150
github.com/looplab/fsm.transitionerStruct.transition(...)
	/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:422
github.com/looplab/fsm.(*FSM).doTransition(...)
	/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:407
github.com/looplab/fsm.(*FSM).Event(0xc0014a7c00, {0x63c10f8, 0x8763380}, {0x4c3e1c7, 0x15}, {0x0, 0x0, 0x0})
	/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:390 +0x80a
github.com/kubecost/kubecost-cost-model/pkg/duckdb/write.NewWriter.func7({0x461e2c0?, 0xc0014a0a20?}, 0xc000be0070)
	/app/kubecost-cost-model/pkg/duckdb/write/writer.go:202 +0x11a
github.com/looplab/fsm.(*FSM).enterStateCallbacks(0xc0014a7c00, {0x63c1568, 0xc003e160a0}, 0xc000be0070)
	/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:470 +0x82
github.com/looplab/fsm.(*FSM).Event.(*FSM).Event.func2.func3()
	/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:363 +0x150
github.com/looplab/fsm.transitionerStruct.transition(...)
	/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:422
github.com/looplab/fsm.(*FSM).doTransition(...)
	/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:407
github.com/looplab/fsm.(*FSM).Event(0xc0014a7c00, {0x63c10f8, 0x8763380}, {0x4c22c8b, 0xd}, {0x0, 0x0, 0x0})
	/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:390 +0x80a
github.com/kubecost/kubecost-cost-model/pkg/duckdb/write.NewWriter(0xc000afe060, {0xc003aa81c0, 0x3a}, {0xc003aa82c0, 0x39})
	/app/kubecost-cost-model/pkg/duckdb/write/writer.go:258 +0x7be
github.com/kubecost/kubecost-cost-model/pkg/duckdb/orchestrator.createWriter(0xc000afe000)
	/app/kubecost-cost-model/pkg/duckdb/orchestrator/orchestrator.go:400 +0x33
github.com/kubecost/kubecost-cost-model/pkg/duckdb/orchestrator.NewOrchestrator.func7({0x461e2c0?, 0xc003a8c510?}, 0xc000adf420)
	/app/kubecost-cost-model/pkg/duckdb/orchestrator/orchestrator.go:213 +0x25
github.com/looplab/fsm.(*FSM).enterStateCallbacks(0xc003a98d80, {0x63c1568, 0xc0016c8050}, 0xc000adf420)
	/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:470 +0x82
github.com/looplab/fsm.(*FSM).Event.(*FSM).Event.func2.func3()
	/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:363 +0x150
github.com/looplab/fsm.transitionerStruct.transition(...)
	/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:422
github.com/looplab/fsm.(*FSM).doTransition(...)
	/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:407
github.com/looplab/fsm.(*FSM).Event(0xc003a98d80, {0x63c10f8, 0x8763380}, {0x4c563b8, 0x1b}, {0x0, 0x0, 0x0})
	/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:390 +0x80a
github.com/kubecost/kubecost-cost-model/pkg/duckdb/orchestrator.NewOrchestrator.func6.1()
	/app/kubecost-cost-model/pkg/duckdb/orchestrator/orchestrator.go:205 +0x3e
created by github.com/kubecost/kubecost-cost-model/pkg/duckdb/orchestrator.NewOrchestrator.func6 in goroutine 1
	/app/kubecost-cost-model/pkg/duckdb/orchestrator/orchestrator.go:204 +0x4e8
```
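In both the 2.0.2 and 2.2.2 traces, the top frame is `database/sql.(*DB).Close(0x0)`: the receiver is `0x0`, so `Close` runs on a nil `*sql.DB` right after the logged DB open error, which appears to be what turns that error into a SIGSEGV. A minimal sketch of the defensive pattern follows; `openDB`, `closeIfOpen`, and `start` are hypothetical names, not Kubecost's code, and `openDB` is a stub that always fails:

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
)

// openDB is a hypothetical stub that simulates the failing open/migrate
// step. On failure it returns a nil *sql.DB along with the error.
func openDB(path string) (*sql.DB, error) {
	return nil, errors.New("error opening db at path " + path)
}

// closeIfOpen closes db only when it is non-nil. Calling Close on a nil
// *sql.DB dereferences the nil pointer and panics.
func closeIfOpen(db *sql.DB) error {
	if db == nil {
		return nil
	}
	return db.Close()
}

// start registers the close-on-exit cleanup unconditionally, so the
// cleanup must tolerate the nil handle returned by a failed open.
func start(path string) error {
	db, err := openDB(path)
	defer closeIfOpen(db)
	if err != nil {
		return fmt.Errorf("initial open: %w", err)
	}
	// ... work with db here ...
	return nil
}

func main() {
	if err := start("/tmp/example.duckdb"); err != nil {
		fmt.Println("startup failed without panicking:", err)
	}
}
```

With the nil check in place, a failed open surfaces as an ordinary error instead of crashing the process.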
I resolved the issue above by following this message: https://github.com/kubecost/features-bugs/issues/72
Kubecost Helm Chart Version
2.0.2
Kubernetes Version
1.27
Kubernetes Platform
AKS
Description
The Kubecost pod restarts intermittently due to an error in the aggregator pod, as seen below.
We have tuned resources as much as possible, so it doesn't seem to be related to OOM or disk slowness.
Steps to reproduce
Expected behavior
The pod should not restart.
Impact
Our scripts that pull data out fail when this happens.
Screenshots
No response
Logs
Slack discussion
No response
Troubleshooting