kubecost / features-bugs

A public repository for filing of Kubecost feature requests and bugs. Please read the issue guidelines before filing an issue here.

[Bug] Aggregator panics on QueryAssetCTE #107

Closed Hexta closed 1 month ago

Hexta commented 4 months ago

Kubecost Version

2.3.2

Kubernetes Version

1.28.9

Kubernetes Platform

EKS

Description

Aggregator panics with the following stack trace at a random point during Repair ETL Allocation (Daily):

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x29bdfe0]

goroutine 818362 [running]:
github.com/kubecost/kubecost-cost-model/pkg/duckdb/asset/db.(*AssetDBQueryService).QueryAssetCTE(0x4001a041f8?, {0x406462acf0?, 0x406462ad08?}, {0x4007c7d5a0?, 0x2?, 0x2?}, 0x4022edf3f0?, {0x692be28?, 0x40646a5170?})
    /app/kubecost-cost-model/pkg/duckdb/asset/db/assetquery.go:846 +0xb0
github.com/kubecost/kubecost-cost-model/pkg/duckdb/asset/db.(*AssetDBQueryService).QueryAsset(0x4001a041f8, {0x406462acf0?, 0x406462ad08?}, {0x4007c7d5a0, 0x2, 0x2}, 0x4022edf3f0, {0x692be28, 0x40646a5140})
    /app/kubecost-cost-model/pkg/duckdb/asset/db/assetquery.go:111 +0x414
github.com/kubecost/kubecost-cost-model/pkg/duckdb/asset.(*DuckDBAssetQueryService).QueryAsset.func1(0x1, {0x405cdd9ec0, 0x18, 0x0?}, {0x406462acf0?, 0x406462ad08?})
    /app/kubecost-cost-model/pkg/duckdb/asset/assetqueryservice.go:133 +0x35c
created by github.com/kubecost/kubecost-cost-model/pkg/duckdb/asset.(*DuckDBAssetQueryService).QueryAsset in goroutine 818387
    /app/kubecost-cost-model/pkg/duckdb/asset/assetqueryservice.go:146 +0x570
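
For readers unfamiliar with this failure class: the trace shows a nil pointer dereference inside a goroutine spawned by `QueryAsset`, and an unrecovered panic in any goroutine terminates the whole Go process. The following is a minimal sketch of that pattern and of one common guard (a deferred `recover` that turns the panic into an error). It is not Kubecost's code; `queryOptions`, `queryAsset`, and `runQueries` are hypothetical names invented for illustration, and the actual nil value in `assetquery.go:846` is not known from this report.

```go
package main

import (
	"fmt"
	"sync"
)

// queryOptions is a hypothetical stand-in for a pointer argument
// that may unexpectedly be nil.
type queryOptions struct {
	Window string
}

// queryAsset dereferences opts without a nil check, so calling it
// with a nil pointer panics with a nil pointer dereference.
func queryAsset(opts *queryOptions) string {
	return "assets for " + opts.Window
}

// runQueries runs one goroutine per input. Each goroutine recovers
// from panics and records them as errors, so a single bad input does
// not bring down the whole process.
func runQueries(inputs []*queryOptions) []error {
	errs := make([]error, len(inputs))
	var wg sync.WaitGroup
	for i, opts := range inputs {
		wg.Add(1)
		go func(i int, opts *queryOptions) {
			defer wg.Done()
			defer func() {
				if r := recover(); r != nil {
					errs[i] = fmt.Errorf("query %d panicked: %v", i, r)
				}
			}()
			_ = queryAsset(opts)
		}(i, opts)
	}
	wg.Wait()
	return errs
}

func main() {
	// The second input is nil and would crash the process without recover.
	errs := runQueries([]*queryOptions{{Window: "1d"}, nil})
	for i, err := range errs {
		fmt.Printf("query %d: err=%v\n", i, err)
	}
}
```

A deferred `recover` only contains the crash. The underlying fix is still to find out why the pointer is nil and validate it before use.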

Steps to reproduce

  1. Run Repair ETL Allocation (Daily)

Expected behavior

Aggregator doesn't panic.

Impact

No response

Screenshots

No response

Logs

No response

Slack discussion

No response

Troubleshooting

Hexta commented 4 months ago

With debug logs

2024-07-04T09:50:00.389888655Z DBG No mapping found for aggregate: service
2024-07-04T09:50:00.433023521Z DBG added metric unclaimedVolumes%%High-Availability
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x29bdfe0]

goroutine 177361 [running]:
github.com/kubecost/kubecost-cost-model/pkg/duckdb/asset/db.(*AssetDBQueryService).QueryAssetCTE(0x4000c4af00?, {0x401c2d2108?, 0x401c2d2120?}, {0x401d02c180?, 0x2?, 0x2?}, 0x40159eb600?, {0x692be28?, 0x4033c28240?})
    /app/kubecost-cost-model/pkg/duckdb/asset/db/assetquery.go:846 +0xb0
github.com/kubecost/kubecost-cost-model/pkg/duckdb/asset/db.(*AssetDBQueryService).QueryAsset(0x4000c4af00, {0x401c2d2108?, 0x401c2d2120?}, {0x401d02c180, 0x2, 0x2}, 0x40159eb600, {0x692be28, 0x4033c281b0})
    /app/kubecost-cost-model/pkg/duckdb/asset/db/assetquery.go:111 +0x414
github.com/kubecost/kubecost-cost-model/pkg/duckdb/asset.(*DuckDBAssetQueryService).QueryAsset.func1(0x1, {0x402286e0c0, 0x18, 0x692be60?}, {0x401c2d2108?, 0x401c2d2120?})
    /app/kubecost-cost-model/pkg/duckdb/asset/assetqueryservice.go:133 +0x35c
created by github.com/kubecost/kubecost-cost-model/pkg/duckdb/asset.(*DuckDBAssetQueryService).QueryAsset in goroutine 175911
    /app/kubecost-cost-model/pkg/duckdb/asset/assetqueryservice.go:146 +0x570

AjayTripathy commented 4 months ago

Hi @Hexta we're looking into this now.

cliffcolvin commented 4 months ago

Hello @Hexta, we have this resolved, and the fix will be released in the 2.3.3 patch. If you need it sooner and are willing to take an RC, 2.3.3-rc.0 is currently available.

arazdolski commented 4 months ago

Hi @cliffcolvin Thanks for the quick fix 🙏

We will wait until the v2.3.3 release, since we use the Helm chart from ECR.

Hexta commented 4 months ago

We've updated Kubecost to v2.3.3, and it still crashes, now with a different stack.

Here is an example of the logs. Unfortunately, line order is not preserved.

Logs

```plain
DBG RO DB path: /var/configs/waterfowl/duckdb/v0_10_3/kubecost-1721699893.duckdb.read
fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)
/usr/local/go/src/runtime/mbitmap.go:286 +0x128 fp=0xffff1a1cc640 sp=0xffff1a1cc5f0 pc=0x44c798
/usr/local/go/src/runtime/mbitmap.go:329 +0xac fp=0xffff1a1cc680 sp=0xffff1a1cc640 pc=0x44c8fc
runtime.wbBufFlush1(0x400008a008)
runtime.wbBufFlush.func1()
runtime.systemstack(0x0)
/usr/local/go/src/runtime/asm_arm64.s:243 +0x6c fp=0xffff1a1cc710 sp=0xffff1a1cc700 pc=0x4ac81c
goroutine 8863972 gp=0x41372281c0 m=15 mp=0x4000ea3008 [running]:
runtime stack:
runtime.throw({0x4abaf95?, 0x1c71c4000?})
/usr/local/go/src/runtime/panic.go:1023 +0x40 fp=0xffff1a1cc5f0 sp=0xffff1a1cc5c0 pc=0x472890
runtime.badPointer(0xfffe515ff408, 0x42694d2030, 0x0, 0x0)
runtime.findObject(0xffff1a1cc6c0?, 0x2d1ed4c?, 0x2d249dc?)
/usr/local/go/src/runtime/mwbbuf.go:240 +0xec fp=0xffff1a1cc6e0 sp=0xffff1a1cc680 pc=0x46dddc
/usr/local/go/src/runtime/mwbbuf.go:181 +0x24 fp=0xffff1a1cc700 sp=0xffff1a1cc6e0 pc=0x4a5654
runtime.systemstack_switch()
runtime.wbBufFlush()
gcWriteBarrier()
/usr/local/go/src/runtime/asm_arm64.s:1294 +0x74 fp=0x41f0584fd0 sp=0x41f0584ef0 pc=0x4ac044
runtime.(*_panic).start(0x38?, 0xfffe51758388?, 0x10?)
/usr/local/go/src/runtime/panic.go:794 +0x38 fp=0x41f0584ff0 sp=0x41f0584fd0 pc=0x472038
github.com/marcboeker/go-duckdb.scan(0xffff0408f460, 0x4a)
github.com/marcboeker/go-duckdb.scanValue(0x41f05852e8?, 0x2?)
/app/go-duckdb/rows.go:82 +0x1c fp=0x41f0585280 sp=0x41f0585260 pc=0x298915c
database/sql.(*Rows).nextLocked(0x4002421560)
/usr/local/go/src/database/sql/sql.go:3047 +0x14c fp=0x41f0585320 sp=0x41f05852c0 pc=0x14aed8c
database/sql.(*Rows).Next.func1()
/usr/local/go/src/database/sql/sql.go:3022 +0x30 fp=0x41f0585350 sp=0x41f0585320 pc=0x14aec10
/usr/local/go/src/database/sql/sql.go:3530 +0x7c fp=0x41f0585390 sp=0x41f0585350 pc=0x14b156c
database/sql.(*Rows).Next(0x4002421560)
github.com/kubecost/kubecost-cost-model/pkg/duckdb/internal/db.GetConfig({0x692c9f0?, 0x8b821e0?}, 0x41f0585528?)
/app/kubecost-cost-model/pkg/duckdb/internal/db/info.go:127 +0x134 fp=0x41f0585500 sp=0x41f05853e0 pc=0x29a7ab4
github.com/kubecost/kubecost-cost-model/pkg/duckdb/internal/db.LogDBMemoryInfo({0x692c9f0, 0x8b821e0}, 0x4200343520)
/app/kubecost-cost-model/pkg/duckdb/internal/db/db.go:114 +0x174 fp=0x41f0585620 sp=0x41f05855b0 pc=0x29a5de4
/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:470 +0x7c fp=0x41f0585840 sp=0x41f0585800 pc=0x2c5dffc
/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:363 +0x148 fp=0x41f05858c0 sp=0x41f0585840 pc=0x2c5d828
github.com/looplab/fsm.transitionerStruct.transition(...)
github.com/looplab/fsm.(*FSM).doTransition(...)
github.com/looplab/fsm.(*FSM).Event(0x4001a5cf00, {0x692c9f0, 0x8b821e0}, {0x49e1513, 0xf}, {0x0, 0x0, 0x0})
github.com/looplab/fsm.(*FSM).enterStateCallbacks(0x4001a5cf00, {0x692cea0, 0x40356f7a40}, 0x421ccdc9a0)
/root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:422
:1 +0x48 fp=0x41f0585cc0 sp=0x41f0585cb0 pc=0x2c5f4e8
github.com/looplab/fsm.(*FSM).doTransition(...)
github.com/looplab/fsm.(*FSM).Event(0x4001a5cf00, {0x692c9f0, 0x8b821e0}, {0x49e1522, 0xf}, {0x0, 0x0, 0x0})
github.com/kubecost/kubecost-cost-model/pkg/duckdb/orchestrator.startTimer.func1()
```

cliffcolvin commented 4 months ago

@Hexta I'll get someone on my team to take a look at this again. Thank you for the additional information here.

AjayTripathy commented 3 months ago

We're investigating this now @Hexta

chipzoller commented 1 month ago

Hello, in an effort to consolidate our bug and feature request tracking, we are deprecating using GitHub to track tickets. If this issue is still outstanding and you have not done so already, please raise a request at https://support.kubecost.com/.