pingcap / tidb

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
https://pingcap.com
Apache License 2.0

global sort; pool limiter stuck when import on a 32c64g node #51734

Closed D3Hunter closed 2 months ago

D3Hunter commented 8 months ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Run an import with global sort and 32 threads on a 32c64g node. During the ingest step, some subtasks get stuck at the pool limiter. Stack: stuck-stack.log

2. What did you expect to see? (Required)

The import either succeeds or fails.

3. What did you see instead (Required)

The import is stuck.

4. What is your TiDB version? (Required)

master

lance6716 commented 6 months ago

Hopefully this can be closed for the same reason as https://github.com/pingcap/tidb/issues/52884

D3Hunter commented 2 months ago

Hit this again on the current master branch; see also https://github.com/pingcap/tidb/issues/55374

goroutine 268459959 [chan receive, 793 minutes]:
github.com/pingcap/tidb/br/pkg/membuf.(*Limiter).Acquire(0xc01fbf19f0, 0x100000)
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/membuf/limiter.go:56 +0x1ab
github.com/pingcap/tidb/br/pkg/membuf.(*Pool).acquire(0xc021209200)
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/membuf/buffer.go:100 +0x28
github.com/pingcap/tidb/br/pkg/membuf.(*Buffer).addBlock(0xc0194121e0)
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/membuf/buffer.go:302 +0x8b
github.com/pingcap/tidb/br/pkg/membuf.(*Buffer).allocBytesWithSliceLocation(0xc0194121e0, 0x19426)
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/membuf/buffer.go:272 +0x65
github.com/pingcap/tidb/br/pkg/membuf.(*Buffer).AllocBytes(0xc0194121e0, 0xc04aca9130?)
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/membuf/buffer.go:245 +0x29
github.com/pingcap/tidb/br/pkg/membuf.(*Buffer).AddBytes(...)
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/membuf/buffer.go:317
github.com/pingcap/tidb/pkg/lightning/backend/external.readOneFile({0x6ce5700, 0xc04aca9130}, {0x6d08a50?, 0xc13346a420?}, {0xc075094b00, 0x38}, {0xc219c8cd68, 0x13, 0x18}, {0xc219c8cd80, ...}, ...)
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/lightning/backend/external/reader.go:186 +0x570
github.com/pingcap/tidb/pkg/lightning/backend/external.readAllData.func2()
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/lightning/backend/external/reader.go:98 +0x3b8
github.com/pingcap/tidb/pkg/lightning/backend/external.readAllData.(*ErrorGroupWithRecover).Go.func3()
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/util/wait_group_wrapper.go:250 +0x58
golang.org/x/sync/errgroup.(*Group).Go.func1()
    /go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 268311633
    /go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:75 +0x96

lance6716 commented 2 months ago

For hotspot files, the data occupies a lot of memory, which exceeds the memLimiter threshold (12G).

[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/515082f8-c85a-48a2-90b3-aa7536db2d78_stat/1] [startOffset=85385025] [endOffset=702434139] [expectedConc=74] [concurrency=74]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/99287c3d-91f0-4535-8e62-93bac8286d79_stat/1] [startOffset=85385025] [endOffset=496371612] [expectedConc=49] [concurrency=49]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/0109bbcc-08c5-4310-b601-b63649ccddf6_stat/1] [startOffset=85385025] [endOffset=496371612] [expectedConc=49] [concurrency=49]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/a5b9f1ec-c2b0-490c-8a42-82e53dca3265_stat/1] [startOffset=85385025] [endOffset=496371612] [expectedConc=49] [concurrency=49]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/5295178c-09d1-47a6-bd9c-85336a7bfd38_stat/1] [startOffset=85385025] [endOffset=496371612] [expectedConc=49] [concurrency=49]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/6c3c2c39-3e07-4768-8a19-60dfebd49a39_stat/0] [startOffset=617049114] [endOffset=736588149] [expectedConc=15] [concurrency=15]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/6c3c2c39-3e07-4768-8a19-60dfebd49a39_stat/1] [startOffset=0] [endOffset=496371612] [expectedConc=60] [concurrency=60]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/e195166c-c860-4790-b240-d6ed5cdcb9f0_stat/1] [startOffset=85385025] [endOffset=736588149] [expectedConc=78] [concurrency=78]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/e195166c-c860-4790-b240-d6ed5cdcb9f0_stat/2] [startOffset=0] [endOffset=170770050] [expectedConc=21] [concurrency=21]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/d5bdedc6-1616-42fd-882c-066c61a590c3_stat/1] [startOffset=85385025] [endOffset=702434139] [expectedConc=74] [concurrency=74]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/e8df2a22-26c2-4be9-884f-4b1a9b1d506c_stat/0] [startOffset=658033926] [endOffset=736588149] [expectedConc=10] [concurrency=10]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/e8df2a22-26c2-4be9-884f-4b1a9b1d506c_stat/1] [startOffset=0] [endOffset=496371612] [expectedConc=60] [concurrency=60]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/3d5e396d-21ec-4848-9395-7d3336731afa_stat/1] [startOffset=85385025] [endOffset=496371612] [expectedConc=49] [concurrency=49]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/cb7032c9-762f-4761-96a8-d8cd3fc7609e_stat/1] [startOffset=85385025] [endOffset=508894749] [expectedConc=51] [concurrency=51]
[2024/08/12 21:56:31.316 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/169d1aff-9061-4dbb-ad82-8cfff2f86d72_stat/1] [startOffset=85385025] [endOffset=496371612] [expectedConc=49] [concurrency=49]
[2024/08/12 21:56:31.316 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/8ab7b236-e767-4d0c-8c38-0c38bf24ffcb_stat/0] [startOffset=617049114] [endOffset=736588149] [expectedConc=15] [concurrency=15]
[2024/08/12 21:56:31.316 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/8ab7b236-e767-4d0c-8c38-0c38bf24ffcb_stat/1] [startOffset=0] [endOffset=291447552] [expectedConc=35] [concurrency=35]
[2024/08/12 21:56:31.316 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/b2b78ac1-2b12-416d-8692-e86e84465cc7_stat/1] [startOffset=85385025] [endOffset=496371612] [expectedConc=49] [concurrency=49]
[2024/08/12 21:56:31.316 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/ce3779a2-33de-4ff1-856a-72569111a18d_stat/1] [startOffset=85385025] [endOffset=496371612] [expectedConc=49] [concurrency=49]
[2024/08/12 21:56:31.316 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/0c7db861-6164-4bb1-badd-660805e7256a_stat/1] [startOffset=85385025] [endOffset=736588149] [expectedConc=78] [concurrency=78]
[2024/08/12 21:56:31.316 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/0c7db861-6164-4bb1-badd-660805e7256a_stat/2] [startOffset=0] [endOffset=170770050] [expectedConc=21] [concurrency=21]