pingcap / tidb


Chunk size too big cause frequent context switch #28339

Open tiancaiamao opened 3 years ago

tiancaiamao commented 3 years ago

Enhancement

Found in our oncall 3717

CPU usage is below 40%, yet context switches are frequent. As you can see from this picture, mcall -> schedule -> findrunnable means the Go runtime can't fully utilize the CPU:

image

And it's caused by allocating too many large Chunks:

image

The Go runtime scheduler has the concept of M, G, and P. A goroutine first tries to allocate memory from the cache local to its P (mcache). If that local memory is used up, it tries to allocate from the shared structures (mcentral, and behind it mheap) ... and this operation involves a lock.

In this picture you can see the code going to yield, and that's the root cause of the frequent context switches:

image

So we shouldn't allocate too many 2~4K objects. They use up the local cache (mcache) quickly, which results in allocation from the shared structures, and then a lock.
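To get a feel for why 2~4K objects drain the local cache so fast, here is a rough back-of-the-envelope sketch. It assumes the runtime's 8 KiB page and treats one page as the unit that gets refilled; real spans can cover several pages depending on the size class, so take the numbers as an approximation. `objectsPerPage` is a made-up helper name, not a runtime API.

```go
package main

import "fmt"

// pageSize is the 8 KiB page the Go runtime uses to manage its heap.
const pageSize = 8192

// objectsPerPage returns how many objects of objSize bytes fit in one page.
// Simplified model (assumption): one page per refill; real spans may span
// several pages depending on the size class.
func objectsPerPage(objSize int) int {
	if objSize <= 0 || objSize > pageSize {
		return 0
	}
	return pageSize / objSize
}

func main() {
	for _, size := range []int{64, 512, 2048, 3072, 4096} {
		fmt.Printf("%4d-byte objects: %3d per page\n", size, objectsPerPage(size))
	}
}
```

Under this model a page holds only 2 to 4 objects in the 2~4K range, versus 128 objects of 64 bytes, so the local cache has to be refilled (and the lock taken) far more often per allocation.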

Attached is the profile, so you can verify that ... but I'm not sure how to reproduce it because it was grabbed from our customer's workload.

go tool pprof -http=:6060 profile
go tool pprof -http=:6061 heap

tidb_29_used_prepared_statement (1).zip

tiancaiamao commented 3 years ago

@Defined2014

tiancaiamao commented 3 years ago

The Go runtime uses 8K pages to manage allocation. So what we should do is find the 2k~4k sized chunks and try to avoid them. We can either allocate a big block of memory and manage it ourselves, or use small allocations.
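As a minimal sketch of the "allocate big memory and manage it ourselves" option: one large allocation is carved into fixed-size buffers, so the runtime allocator is hit once instead of once per buffer. The `slab` type and its methods are hypothetical names for illustration only, not existing TiDB code, and the sketch is not goroutine-safe.

```go
package main

import "fmt"

// slab hands out fixed-size byte buffers carved from one large allocation.
// Hypothetical helper for illustration; not a TiDB type. Not goroutine-safe.
type slab struct {
	buf     []byte // backing memory, allocated once
	bufSize int    // size of each buffer handed out
	next    int    // offset of the next free buffer
}

// newSlab allocates room for count buffers of bufSize bytes in one call.
func newSlab(bufSize, count int) *slab {
	return &slab{buf: make([]byte, bufSize*count), bufSize: bufSize}
}

// get returns the next buffer, or nil when the slab is exhausted.
func (s *slab) get() []byte {
	if s.next+s.bufSize > len(s.buf) {
		return nil
	}
	// The three-index slice caps the buffer so append cannot spill
	// into the neighbouring buffer.
	b := s.buf[s.next : s.next+s.bufSize : s.next+s.bufSize]
	s.next += s.bufSize
	return b
}

// reset makes every buffer available again; callers must no longer
// use buffers obtained before the reset.
func (s *slab) reset() { s.next = 0 }

func main() {
	s := newSlab(4096, 256) // one 1 MiB allocation instead of 256 separate 4K ones
	n := 0
	for b := s.get(); b != nil; b = s.get() {
		n++
	}
	fmt.Println("buffers handed out:", n)
}
```

The trade-off is that the whole slab stays alive as long as any single buffer is still referenced, so this fits best where buffers share a lifetime, such as within one query.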