tikv / client-go

Go client for TiKV
Apache License 2.0

performance of searchCachedRegion could be enhanced #532

Open chrysan opened 2 years ago

chrysan commented 2 years ago

When many regions are cached in memory, searchCachedRegion becomes slower and holds the RegionCache global read lock for longer. Other queries that need to load new regions then have to wait for the write lock. As QPS grows, the lock contention gets worse and query latency increases.
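To illustrate the mechanism described above, here is a minimal sketch of how a reader that holds a `sync.RWMutex` read lock keeps a writer out until the read finishes. The `regionCache` type and its methods are hypothetical stand-ins, not the actual client-go code, and `TryLock` requires Go 1.18 or newer.

```go
package main

import (
	"fmt"
	"sync"
)

// regionCache is a hypothetical stand-in for RegionCache: a single
// RWMutex guards every cached region.
type regionCache struct {
	mu sync.RWMutex
}

// beginScan models searchCachedRegion taking the global read lock.
func (c *regionCache) beginScan() { c.mu.RLock() }

// endScan models the scan finishing and releasing the read lock.
func (c *regionCache) endScan() { c.mu.RUnlock() }

// writerCanProceed probes, without blocking, whether a goroutine that
// wants to insert a newly loaded region could take the write lock now.
func (c *regionCache) writerCanProceed() bool {
	if c.mu.TryLock() {
		c.mu.Unlock()
		return true
	}
	return false
}

func main() {
	c := &regionCache{}

	c.beginScan() // a reader starts a (potentially long) cache scan
	fmt.Println("writer can proceed during scan:", c.writerCanProceed()) // false
	c.endScan()

	fmt.Println("writer can proceed after scan:", c.writerCanProceed()) // true
}
```

In the real code the writer calls `Lock` and blocks instead of probing, so the longer each scan takes, the longer every pending region insert waits.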


findRegionByKey waits for write lock:

goroutine 8144205746 [semacquire]:
sync.runtime_SemacquireMutex(0xc0003ae014, 0x0, 0x1)
    /usr/local/go/src/runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc0003ae010)
    /usr/local/go/src/sync/mutex.go:138 +0xfc
sync.(*Mutex).Lock(...)
    /usr/local/go/src/sync/mutex.go:81
sync.(*RWMutex).Lock(0xc0003ae010)
    /usr/local/go/src/sync/rwmutex.go:98 +0x97
github.com/pingcap/tidb/store/tikv.(*RegionCache).findRegionByKey(0xc0003ae000, 0xc4040d51a8, 0xc0eb43fce0, 0x13, 0x13, 0xc031612e00, 0x7fcaad8be1f0, 0x0, 0x40)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/region_cache.go:582 +0x6c2

searchCachedRegion holds read lock:

goroutine 8056902450 [runnable]:
github.com/pingcap/tidb/store/tikv.(*RegionCache).searchCachedRegion.func1(0x3886ec0, 0xc263c1d180, 0xc1864d85c0)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/region_cache.go:914 +0x173
github.com/google/btree.(*node).iterate(0xc323d37c80, 0xffffffffffffffff, 0x3886ec0, 0xc1864d85c0, 0x0, 0x0, 0x101, 0xc0322f0088, 0xc44df40101)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/google/btree@v1.0.0/btree.go:557 +0x1cd
github.com/google/btree.(*node).iterate(0xc225ed4980, 0xffffffffffffffff, 0x3886ec0, 0xc1864d85c0, 0x0, 0x0, 0x101, 0xc0322f0088, 0xc44df40101)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/google/btree@v1.0.0/btree.go:549 +0x115
github.com/google/btree.(*node).iterate(0xc34e0e7640, 0xffffffffffffffff, 0x3886ec0, 0xc1864d85c0, 0x0, 0x0, 0x101, 0xc0322f0088, 0xc44df40101)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/google/btree@v1.0.0/btree.go:549 +0x115
github.com/google/btree.(*node).iterate(0xc0d1673e40, 0xffffffffffffffff, 0x3886ec0, 0xc1864d85c0, 0x0, 0x0, 0xc0322f0101, 0xc0322f0088, 0x20)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/google/btree@v1.0.0/btree.go:549 +0x115
github.com/google/btree.(*node).iterate(0xc3d65de240, 0xffffffffffffffff, 0x3886ec0, 0xc1864d85c0, 0x0, 0x0, 0x1, 0xc0322f0088, 0x32)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/google/btree@v1.0.0/btree.go:549 +0x115
github.com/google/btree.(*BTree).DescendLessOrEqual(...)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/google/btree@v1.0.0/btree.go:795
github.com/pingcap/tidb/store/tikv.(*RegionCache).searchCachedRegion(0xc0003ae000, 0xc44df41aa0, 0x1c, 0x30, 0x11f8800, 0xb)
    /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/region_cache.go:914 +0x2ae
github.com/pingcap/tidb/store/tikv.(*RegionCache).findRegionByKey(0xc0003ae000, 0xc0322f07a8, 0xc44df41aa0, 0x1c, 0x30, 0x11bf700, 0xc0322f0318, 0x11f483c, 0x0)

chrysan commented 2 years ago

BTW, the memory usage of the region cache could be tracked to guard against the risk of OOM.
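One possible way to track this, sketched under assumptions: keep a running byte estimate that is adjusted on every insert and evict. The `trackedCache` and `region` types and the `perRegionOverhead` constant below are hypothetical and chosen only for illustration. They do not reflect the real client-go structures or their actual memory footprint.

```go
package main

import (
	"fmt"
	"sync"
)

// perRegionOverhead is an assumed fixed cost per cached region
// (struct fields, map entry, pointers). The real value would need measuring.
const perRegionOverhead = 64

// region is a hypothetical, simplified cached region.
type region struct {
	id       uint64
	startKey []byte
	endKey   []byte
}

// approxSize estimates the bytes retained by one region.
func (r *region) approxSize() int64 {
	return int64(len(r.startKey)+len(r.endKey)) + perRegionOverhead
}

// trackedCache keeps a running estimate of its own memory usage.
type trackedCache struct {
	mu      sync.Mutex
	regions map[uint64]*region
	bytes   int64
}

func newTrackedCache() *trackedCache {
	return &trackedCache{regions: make(map[uint64]*region)}
}

// insert adds or replaces a region and adjusts the byte estimate.
func (c *trackedCache) insert(r *region) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if old, ok := c.regions[r.id]; ok {
		c.bytes -= old.approxSize()
	}
	c.regions[r.id] = r
	c.bytes += r.approxSize()
}

// evict removes a region, if present, and adjusts the byte estimate.
func (c *trackedCache) evict(id uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if old, ok := c.regions[id]; ok {
		c.bytes -= old.approxSize()
		delete(c.regions, id)
	}
}

// approxBytes returns the current estimate, e.g. for export as a metric.
func (c *trackedCache) approxBytes() int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.bytes
}

func main() {
	c := newTrackedCache()
	c.insert(&region{id: 1, startKey: []byte("a"), endKey: []byte("m")})
	c.insert(&region{id: 2, startKey: []byte("m"), endKey: []byte("z")})
	fmt.Println("approx bytes:", c.approxBytes())
	c.evict(1)
	fmt.Println("approx bytes after evict:", c.approxBytes())
}
```

The estimate could be exported as a gauge or compared against a limit to trigger eviction; either choice is outside what this issue specifies.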

chrysan commented 2 years ago

Another finding: there are far more cached regions than real live regions.

This use case runs many "truncate table" statements. The eviction of cached regions could be enhanced.
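As a rough sketch of what stronger eviction might look like, the example below sweeps out regions that have not been looked up within a TTL. This is an assumption about one possible approach, not a description of how client-go evicts regions. The `regionCache` type, its methods, and the use of plain Unix-second timestamps are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// cachedRegion is a hypothetical cache entry.
type cachedRegion struct {
	startKey   string
	lastAccess int64 // Unix seconds of the most recent lookup
}

// regionCache is a hypothetical, simplified region cache keyed by region ID.
type regionCache struct {
	mu      sync.Mutex
	regions map[uint64]*cachedRegion
}

func newRegionCache() *regionCache {
	return &regionCache{regions: make(map[uint64]*cachedRegion)}
}

// put caches a region and records the insertion time as its last access.
func (c *regionCache) put(id uint64, startKey string, now int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.regions[id] = &cachedRegion{startKey: startKey, lastAccess: now}
}

// get reports whether the region is cached and refreshes its last access.
func (c *regionCache) get(id uint64, now int64) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	r, ok := c.regions[id]
	if ok {
		r.lastAccess = now
	}
	return ok
}

// evictIdle removes every region idle for longer than ttl seconds and
// returns how many were removed. Regions of truncated tables are never
// looked up again, so a periodic sweep like this would eventually drop them.
func (c *regionCache) evictIdle(now, ttl int64) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	removed := 0
	for id, r := range c.regions {
		if now-r.lastAccess > ttl {
			delete(c.regions, id) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}

func main() {
	c := newRegionCache()
	c.put(1, "a", 0)
	c.put(2, "m", 0)
	c.get(2, 500) // region 2 is still in use
	fmt.Println("evicted:", c.evictIdle(700, 600))
	fmt.Println("region 2 still cached:", c.get(2, 700))
}
```

A sweep like this takes the cache lock for its whole duration, so in practice it would likely need to run in small batches to avoid adding to the contention described at the top of the issue.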

disksing commented 1 year ago

We may consider using a skiplist as a replacement. Compared to a btree, a skiplist allows finer-grained locking.
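For reference, here is a minimal single-threaded skiplist sketch with a `floor` lookup (greatest key less than or equal to the target), which is the query that `DescendLessOrEqual` serves in the traces above. All names are hypothetical. The fine-grained locking itself is deliberately not shown: it would need per-node locks or atomic pointer updates on top of this structure.

```go
package main

import (
	"fmt"
	"math/rand"
)

const maxLevel = 16

// node holds one key; next[i] is its successor at level i.
type node struct {
	key  string
	next []*node
}

// skipList is a hypothetical ordered index over region start keys.
type skipList struct {
	head  *node // sentinel, holds no key
	level int   // number of levels currently in use
	rng   *rand.Rand
}

func newSkipList(seed int64) *skipList {
	return &skipList{
		head:  &node{next: make([]*node, maxLevel)},
		level: 1,
		rng:   rand.New(rand.NewSource(seed)),
	}
}

// randomLevel picks a tower height with geometric distribution (p = 1/2).
func (s *skipList) randomLevel() int {
	lvl := 1
	for lvl < maxLevel && s.rng.Intn(2) == 0 {
		lvl++
	}
	return lvl
}

// insert adds key to the list; inserting an existing key is a no-op.
func (s *skipList) insert(key string) {
	update := make([]*node, maxLevel) // predecessor of key at each level
	x := s.head
	for i := s.level - 1; i >= 0; i-- {
		for x.next[i] != nil && x.next[i].key < key {
			x = x.next[i]
		}
		update[i] = x
	}
	if n := x.next[0]; n != nil && n.key == key {
		return
	}
	lvl := s.randomLevel()
	for i := s.level; i < lvl; i++ {
		update[i] = s.head // new levels start at the sentinel
	}
	if lvl > s.level {
		s.level = lvl
	}
	n := &node{key: key, next: make([]*node, lvl)}
	for i := 0; i < lvl; i++ {
		n.next[i] = update[i].next[i]
		update[i].next[i] = n
	}
}

// floor returns the greatest key <= target, or false if there is none.
func (s *skipList) floor(target string) (string, bool) {
	x := s.head
	for i := s.level - 1; i >= 0; i-- {
		for x.next[i] != nil && x.next[i].key <= target {
			x = x.next[i]
		}
	}
	if x == s.head {
		return "", false
	}
	return x.key, true
}

func main() {
	s := newSkipList(1)
	for _, k := range []string{"t", "a", "m"} {
		s.insert(k)
	}
	k, ok := s.floor("n")
	fmt.Println(k, ok) // m true
}
```

A region lookup would take the start key returned by `floor` and then check that the queried key is below that region's end key. An insert only rewires the predecessors of the new node, which is what makes localized locking feasible in concurrent variants.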