Yui-Song opened this issue 8 months ago (Open)
/type performance /type regression /sig execution /severity critical
/remove-label may-affects-7.5 /remove-label may-affects-7.1 /remove-label may-affects-6.5 /remove-label may-affects-6.1 /remove-label may-affects-5.4
Comparing the CPU profiles before and after shows a significant difference in the SendReqCtx section, in particular a noticeable increase in the proportion of CPU time attributed to sync.Map.Load().
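For context on why sync.Map.Load can dominate such a path, here is a minimal sketch (not client-go's actual code; the map, labels, and helper names are hypothetical) of a metrics helper that performs a sync.Map lookup on every request, plus the common mitigation of resolving the lookup once and caching the result:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// counters maps a metric label to its counter; a sync.Map is a common
// choice for read-mostly registries such as per-label metric vectors.
var counters sync.Map // string -> *atomic.Int64

// observePerCall performs a sync.Map lookup on every observation, so the
// cost of sync.Map.Load lands on the request hot path (analogous to what
// the profile shows under updateTiKVSendReqHistogram).
func observePerCall(label string) {
	v, ok := counters.Load(label)
	if !ok {
		v, _ = counters.LoadOrStore(label, new(atomic.Int64))
	}
	v.(*atomic.Int64).Add(1)
}

// cachedCounter resolves the lookup once; callers keep the returned pointer
// and pay only an atomic add per observation afterwards.
func cachedCounter(label string) *atomic.Int64 {
	v, _ := counters.LoadOrStore(label, new(atomic.Int64))
	return v.(*atomic.Int64)
}

func main() {
	c := cachedCounter("Get") // lookup once, e.g. per request type
	for i := 0; i < 1000; i++ {
		observePerCall("Get") // map lookup on every call
		c.Add(1)              // no lookup on the hot path
	}
	v, _ := counters.Load("Get")
	fmt.Println(v.(*atomic.Int64).Load()) // prints 2000
}
```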
This seems to be a client-go related problem; I'll change the sig label from sig/execution to sig/transaction.
The overhead seems to be caused by some kind of runtime event (e.g. assist work). It can happen anywhere, not just in SendReqCtx -> updateTiKVSendReqHistogram -> runtime.newstack, and we cannot reproduce the issue (the high overhead of updateTiKVSendReqHistogram). Thus the root cause might not be related to SendReqCtx directly; further investigation is required.
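Because runtime work such as GC assist and stack growth is charged to whichever goroutine happens to trigger it, one way to investigate is to capture a CPU profile and an execution trace together and inspect them with go tool pprof (including -diff_base against a baseline profile) and go tool trace. A minimal sketch, assuming the suspect path can be driven from a small test program (runWorkload below is a placeholder):

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
	"runtime/trace"
)

func main() {
	// CPU profile: shows where samples land, including runtime frames
	// such as runtime.newstack or GC assist charged to user goroutines.
	cpuf, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer cpuf.Close()
	if err := pprof.StartCPUProfile(cpuf); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	// Execution trace: shows when GC and scheduler events happen,
	// which helps separate runtime-induced overhead from the function
	// that merely triggered it.
	tracef, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer tracef.Close()
	if err := trace.Start(tracef); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	runWorkload()
}

// runWorkload stands in for the code path under investigation; the
// allocation-heavy loop here only exists to trigger GC work in this sketch.
func runWorkload() {
	var buf [][]byte
	for i := 0; i < 1_000_000; i++ {
		buf = append(buf, make([]byte, 128))
		if len(buf) > 10_000 {
			buf = buf[:0]
		}
	}
}
```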
https://github.com/pingcap/tidb/pull/50650 caused a regression in compile duration, which resulted in a 1.5% QPS regression of taobench
https://github.com/pingcap/tidb/pull/49900 caused a 2% QPS regression of taobench
I once wrote a blog post about how to handle this kind of issue, but the domain service provider is down, so zenlife.tk is not available any more. Here are some links that may be related:
https://github.com/tiancaiamao/gp
http://107.173.155.134:8080/goroutine-pool.md
@you06 is working to improve this with a global goroutine pool.
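For readers unfamiliar with the idea, below is a minimal illustration of a goroutine pool. It is neither the tiancaiamao/gp implementation nor the planned client-go change, just a sketch of the concept: long-lived workers keep their already-grown stacks, so repeated runtime.newstack and goroutine startup costs are amortized across tasks.

```go
package main

import (
	"fmt"
	"sync"
)

// Pool runs submitted tasks on a fixed set of reusable worker goroutines.
type Pool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

// NewPool starts the worker goroutines up front; they live for the
// lifetime of the pool and reuse their stacks across tasks.
func NewPool(workers int) *Pool {
	p := &Pool{tasks: make(chan func())}
	p.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer p.wg.Done()
			for task := range p.tasks {
				task() // stack grown by earlier tasks is reused here
			}
		}()
	}
	return p
}

// Go schedules a task on one of the pooled workers.
func (p *Pool) Go(task func()) { p.tasks <- task }

// Close stops accepting tasks and waits for the workers to exit.
func (p *Pool) Close() {
	close(p.tasks)
	p.wg.Wait()
}

func main() {
	p := NewPool(4)
	var done sync.WaitGroup
	for i := 0; i < 16; i++ {
		i := i
		done.Add(1)
		p.Go(func() {
			defer done.Done()
			fmt.Println("task", i)
		})
	}
	done.Wait()
	p.Close()
}
```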
/unassign @bb7133 /assign @you06
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
2. What did you expect to see? (Required)
No performance regression
3. What did you see instead? (Required)
Update on 2024-07-09: Workloads like select_random_ranges/select_random_points, which involve many coprocessor operations, are also affected by the overhead caused by the TiDB runtime events mentioned in the comments.
The QPS of taobench (baseline: v7.5.0, QPS = 27240):
- 2023-12-10, QPS = 26931, commit https://github.com/pingcap/tidb/commit/899dfe8a7417a545a0c049c7d77876c8eaee5667, regression = 1.1%
- 2024-01-23, QPS = 25775, commit https://github.com/pingcap/tidb/commit/67fb41548da63491e324c09d57c53bb48a247d0d, regression = 5.4%
- 2024-03-14, QPS = 24641, commit https://github.com/pingcap/tidb/commit/f8ac982ebf06198657f8943575fee0995890c390, regression = 9.5%
- 2024-03-15, QPS = 24141, commit https://github.com/pingcap/tidb/commit/68c03cfb656436ab4082134872b8cf9afb86cd78, regression = 11%
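(For reference, the regression percentages above are consistent with (baseline QPS - measured QPS) / baseline QPS; e.g. (27240 - 24141) / 27240 ≈ 11.4%, reported as 11%.)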
4. What is your TiDB version? (Required)