tikv / pd

Placement driver for TiKV
Apache License 2.0
1.06k stars 719 forks source link

introduce callerID for grpc and apply it for GetRegions #8593

Open okJiang opened 2 months ago

okJiang commented 2 months ago

Enhancement Task

GetRegion(s) is a hot path and an accident-prone area in PD. In order to better troubleshoot GetRegion(s) in the event of an incident, we decided to record callerID information for it to make it easier to know who was frantically calling GetRegion(s) in the meantime.

okJiang commented 1 month ago

I would like to put the CallerID and CallerComponent information into the pd header, e.g. https://github.com/okJiang/kvproto/commit/9835bbc4473a0698f62fff971cad6ba43274ce85. CallerID/CallerComponent can be passed in via ClientOption when the client is created, and should also be allowed to be passed in temporarily each time the interface is called. The priority of ClientOption should be GetRegion(xxx, xxx, ...ClientOption) > CreateClient(xxx, xxx, ...ClientOption).

There is a problem, in the client side, there are very many kinds of options, such as GetStoreOp, RegionsOp, GetRegionOp, due to the client interface in a function can only have one kind of option exists, I tend to unify these Option as ClientOption, all following the rules in the previous paragraph.

What do you think? @rleungx @JmPotato @lhy1024 @nolouch

JmPotato commented 1 month ago

I would like to put the CallerID and CallerComponent information into the pd header, e.g. okJiang/kvproto@9835bbc. CallerID/CallerComponent can be passed in via ClientOption when the client is created, and should also be allowed to be passed in temporarily each time the interface is called. The priority of ClientOption should be GetRegion(xxx, xxx, ...ClientOption) > CreateClient(xxx, xxx, ...ClientOption).

There is a problem, in the client side, there are very many kinds of options, such as GetStoreOp, RegionsOp, GetRegionOp, due to the client interface in a function can only have one kind of option exists, I tend to unify these Option as ClientOption, all following the rules in the previous paragraph.

What do you think? @rleungx @JmPotato @lhy1024 @nolouch

What do you think about this approach to chain calling like the HTTP client? Instead of modifying the function signature, we can return a new client with the CallerID set. We also need to ensure that multiple references to the same client work correctly.

https://github.com/pingcap/tidb/blob/26443dab89ba5b1a866de3c3e44e2f51468bc435/pkg/store/helper/helper.go#L108

okJiang commented 1 month ago

@JmPotato I think that's a good way to solve this problem as well 👍

okJiang commented 3 weeks ago

After sorting

Since client-go(RegionCache) is a separate repository, the caller need to additionally identify themselves when they create the RegionCache so that the RegionCache knows who it is when it calls pd-client.