AndreMouche closed this 2 days ago.
cc @rleungx @nolouch
This duplicates https://github.com/tikv/pd/issues/6556, so I will close it.
I think that, on one hand, we need to limit concurrent request access on the PD side; on the other hand, we also need metrics that distinguish request sources.
For example, because GetMember (Leader) is an open API and PD (etcd) does not record who calls it, all we can say is that TiDB, TiKV, BR, or CDC may be accessing it.
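A minimal sketch of both ideas together, in Go (the language PD is written in): cap in-flight requests per caller and count admitted/rejected requests per caller. It assumes clients would identify themselves with a source label such as "tidb" or "cdc"; PD does not receive such a label today, so `SourceLimiter`, its methods, and the labels are all hypothetical. In a real server the counters would be exported as metrics rather than read through `Stats`.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrTooManyRequests is returned when a source has reached its concurrency cap.
var ErrTooManyRequests = errors.New("too many concurrent requests")

// SourceLimiter (hypothetical) caps in-flight requests per caller source and
// counts admitted/rejected requests per source. The source label is assumed
// to be supplied by the client; PD does not record it today.
type SourceLimiter struct {
	mu       sync.Mutex
	limit    int
	inflight map[string]int
	admitted map[string]int
	rejected map[string]int
}

func NewSourceLimiter(limit int) *SourceLimiter {
	return &SourceLimiter{
		limit:    limit,
		inflight: make(map[string]int),
		admitted: make(map[string]int),
		rejected: make(map[string]int),
	}
}

// Acquire admits a request from source or rejects it immediately.
// On success, the returned release func must be called when the request ends;
// calling it more than once is a no-op.
func (l *SourceLimiter) Acquire(source string) (release func(), err error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inflight[source] >= l.limit {
		l.rejected[source]++
		return nil, ErrTooManyRequests
	}
	l.inflight[source]++
	l.admitted[source]++
	var once sync.Once
	return func() {
		once.Do(func() {
			l.mu.Lock()
			l.inflight[source]--
			l.mu.Unlock()
		})
	}, nil
}

// Stats returns the (admitted, rejected) counts for a source.
func (l *SourceLimiter) Stats(source string) (int, int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.admitted[source], l.rejected[source]
}

func main() {
	lim := NewSourceLimiter(2)
	r1, _ := lim.Acquire("tidb")
	r2, _ := lim.Acquire("tidb")
	_, err := lim.Acquire("tidb") // third concurrent TiDB request is rejected
	fmt.Println(err)              // too many concurrent requests
	rc, err := lim.Acquire("cdc") // other sources have their own budget
	fmt.Println(err)              // <nil>
	rc()
	r1()
	r2()
	a, r := lim.Stats("tidb")
	fmt.Println(a, r) // 2 1
}
```

Keeping a separate budget per source means one noisy component cannot starve the others, and the per-source rejection counts answer the "who is calling?" question directly.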
I don't believe enough metrics are emitted to investigate such scenarios:
- There is no metric for connection attempts from TiDB.
- There is no metric tracking connections opened/closed on the PD side.
This is a complex legacy issue. In TiDB, some places use pd-client directly to connect to PD, and the corresponding monitoring information is visible under TiDB->PD-Client.
However, other places, such as the DDL component, use etcd-client to connect directly to the etcd embedded in PD, and we currently have no metrics for this path.
https://github.com/tikv/pd/issues/4480 appears to involve some API rate limiting, but I couldn't find any official documentation on how to use this feature. @CabinfeverB, could you provide some information?
> …and we can see the corresponding monitoring information on TiDB->PD-Client
@AndreMouche TiDB->PD-Client does not have any connection-related metrics the way TiKV->PD does.
@Tema The TiKV->PD metrics are under TiKV-details->PD-client.
Related issue: https://github.com/tikv/pd/issues/5739
After finishing #6834, we are exploring adaptive rate-limiting (throttling) mechanisms.
Tracked by #5739
Enhancement Task
I think we need to limit concurrent request access on the PD side. Take the "get-regions" request, which fetches the full metadata of every region: when the cluster has a large number of regions, a single request can occupy several GB of memory. Without limiting how many such requests run at once, PD is very likely to run out of memory (OOM).
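One way to bound this is to admit expensive requests against a memory budget rather than a plain request count, rejecting a full-region scan whose estimated cost would push reserved memory past the budget. A minimal sketch, assuming a hypothetical cost model of `regionCount × bytesPerRegion`; `MemoryBudget`, `estimateGetRegionsCost`, the 1 KiB-per-region figure, and the 4 GiB budget are all illustrative, not measured PD numbers. With those numbers, 2,000,000 × 1,024 B = 2,048,000,000 B (about 1.9 GiB) per scan, so two concurrent scans fit in 4 GiB (4,294,967,296 B) but a third does not.

```go
package main

import (
	"fmt"
	"sync"
)

// MemoryBudget (hypothetical) admits requests only while their estimated
// memory fits in a fixed budget, rejecting the rest instead of risking OOM.
type MemoryBudget struct {
	mu    sync.Mutex
	total int64
	used  int64
}

func NewMemoryBudget(totalBytes int64) *MemoryBudget {
	return &MemoryBudget{total: totalBytes}
}

// TryReserve reserves cost bytes and returns a release func, or ok=false if
// the reservation would exceed the budget. Releasing twice is a no-op.
func (b *MemoryBudget) TryReserve(cost int64) (release func(), ok bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.used+cost > b.total {
		return nil, false
	}
	b.used += cost
	var once sync.Once
	return func() {
		once.Do(func() {
			b.mu.Lock()
			b.used -= cost
			b.mu.Unlock()
		})
	}, true
}

// estimateGetRegionsCost is a hypothetical cost model: regionCount × bytesPerRegion.
func estimateGetRegionsCost(regionCount, bytesPerRegion int64) int64 {
	return regionCount * bytesPerRegion
}

func main() {
	const GiB = int64(1) << 30
	budget := NewMemoryBudget(4 * GiB)
	cost := estimateGetRegionsCost(2_000_000, 1024) // illustrative: ~1.9 GiB per full scan

	r1, ok1 := budget.TryReserve(cost)
	r2, ok2 := budget.TryReserve(cost)
	_, ok3 := budget.TryReserve(cost) // a third concurrent scan would exceed 4 GiB
	fmt.Println(ok1, ok2, ok3)        // true true false
	r1()
	r2()
	_, ok4 := budget.TryReserve(cost)
	fmt.Println(ok4) // true
}
```

Rejecting immediately rather than queueing keeps a burst of full scans from piling up in memory; callers could retry with backoff.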