tikv / pd

Placement driver for TiKV
Apache License 2.0

grpc/api: should we limit the concurrency request access to PD/ETCD in PD #6604

Closed AndreMouche closed 2 days ago

AndreMouche commented 1 year ago

Enhancement Task

I think we need to limit concurrent request access on the PD side. One example is the “get-regions” request, which fetches the metadata of all regions in full. If the system has a large number of regions, a single request may occupy several GB of memory. Without a cap on how many such requests run at once, PD is very likely to hit OOM errors.
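To make the proposal concrete, here is a minimal sketch of a concurrency cap built on a buffered channel. It is an illustration only, not PD's actual implementation: `ConcurrencyLimiter`, `ErrTooManyRequests`, and the limit value are hypothetical names chosen for this example. Requests over the cap are rejected immediately rather than queued, so an expensive call cannot pile up in memory.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTooManyRequests is returned when the in-flight limit is reached.
// Hypothetical error for this sketch; PD may report this differently.
var ErrTooManyRequests = errors.New("too many concurrent requests")

// ConcurrencyLimiter caps the number of requests running at the same time.
// Hypothetical helper for illustration, not part of PD's API.
type ConcurrencyLimiter struct {
	slots chan struct{} // one buffered element per in-flight request
}

// NewConcurrencyLimiter creates a limiter that allows at most max requests.
func NewConcurrencyLimiter(max int) *ConcurrencyLimiter {
	return &ConcurrencyLimiter{slots: make(chan struct{}, max)}
}

// TryAcquire reserves a slot without blocking and reports whether it succeeded.
func (l *ConcurrencyLimiter) TryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release frees a slot previously reserved by TryAcquire.
func (l *ConcurrencyLimiter) Release() {
	<-l.slots
}

// Do runs fn if a slot is free; otherwise it rejects the call immediately.
func (l *ConcurrencyLimiter) Do(fn func() error) error {
	if !l.TryAcquire() {
		return ErrTooManyRequests
	}
	defer l.Release()
	return fn()
}

func main() {
	// Allow a single expensive request (such as a full region dump) at a time.
	limiter := NewConcurrencyLimiter(1)
	err := limiter.Do(func() error {
		// A second request arriving while the first is in flight is rejected.
		return limiter.Do(func() error { return nil })
	})
	fmt.Println(err)
}
```

Rejecting instead of blocking is a design choice for this sketch: a blocked request would still hold its connection and buffers, which does little to protect against OOM.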

AndreMouche commented 1 year ago

cc @rleungx @nolouch

AndreMouche commented 1 year ago

This duplicates https://github.com/tikv/pd/issues/6556, so I will close it.

AndreMouche commented 1 year ago

I think on one hand we need to limit the concurrent request access on the PD side, and on the other hand, we also need to add some metrics to distinguish request sources.

For example, since GetMember(Leader) is an open API and PD (etcd) does not record request sources, all we can say is that TiDB, TiKV, BR, or CDC may be calling it.
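As a rough illustration of what "distinguishing request sources" could look like, the sketch below counts requests per (source, method) pair. Everything in it is an assumption made for this example: `SourceCounter`, the source labels, and the idea that callers identify themselves at all. It uses a plain in-memory map to stay self-contained; a real deployment would more likely export a labeled counter through its metrics library.

```go
package main

import (
	"fmt"
	"sync"
)

// key identifies one counter: which component called which method.
type key struct{ source, method string }

// SourceCounter counts requests per (source, method) pair.
// Hypothetical helper for illustration, not part of PD's API.
type SourceCounter struct {
	mu     sync.Mutex
	counts map[key]uint64
}

// NewSourceCounter returns an empty counter set.
func NewSourceCounter() *SourceCounter {
	return &SourceCounter{counts: make(map[key]uint64)}
}

// Observe records one request. Callers that do not identify themselves
// are grouped under "unknown" so they remain visible.
func (c *SourceCounter) Observe(source, method string) {
	if source == "" {
		source = "unknown"
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[key{source, method}]++
}

// Count returns how many requests were recorded for the pair.
func (c *SourceCounter) Count(source, method string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[key{source, method}]
}

func main() {
	c := NewSourceCounter()
	// The source names below are assumed labels, e.g. taken from request metadata.
	c.Observe("tidb", "GetMember")
	c.Observe("tikv", "GetMember")
	c.Observe("", "GetMember") // caller did not identify itself
	for _, s := range []string{"tidb", "tikv", "unknown"} {
		fmt.Printf("%s GetMember: %d\n", s, c.Count(s, "GetMember"))
	}
}
```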

I believe there are not enough metrics emitted to investigate such scenarios:

  • There are no connection-attempt metrics from TiDB.
  • There is no connection open/closed metric tracked on the PD side.

This is a complex legacy issue. In TiDB, some places use pd-client directly to connect to PD, and we can see the corresponding monitoring information on TiDB->PD-Client.

However, other places, such as the DDL part, use etcd-client to connect directly to the etcd embedded in PD, and we currently have no metrics for that path.

AndreMouche commented 1 year ago

https://github.com/tikv/pd/issues/4480 It seems that this issue involves some API rate limiting, but I couldn't find any official documentation on how to use this feature. @CabinfeverB could you provide some information?

Tema commented 1 year ago

> and we can see the corresponding monitoring information on TiDB->PD-Client

@AndreMouche TiDB->PD-Client does not have any connection related metrics like TiKV->PD does.

AndreMouche commented 1 year ago

@Tema The TiKV->PD metrics are under TiKV-details->PD-client.

AndreMouche commented 1 year ago

related issue https://github.com/tikv/pd/issues/5739

CabinfeverB commented 1 year ago

After finishing #6834, we are exploring adaptive rate-limiting mechanisms.

rleungx commented 2 days ago

Tracked by #5739.