pingcap / tispark

TiSpark is built for running Apache Spark on top of TiDB/TiKV
Apache License 2.0

Using PD's ScanRegions API to resolve pd cpu spike #2708

Open shiyuhang0 opened 1 year ago

shiyuhang0 commented 1 year ago

Enhancement

This enhancement was originally proposed in https://github.com/pingcap/tispark/issues/959. This issue explains why we want to do it and what it can resolve.

One user reported a PD CPU spike when running TiSpark v3.2.1 against a large data set.

  1. The metrics show that PD receives a large number of GetRegion RPCs.
  2. The metrics show that PD CPU usage is high only during the first 10 minutes of the Spark job.

Based on these reports, I think TiSpark may be calling GetRegion too many times in the driver when it splits the work into region tasks. We can use ScanRegions in the driver to reduce the number of RPC calls.
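To illustrate why batching should help, here is a minimal sketch that counts RPCs for both approaches. It is not TiSpark or PD code: `MockPD`, `locate_with_get_region`, `locate_with_scan_regions`, the integer keys, and the batch limit of 128 are all hypothetical, chosen only to show how the call count scales.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: int  # inclusive start key (integers stand in for byte keys)
    end: int    # exclusive end key


class MockPD:
    """Hypothetical in-memory stand-in for PD that counts incoming RPCs."""

    def __init__(self, boundaries):
        # Consecutive boundaries define adjacent, non-overlapping regions.
        self.regions = [Region(s, e) for s, e in zip(boundaries, boundaries[1:])]
        self.starts = [r.start for r in self.regions]
        self.rpc_count = 0

    def get_region(self, key):
        # One RPC returns the single region that contains `key`.
        self.rpc_count += 1
        return self.regions[bisect_right(self.starts, key) - 1]

    def scan_regions(self, start_key, end_key, limit):
        # One RPC returns up to `limit` consecutive regions in [start_key, end_key).
        self.rpc_count += 1
        first = bisect_right(self.starts, start_key) - 1
        batch = []
        for region in self.regions[first:]:
            if region.start >= end_key or len(batch) >= limit:
                break
            batch.append(region)
        return batch


def locate_with_get_region(pd, start, end):
    """Walk the key range one region at a time: one RPC per region."""
    regions, key = [], start
    while key < end:
        region = pd.get_region(key)
        regions.append(region)
        key = region.end
    return regions


def locate_with_scan_regions(pd, start, end, limit=128):
    """Walk the key range in batches: one RPC per `limit` regions (assumed limit)."""
    regions, key = [], start
    while key < end:
        batch = pd.scan_regions(key, end, limit)
        if not batch:
            break
        regions.extend(batch)
        key = batch[-1].end
    return regions
```

In this model, a range covering 1000 regions costs 1000 calls with the per-region loop and 8 calls with a batch limit of 128. Whether the real PD CPU usage drops proportionally still needs to be measured.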

TODO: Test the effect of this enhancement.