One user reported a PD CPU spike when running TiSpark v3.2.1 against a large data set.
The metrics show a large number of GetRegion RPCs being sent to PD.
They also show that PD CPU is high only during the first 10 minutes of the Spark job.
Based on these reports, I think TiSpark may issue too many GetRegion calls in the driver when it splits region tasks. We can use ScanRegions in the driver to reduce the number of RPC calls.
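To illustrate why batching should help, here is a minimal sketch that compares the two lookup patterns against an in-memory stand-in for PD. Everything in it is an assumption made for illustration: `MockPD`, `Region`, `regions_via_get_region`, `regions_via_scan_regions`, and the `limit` of 128 are hypothetical names and values, not the TiSpark or PD client API. It only counts round trips and says nothing about the actual CPU cost on PD.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: bytes  # inclusive start key
    end: bytes    # exclusive end key; b"" means unbounded


class MockPD:
    """Hypothetical in-memory stand-in for PD that counts RPCs."""

    def __init__(self, split_keys):
        self.starts = [b""] + sorted(split_keys)
        ends = self.starts[1:] + [b""]
        self.regions = [Region(s, e) for s, e in zip(self.starts, ends)]
        self.rpc_count = 0

    def _index(self, key):
        # Index of the region whose range contains `key`.
        return bisect_right(self.starts, key) - 1

    def get_region(self, key):
        # One RPC returns exactly one region.
        self.rpc_count += 1
        return self.regions[self._index(key)]

    def scan_regions(self, start, end, limit):
        # One RPC returns up to `limit` consecutive regions in [start, end).
        self.rpc_count += 1
        out = []
        i = self._index(start)
        while i < len(self.regions) and len(out) < limit:
            region = self.regions[i]
            if end and region.start >= end:
                break
            out.append(region)
            i += 1
        return out


def regions_via_get_region(pd, start, end):
    """Walk the key range one region per RPC."""
    out, key = [], start
    while True:
        region = pd.get_region(key)
        out.append(region)
        if region.end == b"" or (end and region.end >= end):
            return out
        key = region.end


def regions_via_scan_regions(pd, start, end, limit=128):
    """Walk the key range in batches of up to `limit` regions per RPC."""
    out, key = [], start
    while True:
        batch = pd.scan_regions(key, end, limit)
        out.extend(batch)
        last = batch[-1]
        if len(batch) < limit or last.end == b"" or (end and last.end >= end):
            return out
        key = last.end
```

In this mock, covering 1,000 regions takes 1,000 calls with the per-region pattern and 8 calls with the batched pattern at a limit of 128, while both return the same region list. Whether the real driver sees a comparable reduction still needs to be measured.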
Enhancement
This enhancement has been proposed in https://github.com/pingcap/tispark/issues/959. This issue explains why we want to do it and what it can resolve.
TODO: Test the effect of this enhancement.