Closed wangdabin1216 closed 5 years ago
As far as I know, there is no split-by in Spark. Instead you can use coalesce; see https://medium.com/@mrpowers/managing-spark-partitions-with-coalesce-and-repartition-4050c57ad5c4
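For illustration, here is a minimal PySpark-style sketch of the two calls the linked article compares. It assumes `df` is a Spark DataFrame that has already been loaded (for example through TiSpark); the helper names `write_evenly` and `write_merged` and the output path are hypothetical, not part of any library. Neither call is a split column in the Sqoop sense: both only set how many partitions the DataFrame has at that point in the job.

```python
def write_evenly(df, num_partitions, path):
    """Write df as Parquet after a full shuffle into num_partitions partitions.

    DataFrame.repartition(n) redistributes rows across n partitions, so the
    resulting partitions (and output files) are roughly equal in size.
    """
    df.repartition(num_partitions).write.mode("overwrite").parquet(path)


def write_merged(df, num_partitions, path):
    """Write df as Parquet after merging down to at most num_partitions partitions.

    DataFrame.coalesce(n) merges existing partitions without a full shuffle,
    which is cheaper, but the merged partitions can remain uneven in size.
    """
    df.coalesce(num_partitions).write.mode("overwrite").parquet(path)


# Hypothetical usage, assuming `df` was loaded from TiDB beforehand:
#   write_evenly(df, 32, "hdfs:///tmp/export")
```

If the concern is evenly sized output files, `repartition` is the safer of the two, at the cost of a shuffle.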
@marsishandsome
I tried setting coalesce to 32, but that only controls the output of the program. My goal is to read the data evenly from TiDB, and I am worried about data skew or OOM problems.
TiSpark already solves the data skew problem. You do not need to do anything, just use it.
Thanks, I will give it a try.
I am trying to use TiSpark instead of Sqoop to extract data from TiDB to HDFS. Is there something like Sqoop's split-by, and how do I control it? Thank you.