When do we need to use the coalesce() API?

mahmoudparsian / data-algorithms-book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

http://mapreduce4hackers.com

Closed lxlenovostar closed 7 years ago

lxlenovostar commented 7 years ago

Hi, I noticed that you use coalesce() only in Top10NonUnique.java and not in Top10.java. Why is that?

Thank you for the reply.

mahmoudparsian commented 7 years ago

Spark's coalesce() is used to control partitioning and parallelism: it reduces the number of partitions of an RDD.

In the Spark API, we can observe:

public JavaRDD<T> coalesce(int numPartitions) // Return a new RDD that is reduced into numPartitions partitions.
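To make "reduced into numPartitions partitions" concrete, here is a minimal plain-Java sketch of the effect. It is only a model and not Spark's implementation. The class name CoalesceSketch and its contiguous grouping rule are assumptions made for illustration; Spark chooses its own grouping of parent partitions. The one behavior it mirrors is that a coalesce without a shuffle can only lower the partition count, never raise it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of coalescing partitions; NOT Spark's implementation.
final class CoalesceSketch {

    // Merges the input partitions into at most numPartitions partitions by
    // concatenating contiguous groups of partitions. Element order is kept.
    static <T> List<List<T>> coalesce(List<List<T>> partitions, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        int n = partitions.size();
        // Assumption mirrored from a no-shuffle coalesce: the count can only shrink.
        int target = Math.min(numPartitions, n);
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < target; i++) {
            // Output partition i takes input partitions in the range [start, end).
            int start = (int) ((long) i * n / target);
            int end = (int) ((long) (i + 1) * n / target);
            List<T> merged = new ArrayList<>();
            for (int p = start; p < end; p++) {
                merged.addAll(partitions.get(p));
            }
            result.add(merged);
        }
        return result;
    }

    public static void main(String[] args) {
        // Four input partitions holding six elements in total.
        List<List<Integer>> parts = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3),
                Arrays.asList(4, 5), Arrays.asList(6));
        System.out.println(coalesce(parts, 2)); // prints [[1, 2, 3], [4, 5, 6]]
    }
}
```

In a real job the equivalent step is a single call on the RDD, such as rdd.coalesce(2). Fewer partitions means fewer tasks and less parallelism in later stages, which is the trade-off to weigh when choosing the target number.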

Thanks, best regards, Mahmoud Parsian