mahmoudparsian / data-algorithms-book

MapReduce, Spark, Java, and Scala for Data Algorithms Book
http://mapreduce4hackers.com

When do we need to use the coalesce() API? #15

Closed lxlenovostar closed 7 years ago

lxlenovostar commented 7 years ago

Hi, I noticed that you use coalesce() only in Top10NonUnique.java and not in Top10.java. Why is that?

Thank you for the reply.

mahmoudparsian commented 7 years ago

Spark's coalesce() is used to control partitioning and, with it, parallelism: it returns a new RDD with a reduced number of partitions.

In the Spark API (JavaRDD<T>), we can see:

public JavaRDD<T> coalesce(int numPartitions) // Return a new RDD that is reduced into numPartitions partitions.
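To make "reduced into numPartitions partitions" a bit more concrete, here is a rough plain-Java model of the idea. It does not use Spark and is not how Spark implements coalesce(). The class `CoalesceSketch` and its `coalesce` helper are hypothetical names made up for this illustration, and grouping neighboring partitions is a simplifying assumption (Spark also takes data locality into account).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CoalesceSketch {

    // Hypothetical helper (not a Spark API): merges neighboring partitions
    // so that at most numPartitions partitions remain.
    public static <T> List<List<T>> coalesce(List<List<T>> partitions, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        int n = partitions.size();
        // Like coalesce() without a shuffle, this model never increases
        // the number of partitions.
        int target = Math.min(numPartitions, n);

        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < target; i++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            // Assumption: parent partition i goes to a contiguous group.
            int group = (int) ((long) i * target / n);
            result.get(group).addAll(partitions.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = Arrays.asList(
                Arrays.asList(1, 2),
                Arrays.asList(3),
                Arrays.asList(4, 5),
                Arrays.asList(6));
        // Four partitions are merged into two.
        System.out.println(coalesce(parts, 2)); // prints [[1, 2, 3], [4, 5, 6]]
    }
}
```

In this model, whole partitions are merged and no element is moved individually, which is roughly why coalesce() can avoid a full shuffle when reducing the partition count. Fewer partitions also means fewer tasks, hence less parallelism.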

Also, please take a look at: http://stackoverflow.com/questions/31610971/spark-repartition-vs-coalesce

Thanks, best regards, Mahmoud Parsian