luminousmen / luminousmen.com

https://luminousmen.com/post/spark-tips-partition-tuning?utterances=739120e829e48599ec05dd75X0r1%2FZyTj9iscY%2FthYwzWsQZZNeCVByyuKDsBkLc8iIUQPBNmD1vxEIMpbOeYLqADjgKMlDVYA4BLb7SpEiyqPRS4NLTaA1lsps%2B2YeHhoP6hstNXCCV8jmHJWk%3D #60

Closed utterances-bot closed 1 month ago

utterances-bot commented 1 year ago

Spark Tips. Partition Tuning - Blog | luminousmen

Data partitioning is critical to processing performance in Spark, especially for large volumes of data. Here are some partitioning tips.

cnoam commented 1 year ago

I tried the unbalanced partitions code, and it does not work as expected:

After repartition('country'), 'transactions' has 1 partition. 'df' also has 1 partition; before calling repartition() it had 8. Tested on Spark 3.2.0 standalone.

Can you explain why?