I tried the unbalanced partitions code, and it does not work as expected:
'transactions' has 1 partition after repartition('country'), and 'df' also has 1 partition. Before calling repartition() it had 8. Tested on Spark 3.2.0 standalone.
Can you explain why?
Spark Tips. Partition Tuning - Blog | luminousmen
Data partitioning is critical to data processing performance, especially when processing large volumes of data in Spark. Here are some partitioning tips.
https://luminousmen.com/post/spark-tips-partition-tuning?utterances=739120e829e48599ec05dd75X0r1%2FZyTj9iscY%2FthYwzWsQZZNeCVByyuKDsBkLc8iIUQPBNmD1vxEIMpbOeYLqADjgKMlDVYA4BLb7SpEiyqPRS4NLTaA1lsps%2B2YeHhoP6hstNXCCV8jmHJWk%3D