Open utterances-bot opened 4 years ago
A guide to PySpark bucketing, an optimization technique that uses buckets to determine data partitioning and avoid data shuffles.
https://luminousmen.com/post/the-5-minute-guide-to-using-bucketing-in-pyspark
Which release of PySpark was this written for?
@setjmp At the time of writing, it was 2.4.
The key detail is that `bucketBy` became available in PySpark 2.3.
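For anyone who wants to guard against older installs, here is a minimal sketch of a version check built on the 2.3 cutoff mentioned above. The helper names `supports_bucket_by` and `write_bucketed`, and their parameters, are made up for illustration and are not part of PySpark; only `bucketBy`, `sortBy`, `mode`, and `saveAsTable` are real `DataFrameWriter` methods.

```python
def supports_bucket_by(version: str) -> bool:
    """Return True if a PySpark version string is 2.3 or newer.

    Assumes the string starts with numeric "major.minor", as in "2.4.0".
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 3)


def write_bucketed(df, table_name, num_buckets, column):
    """Hypothetical helper: save a DataFrame as a bucketed, sorted table.

    `df` is expected to be a pyspark.sql.DataFrame; nothing is imported
    here, so this function only does work when it is actually called.
    """
    (
        df.write
        .bucketBy(num_buckets, column)  # hash rows into num_buckets by column
        .sortBy(column)                 # sort rows within each bucket
        .mode("overwrite")
        .saveAsTable(table_name)        # bucketing requires saving as a table
    )


# Example check with plain strings; with PySpark installed you could pass
# pyspark.__version__ instead.
print(supports_bucket_by("2.4.0"))
print(supports_bucket_by("2.2.1"))
```

With PySpark installed, something like `if supports_bucket_by(pyspark.__version__): write_bucketed(df, "events_bucketed", 16, "user_id")` would be one way to use it, where the table and column names are placeholders.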
The 5-minute guide to using bucketing in Pyspark - Blog | luminousmen