luminousmen / luminousmen.com

https://luminousmen.com/post/the-5-minute-guide-to-using-bucketing-in-pyspark #18

Open utterances-bot opened 4 years ago

utterances-bot commented 4 years ago

The 5-minute guide to using bucketing in Pyspark - Blog | luminousmen

A guide to PySpark bucketing, an optimization technique that uses buckets to determine data partitioning and avoid data shuffles.

https://luminousmen.com/post/the-5-minute-guide-to-using-bucketing-in-pyspark

setjmp commented 4 years ago

Which release of PySpark was this written for?

luminousmen commented 4 years ago

@setjmp At the time of writing, it was 2.4.

setjmp commented 4 years ago

The key detail is that bucketBy became available in PySpark 2.3.
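
For anyone running the guide's code on mixed Spark versions, here is a minimal sketch of guarding a bucketed write behind a version check. The helper names (`supports_bucket_by`, `write_bucketed`) and the table, column, and bucket-count arguments are hypothetical and not taken from the post. The only PySpark calls assumed are `DataFrameWriter.bucketBy`, `sortBy`, `mode`, and `saveAsTable`.

```python
def parse_major_minor(version):
    """Return (major, minor) from a version string such as '2.4.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def supports_bucket_by(version):
    """True if this PySpark version exposes DataFrameWriter.bucketBy (2.3+)."""
    return parse_major_minor(version) >= (2, 3)


def write_bucketed(df, spark_version, table_name, num_buckets, column):
    """Write df as a bucketed, sorted table; fail early on older PySpark.

    All arguments are illustrative: df is assumed to be a PySpark DataFrame,
    and table_name / num_buckets / column are chosen by the caller.
    """
    if not supports_bucket_by(spark_version):
        raise RuntimeError(
            "bucketBy needs PySpark 2.3 or later, got " + spark_version
        )
    (
        df.write
        .bucketBy(num_buckets, column)  # hash rows into num_buckets by column
        .sortBy(column)                 # sort rows within each bucket
        .mode("overwrite")
        .saveAsTable(table_name)        # bucketing requires saveAsTable
    )
```

In a live session this would probably be called with `spark.version` as the version string, for example `write_bucketed(df, spark.version, "events_bucketed", 16, "user_id")`, where the table and column names are placeholders.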