luminousmen / luminousmen.com

https://luminousmen.com/post/the-5-minute-guide-to-using-bucketing-in-pyspark?utterances=bddb2581deb400782aacd5aaYCpMY5iVPeN4KW91PC9W%2FrMI%2FDAQ7%2BlBpsVv41SOTC04RZmWZzOvMChIhWz73%2FaiGUGPe4lgIPX77fNYdvZ%2Br7fRcakNjNeE64Lef5FCNMBP5LP4lNuRIH9%2BRu4%3D #55

Open utterances-bot opened 1 year ago

utterances-bot commented 1 year ago

The 5-minute guide to using bucketing in Pyspark - Blog | luminousmen

A guide to PySpark bucketing, an optimization technique that uses buckets to determine data partitioning and avoid data shuffles.

saidbouras commented 1 year ago

With the buckets created, we are only looking at the physical plan of the join. Now that we are bucketing, what is the complete physical plan, including the code where the buckets are created and sorted?
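
For context on the "created and sorted" part: in PySpark the buckets are normally produced at write time, with `bucketBy` and `sortBy` on the DataFrame writer followed by `saveAsTable`, so that step belongs to the plan of the write job and not to the plan of the later join. The following is only a conceptual sketch in plain Python of what that write step and the subsequent join do. It is not Spark code and not the actual physical plan. The bucket count, the sample rows, the helper names (`bucket_id`, `write_bucketed`, `bucketed_join`), and the CRC32 hash are all assumptions made for illustration; Spark uses its own hash function.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen for illustration


def bucket_id(key, num_buckets=NUM_BUCKETS):
    # Stand-in hash (CRC32). Spark uses a different hash function internally.
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def write_bucketed(rows, key_index=0, num_buckets=NUM_BUCKETS):
    """Group rows into buckets by hashed key, then sort each bucket by key."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[key_index], num_buckets)].append(row)
    return {b: sorted(rs, key=lambda r: r[key_index]) for b, rs in buckets.items()}


def bucketed_join(left, right, key_index=0):
    """Join bucket i of `left` only against bucket i of `right`.

    Equal keys always hash to the same bucket, so no rows need to move
    between buckets at join time (the analogue of avoiding a shuffle).
    """
    out = []
    for b, left_rows in left.items():
        for l_row in left_rows:
            for r_row in right.get(b, []):
                if l_row[key_index] == r_row[key_index]:
                    out.append((l_row[key_index], l_row[1], r_row[1]))
    return sorted(out)


# Made-up sample data: (user_id, value)
orders = [(1, "book"), (2, "pen"), (3, "lamp"), (1, "desk")]
users = [(1, "ann"), (2, "bob"), (4, "eve")]

bucketed_orders = write_bucketed(orders)
bucketed_users = write_bucketed(users)
joined = bucketed_join(bucketed_orders, bucketed_users)
print(joined)
```

The point of the sketch is that hashing and sorting happen once, when the data is written. The join then only pairs up buckets with the same id, which is why the join's own plan no longer shows the bucketing work.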