Open utterances-bot opened 1 year ago
Guide into Pyspark bucketing — an optimization technique that uses buckets to determine data partitioning and avoid data shuffle.
https://luminousmen.com/post/the-5-minute-guide-to-using-bucketing-in-pyspark?utterances=bddb2581deb400782aacd5aaYCpMY5iVPeN4KW91PC9W%2FrMI%2FDAQ7%2BlBpsVv41SOTC04RZmWZzOvMChIhWz73%2FaiGUGPe4lgIPX77fNYdvZ%2Br7fRcakNjNeE64Lef5FCNMBP5LP4lNuRIH9%2BRu4%3D
With the buckets created, we're looking only at the physical plan of the join. But now that we are bucketing, what is the complete physical plan, including the code where the buckets are created and sorted?
The 5-minute guide to using bucketing in Pyspark - Blog | luminousmen