danielenricocahall closed 4 years ago
Hi @danielenricocahall,
I originally decided to write my own logic for the histogram (using the DF API) because in my tests it was faster than the built-in histogram for RDDs.
Have you tried this PR on a mid-sized or large dataset to see whether it actually runs faster?
Simplified the numeric histogram function by using the RDD's built-in 'histogram' function.
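For context on what the built-in returns: when given an integer bucket count, PySpark's `RDD.histogram(buckets)` yields a pair of bucket boundaries and counts, with evenly spaced buckets between the minimum and maximum and the last bucket closed on the right. Below is a rough plain-Python sketch of that output shape. It is not Spark's implementation and not the code in this PR; the name `evenly_spaced_histogram` is hypothetical, and it runs on a local list rather than an RDD.

```python
def evenly_spaced_histogram(values, num_buckets):
    """Hypothetical helper: returns (boundaries, counts) for evenly spaced buckets.

    Buckets are half-open [start, end) except the last one, which also
    includes the maximum value.
    """
    values = list(values)
    lo, hi = min(values), max(values)

    # With a single distinct value there is only one meaningful bucket.
    if lo == hi or num_buckets == 1:
        return [lo, hi], [len(values)]

    width = (hi - lo) / num_buckets
    boundaries = [lo + i * width for i in range(num_buckets)] + [hi]

    counts = [0] * num_buckets
    for v in values:
        # Clamp so the maximum lands in the last bucket instead of overflowing.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return boundaries, counts


boundaries, counts = evenly_spaced_histogram(range(1, 11), 3)
```

A sketch like this only illustrates the result format. Whether the RDD route or the DF API route is faster would still need to be measured on a realistic dataset, as asked above.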