julioasotodv / spark-df-profiling

Create HTML profiling reports from Apache Spark DataFrames
MIT License
195 stars 77 forks

Simplify numeric histogram function #31

Closed danielenricocahall closed 4 years ago

danielenricocahall commented 5 years ago

Simplified the numeric histogram function by using the RDD's built-in 'histogram' function.
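
For context, PySpark's RDD.histogram(buckets), when given an integer, returns a pair of (bucket boundaries, counts) over evenly spaced buckets between the minimum and maximum, with the last bucket closed on the right. The sketch below is not code from this PR. It is a plain-Python illustration of that equal-width bucketing on a small in-memory list, and equal_width_histogram is a hypothetical name chosen for the example.

```python
from typing import List, Sequence, Tuple


def equal_width_histogram(
    values: Sequence[float], buckets: int
) -> Tuple[List[float], List[int]]:
    """Return (boundaries, counts) for `buckets` equal-width bins over [min, max].

    Hypothetical helper for illustration only; it runs on a local list,
    not on a distributed RDD.
    """
    if buckets < 1:
        raise ValueError("buckets must be >= 1")
    if not values:
        raise ValueError("values must be non-empty")

    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate case: a single bucket holding every value.
        return [lo, hi], [len(values)]

    width = (hi - lo) / buckets
    # buckets + 1 boundaries; the last one is pinned to the exact maximum.
    boundaries = [lo + i * width for i in range(buckets)] + [hi]

    counts = [0] * buckets
    for v in values:
        # The last bucket is closed on the right, so the maximum lands in it.
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    return boundaries, counts


boundaries, counts = equal_width_histogram(
    [1.0, 2.0, 2.5, 4.0, 9.0, 10.0], buckets=3
)
print(boundaries)  # [1.0, 4.0, 7.0, 10.0]
print(counts)      # [3, 1, 2]
```

On an actual RDD of numbers the equivalent call would be rdd.histogram(3), which computes the minimum, maximum, and counts across partitions rather than in a local loop.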

julioasotodv commented 4 years ago

Hi @danielenricocahall,

I originally decided to write my own logic for the histogram (using the DataFrame API) because, in my tests, it was faster than the built-in histogram for RDDs.

Have you tried this on a mid-sized or large dataset to see whether it actually runs faster with this PR?
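
One way such a comparison might be run is with a small timing harness like the sketch below. It is an assumption about how the check could be done, not something taken from this PR: time_candidates and the two labels are hypothetical names, and the lambdas are stand-in workloads that would need to be replaced by calls to the two histogram implementations. Because Spark evaluates lazily, each callable would have to trigger an action (for example a collect) for the timing to mean anything.

```python
import time
from typing import Callable, Dict


def time_candidates(
    candidates: Dict[str, Callable[[], object]], repeats: int = 3
) -> Dict[str, float]:
    """Return the best wall-clock time in seconds for each named callable."""
    results: Dict[str, float] = {}
    for name, fn in candidates.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn()  # with Spark, this call must force evaluation (an action)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


# Stand-in workloads; swap in the DataFrame-based and RDD-based histogram calls.
data = list(range(100_000))
timings = time_candidates(
    {
        "df_api_version": lambda: sum(data),
        "rdd_histogram_version": lambda: max(data),
    }
)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

Taking the best of several repeats reduces noise from warm-up and caching; running it at a few dataset sizes would show whether either approach scales better.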