julioasotodv / spark-df-profiling

Create HTML profiling reports from Apache Spark DataFrames
MIT License
195 stars 77 forks

Incorrect missing data percentage shown in HTML report #25

Open harika1419 opened 5 years ago

harika1419 commented 5 years ago

After generating the HTML report with spark-df-profiling, the percentage of missing data is shown as 0%, even though the DataFrame has some missing data.

adutchengineer commented 5 years ago

Could you give an example?

shhanani commented 4 years ago

Is this fixed yet? Mine also shows the missing data incorrectly as 0%.

shhanani commented 4 years ago

@harika1419 I think I found the issue. It's in line 397. Change it to this:

results_data = df.select(column).na.drop().agg(countDistinct(col(column)).alias("distinct_count"),
                                                       count(col(column)).alias("count")).toPandas()

@julioasotodv you might want to take a look at this fix.
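
For anyone checking the report's numbers by hand, here is a minimal plain-Python sketch of the arithmetic behind a missing-data percentage. It does not use Spark or the library's code. The helper name `missing_stats` and the sample column are hypothetical, and treating both `None` and float `NaN` as missing is an assumption about what "missing" should mean, not something confirmed in this thread. The one point it illustrates is that the total row count has to be taken before any null filtering, otherwise the difference between total and non-null counts is always zero.

```python
import math


def missing_stats(values):
    """Return (n_missing, pct_missing) for a list standing in for one column.

    Hypothetical helper for illustration only; not part of spark-df-profiling.
    """
    total = len(values)  # total rows, taken BEFORE dropping anything
    # Assumption: both None and float NaN count as missing.
    n_missing = sum(
        1
        for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    pct_missing = 100.0 * n_missing / total if total else 0.0
    return n_missing, pct_missing


# Hypothetical sample column: two missing entries and one zero.
column = [1.0, None, 3.0, float("nan"), 0.0]

n_missing, pct_missing = missing_stats(column)

# Zeros are counted separately; None and NaN never compare equal to 0.
n_zeros = sum(1 for v in column if v == 0)

print(n_missing, pct_missing, n_zeros)
```

Comparing figures like these against the report for a small DataFrame with known gaps is one way to produce the example asked for earlier in the thread.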

harika1419 commented 4 years ago

Hi, the issue went away after upgrading Spark from 1.6 to 2.3.3.

shhanani commented 4 years ago

Hi @harika1419, thanks for letting me know. I'm seeing this issue with Spark 2.4.2, which is why I thought it wasn't fixed yet.

Strauman commented 3 years ago

I'm on Spark 3.1.0, and the missing percentage is still wrong. The number of zeros is also wrong.