ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://docs.profiling.ydata.ai
MIT License

Supporting Spark Connect Dataframe #1634

Open chenbojian opened 3 months ago

chenbojian commented 3 months ago

Missing functionality

As of Databricks Runtime 14, the DataFrame type in notebooks has changed. It used to be pyspark.sql.dataframe.DataFrame, but it is now pyspark.sql.connect.dataframe.DataFrame. This fails with ydata-profiling, which expects either pandas.DataFrame or pyspark.sql.dataframe.DataFrame.

Proposed feature

Support pyspark.sql.connect.dataframe.DataFrame for profiling

Alternatives considered

No response

Additional context

[image attachment]

charleslondon commented 1 month ago

Bumping this as I also have this issue

dan-eschman commented 2 weeks ago

Bump - same issue here. It would be great if I didn't have to go to pandas first.
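
For anyone landing here in the meantime, the "go to pandas first" workaround mentioned above can be sketched roughly as follows. This is only a minimal sketch, not something ydata-profiling provides: the helper names is_spark_connect_dataframe and to_profilable and the max_rows default are hypothetical. The class is detected by its module path (taken from the issue description) so the helper does not need to import pyspark, and it relies only on the limit() and toPandas() methods of the Spark Connect DataFrame.

```python
# Module path of the Spark Connect DataFrame class, as reported in this issue.
CONNECT_DF_MODULE = "pyspark.sql.connect.dataframe"


def is_spark_connect_dataframe(df) -> bool:
    """Return True if df looks like a Spark Connect DataFrame (no pyspark import needed)."""
    cls = type(df)
    return cls.__module__ == CONNECT_DF_MODULE and cls.__name__ == "DataFrame"


def to_profilable(df, max_rows: int = 100_000):
    """Hypothetical helper: convert a Spark Connect DataFrame to pandas, pass anything else through.

    max_rows is an assumed safety cap so that toPandas() does not pull an
    entire large table onto the driver; tune or remove it as needed.
    """
    if is_spark_connect_dataframe(df):
        return df.limit(max_rows).toPandas()
    return df


# Usage in a notebook (assumes ydata-profiling is installed and spark_df exists):
# from ydata_profiling import ProfileReport
# report = ProfileReport(to_profilable(spark_df), title="Profile")
```

Note that this profiles at most max_rows rows in pandas on the driver, so it is a stopgap rather than a substitute for native Spark Connect support.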