Open ibobak opened 5 months ago
Hi @ibobak ,
thank you for reporting the issue. The Spark support in ydata-profiling is only an initial version: it covers a small set of functionality and has some known issues.
We are looking for contributors who are willing to keep evolving the Spark integration, as this was something initiated by the community. If you're open to it, feel free to check the issues labelled with the tag spark.
Current Behaviour
Spark Dataframe structure:
code:
Look at the distribution it produced for playtime_sec_total:
I then converted this DataFrame to a pandas DataFrame, and here is what I actually see:
So the conclusion is this: the product is totally buggy with this type of field, and I don't trust it any more.
Expected Behaviour
You need to fix the handling of decimal fields.
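Until decimal handling is fixed, one possible workaround is to cast decimal columns to double before profiling. The sketch below is an assumption and has not been verified against ydata-profiling 4.8.3 or the data in this report. The helper names `decimal_columns` and `cast_decimals_to_double` are hypothetical; only `DataFrame.dtypes`, `withColumn`, and `Column.cast` are real PySpark calls.

```python
from typing import Iterable, List, Tuple


def decimal_columns(dtypes: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the names of top-level columns whose Spark type string is a decimal.

    `dtypes` has the shape of PySpark's `DataFrame.dtypes`: a list of
    (column_name, type_string) pairs, e.g. ("price", "decimal(10,2)").
    """
    return [
        name
        for name, type_str in dtypes
        if type_str.lower().startswith("decimal")
    ]


def cast_decimals_to_double(df):
    """Hypothetical helper: cast every decimal column of a Spark DataFrame to double.

    Note: a double cannot represent every decimal exactly, so very large or
    high-scale values may lose precision.
    """
    # Imported lazily so the pure helper above works without pyspark installed.
    from pyspark.sql import functions as F

    for name in decimal_columns(df.dtypes):
        df = df.withColumn(name, F.col(name).cast("double"))
    return df
```

With a Spark DataFrame `df`, the result of `cast_decimals_to_double(df)` could then be passed to `ProfileReport` in place of `df`. Whether this changes the reported distribution for playtime_sec_total is untested here.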
Data Description
see above
Code that reproduces the bug
pandas-profiling version
ydata-profiling==4.8.3
Dependencies
OS
Ubuntu 22.04
Checklist