Open hakan-77 opened 5 months ago
I was looking into this a bit as I was running into the issue. It's something with pandas going from 2.0.3 to 2.1.x. For ydata-profiling v.4.6.4 it works fine with pandas v2.0.3 but once you upgrade to pandas v2.1.x the autocorrelation stops working. Won't claim to know what in pandas is causing the break, but if you downgrade to pandas 2.0.3 it'll work again.
@driscoll42 good catch. I can confirm that 4.6.2 works because it pins pandas < 2.1. The PR below relaxed the pandas pin and thus broke correlations.
https://github.com/ydataai/ydata-profiling/pull/1512
@aquemy @ricardodcpereira any idea what could be wrong?
@aquemy @ricardodcpereira is there anything I can help with?
Could it be due to the newer pandas datatypes? There are now nullable datatypes for string, float etc. with pandas.NA as missing values.
I get many errors where pandas attempts to convert a string to a float, for example:
"could not convert string to float: 'positive'"
Maybe after pandas 2.0 we need to pass numeric_only=True to pandas.DataFrame.corr(). From the pandas docs:
Changed in version 2.0.0: The default value of numeric_only is now False.
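For illustration, here is a minimal sketch of that behaviour on a made-up frame (the column names and values are assumptions, not the data from this issue). On pandas 2.x the default call is expected to fail on the string column, while numeric_only=True restricts the correlation to the numeric columns:

```python
import pandas as pd

# Made-up data: two numeric columns and one string column.
df = pd.DataFrame(
    {
        "age": [23, 35, 47, 51, 62],
        "income": [30.0, 42.5, 58.0, 61.0, 75.5],
        "sentiment": ["positive", "negative", "positive", "neutral", "negative"],
    }
)

# With the pandas >= 2.0 default (numeric_only=False) the string column is
# not dropped, so this call is expected to raise. Older pandas versions
# behave differently, so the error is captured instead of assumed.
try:
    df.corr(method="spearman")
    default_error = None
except (ValueError, TypeError) as exc:
    default_error = exc

# Restricting to numeric columns avoids the conversion attempt entirely.
numeric_corr = df.corr(method="spearman", numeric_only=True)

print("default call error:", default_error)
print(numeric_corr)
```

Whether this is the right fix inside ydata-profiling depends on how the correlation code selects columns, so treat it as a reproduction aid rather than a patch.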
I believe this line will also have to be updated to this or its equivalent:
method = (
    _pairwise_spearman
    if col_1_name not in categorical_columns and col_2_name not in categorical_columns
    else _pairwise_cramers
)
Setting numeric_only=True and making the above change ensures the report renders with both categorical and numerical features; otherwise it throws a TypeError on categorical columns if they show up as col_1_name.
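To make the proposed selection rule concrete, here is a self-contained sketch. pairwise_spearman, pairwise_cramers and pick_method are hypothetical stand-ins written for this comment, not the library's actual _pairwise_spearman / _pairwise_cramers helpers, and the Cramér's V below is the plain uncorrected version:

```python
import numpy as np
import pandas as pd


def pairwise_spearman(col_1: pd.Series, col_2: pd.Series) -> float:
    # Spearman correlation is the Pearson correlation of the ranks.
    return float(col_1.rank().corr(col_2.rank()))


def pairwise_cramers(col_1: pd.Series, col_2: pd.Series) -> float:
    # Uncorrected Cramér's V computed from the contingency table.
    table = pd.crosstab(col_1, col_2).to_numpy(dtype=float)
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else float("nan")


def pick_method(col_1_name, col_2_name, categorical_columns):
    # Spearman only when BOTH columns are numeric; if either one is
    # categorical, fall back to Cramér's V.
    return (
        pairwise_spearman
        if col_1_name not in categorical_columns
        and col_2_name not in categorical_columns
        else pairwise_cramers
    )


df = pd.DataFrame(
    {
        "age": [23, 35, 47, 51],
        "income": [30.0, 42.5, 58.0, 61.0],
        "sentiment": ["a", "b", "a", "b"],
    }
)
categorical_columns = {"sentiment"}

for left, right in [("age", "income"), ("sentiment", "age"), ("age", "sentiment")]:
    method = pick_method(left, right, categorical_columns)
    print(left, right, method.__name__, method(df[left], df[right]))
```

The point of checking both names is that the pair is routed away from Spearman regardless of which side the categorical column lands on.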
Current Behaviour
Trying to create a profile with default settings, correlations do not work for some relatively simple data sets with the below error:
I think this issue started with 4.6.3 and is still the case for 4.6.4. EDIT: I can confirm that downgrading to 4.6.2 solves the issue.
Expected Behaviour
Correlations work
Data Description
Standard boston data set
Code that reproduces the bug
pandas-profiling version
v4.6.4
Dependencies
OS
Ubuntu 22
Checklist