Open jinglinpeng opened 3 years ago
Hi @jinglinpeng, I suggest that you add Phik correlation too.
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.
Extensive documentation is available at https://phik.readthedocs.io/en/latest/.
Thanks @Abdelgha-4 for the suggestion! Indeed, we once considered the PhiK correlation at https://github.com/sfu-db/dataprep/pull/145. However, PhiK is generally very slow compared to other correlations, so we decided to defer the implementation until someone thinks it is really needed.
I see! Sorry then, I didn't notice it had already been discussed.
No worries! If you think this is an important feature then we can certainly add it.
Is your feature request related to a problem? Please describe.
Currently, plot_correlation only works for numerical variables. This issue extends plot_correlation to support categorical variables.

Describe the solution you'd like
- plot_correlation(df): Add a Cramér's V correlation matrix for all categorical columns. Time: 2021.01.20-2021.01.27
- plot_correlation(df, x = cat): Add Cramér's V correlation for categorical columns. Time: 2021.01.27-2021.02.03

Reference:

Describe alternatives you've considered
NA

Additional context
NA
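
For reference, here is a minimal sketch of how the Cramér's V statistic proposed above could be computed for a pair of categorical columns, and then for every pair of columns. It is not the dataprep implementation: the helper names cramers_v and cramers_v_matrix are hypothetical, the columns are assumed to be plain Python sequences rather than DataFrame columns, and it uses the uncorrected formula V = sqrt(chi2 / (n * min(r - 1, c - 1))) with no bias correction.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences.

    Hypothetical helper for illustration only; not part of dataprep.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")

    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed count for each (x, y) pair
    row_totals = Counter(xs)       # marginal counts of xs
    col_totals = Counter(ys)       # marginal counts of ys

    # min(r - 1, c - 1); the statistic is undefined if either column is constant.
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        return float("nan")

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    return math.sqrt(chi2 / (n * k))


def cramers_v_matrix(columns):
    """Pairwise Cramér's V for a dict mapping column name -> sequence of categories."""
    names = list(columns)
    return {a: {b: cramers_v(columns[a], columns[b]) for b in names} for a in names}


if __name__ == "__main__":
    data = {
        "color": ["red", "blue", "red", "blue"],
        "shape": ["round", "square", "round", "square"],
    }
    print(cramers_v_matrix(data))
```

A value near 1 indicates that one column almost determines the other, and a value near 0 indicates little association. A real implementation would likely build the contingency table with pandas and might apply a bias correction for small samples.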