shakedzy / dython

A set of data tools in Python
http://shakedzy.xyz/dython/
MIT License

Cramer vs. Theil #146

Closed (ronyarmon closed this issue 1 year ago)

ronyarmon commented 1 year ago

Hi, I'm searching for correlations in a very small dataset (26 samples):

[Screenshot from 2023-02-08 16-36-43]

When calculating the associations, I'm getting very different results for the Cramer and Theil tests.

from dython.nominal import associations

# df_cat is the DataFrame of categorical features shown in the screenshot above
print('Cramer correlations')
cramer_plot = associations(df_cat, nom_nom_assoc='cramer',
                           filename='complete_correlation.png', figsize=(4, 4))
print('Theil correlations')
theil_plot = associations(df_cat, nom_nom_assoc='theil',
                          filename='theil_correlation.png', figsize=(4, 4))

[Screenshot from 2023-02-08 16-45-07]

Is the difference in their absolute values to be expected? Which test is more valid? I would tend to use Cramer's V, since it is built on the chi-square test, which is broadly accepted, but I wonder whether I should use Theil's U instead because of the (slight) asymmetry between the features. I've read the post explaining these methods and would like to know what you would recommend in such cases.

shakedzy commented 1 year ago

Hey @ronyarmon - I recommend reading the documentation about these two measures. They're different and cannot be compared. If you ask me, I wouldn't choose Cramer's V on such a small dataset. See the method's documentation for more on this.
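
To illustrate why the two numbers are not comparable, here is a minimal sketch using only the Python standard library. It is not dython's implementation: the functions below use the plain textbook formulas (dython's own `cramers_v` may additionally apply a bias correction), and the toy data `x` and `y` are made up for this example. Cramer's V is symmetric and derived from the chi-square statistic, while Theil's U is an entropy ratio that depends on direction, so `theils_u(x, y)` here means "how much knowing `y` reduces uncertainty about `x`".

```python
from collections import Counter
from math import log, sqrt


def cramers_v(x, y):
    """Uncorrected Cramer's V: sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    chi2 = 0.0
    for a in px:
        for b in py:
            expected = px[a] * py[b] / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(px), len(py)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


def entropy(values):
    """Shannon entropy H(X) in nats."""
    n = len(values)
    return -sum(c / n * log(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """Conditional entropy H(X | Y) in nats."""
    n = len(x)
    joint = Counter(zip(x, y))
    py = Counter(y)
    return -sum(c / n * log(c / py[b]) for (_, b), c in joint.items())


def theils_u(x, y):
    """Theil's U(X | Y) = (H(X) - H(X | Y)) / H(X)."""
    hx = entropy(x)
    return 1.0 if hx == 0 else (hx - conditional_entropy(x, y)) / hx


# Hypothetical toy data: x fully determines y, but y only narrows x down
# to two of its four categories.
x = ['a', 'b', 'c', 'd'] * 3
y = ['p' if v in ('a', 'b') else 'q' for v in x]

print('Cramer V(x, y):', round(cramers_v(x, y), 3))
print('Cramer V(y, x):', round(cramers_v(y, x), 3))
print('Theil U(x | y):', round(theils_u(x, y), 3))
print('Theil U(y | x):', round(theils_u(y, x), 3))
```

On this toy data, Cramer's V is 1.0 in both directions, whereas Theil's U is 0.5 for `x` given `y` and 1.0 for `y` given `x`. The same pair of columns therefore yields different values depending on the measure and, for Theil's U, on the direction. The chi-square statistic behind Cramer's V also relies on expected cell counts that become very small when only a couple of dozen samples are spread over several categories.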

ronyarmon commented 1 year ago

Thanks for your advice. I didn't see a reference to dataset size, but I will keep exploring.