Closed Sandy4321 closed 5 years ago
I don't have time to vouch for other people's code. There is an implementation in scikit-learn, which is what I would use to compute the MI between categorical variables.
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mutual_info_score.html
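For two categorical variables, that function can be called directly on the label arrays. A minimal sketch (the toy data here is made up for illustration):

```python
# Sketch: MI between two categorical variables with scikit-learn's
# counts-based (plug-in) estimator. Works directly on label arrays,
# including strings; the result is in nats (natural log).
import numpy as np
from sklearn.metrics import mutual_info_score

x = np.array(["a", "a", "b", "b", "b", "c"])
y = np.array([0, 0, 1, 1, 1, 2])  # here y is a relabelling of x

mi = mutual_info_score(x, y)
print(mi)
```

Because `y` is a deterministic relabelling of `x`, the MI here equals the entropy of `x`; for independent variables it would be near zero (up to the small positive bias of the plug-in estimator).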
I see, thanks for the quick answer. It would be very kind of you to share some links explaining why the Kozachenko-Leonenko estimator used in https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_classif.html#sklearn.feature_selection.mutual_info_classif is good for mutual information (they use it for feature selection). Why is it not possible to just calculate the similarity/mutual information between each variable (feature) and another variable (the target)? Then, if the mutual information between a given feature and the target is high, that feature is good to use? It seems I cannot understand something conceptual about Kozachenko-Leonenko mutual information. Could you share a link to simple, plain Python example code for the Kozachenko-Leonenko estimator, please?
Sandy, you are asking me to comment on code that I haven't even read, much less written myself. You should really head over to the statistics or signal processing stackexchange.
That being said, I think it is a terrible idea to use the Leonenko estimator for discrete data (it becomes unstable if any distances are close to zero, and for discrete variables, many distances may indeed be zero). If you want to understand how the estimator works, I would recommend the
A. Kraskov, H. Stögbauer and P. Grassberger, "Estimating mutual information", Phys. Rev. E 69, 066138 (2004).
paper. It is very accessible. Both the Kozachenko-Leonenko estimator for entropy and the Kraskov estimator for MI are implemented in my code, so you can look up an implementation there.
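The instability mentioned above is easy to see directly: k-nearest-neighbour estimators in the Kozachenko-Leonenko/Kraskov family take the logarithm of nearest-neighbour distances, and with discrete data most of those distances are exactly zero. A quick illustration (the random categorical data is made up for the demonstration):

```python
# Sketch of why k-NN based estimators break on discrete data: with only a
# few distinct values, almost every point has a duplicate, so the distance
# to its nearest neighbour is exactly zero -- and log(0) diverges.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
x = rng.integers(0, 3, size=(100, 1)).astype(float)  # 3 categories, coded 0/1/2

tree = cKDTree(x)
# k=2 because the first neighbour returned is the query point itself
d, _ = tree.query(x, k=2)
nn = d[:, 1]  # distance to the nearest *other* sample

print((nn == 0).mean())  # fraction of zero nearest-neighbour distances
```

With 100 samples spread over 3 categories, essentially every sample shares its value with many others, so the zero-distance fraction is at (or very near) 1 and the log-distance terms in the estimator blow up.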
I found your code via this link: https://stackoverflow.com/questions/43265770/entropy-python-implementation
but I need to estimate mutual information between categorical variables to find similar features, in order to bicluster them with the Spectral Co-Clustering algorithm: https://scikit-learn.org/stable/auto_examples/bicluster/plot_spectral_coclustering.html#sphx-glr-auto-examples-bicluster-plot-spectral-coclustering-py
In this thread https://www.researchgate.net/post/How_do_I_compute_the_Mutual_Information_MI_between_2_or_more_features_in_Python_when_the_data_are_not_necessarily_discrete
it was recommended to use https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_classif.html
("Estimate mutual information for a discrete target variable"). Could you share what simple Python code could be used?
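One possibility, consistent with the maintainer's advice above: when both the features and the target are categorical, tell `mutual_info_classif` so via `discrete_features=True`, which makes it use the counts-based estimator rather than the k-NN one. A minimal sketch with synthetic data (the informative/noise features are made up for illustration):

```python
# Sketch: ranking discrete features against a discrete target.
# discrete_features=True selects the counts-based MI estimator,
# avoiding the k-NN (Kozachenko-Leonenko style) path entirely.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)         # discrete target
informative = y.copy()                   # perfectly predictive feature
noise = rng.integers(0, 2, size=200)     # independent of the target
X = np.column_stack([informative, noise])

mi = mutual_info_classif(X, y, discrete_features=True, random_state=0)
print(mi)  # informative feature scores high, noise feature near zero
```

Features with high MI against the target (or against each other, via `mutual_info_score` pairwise) could then be grouped before the co-clustering step.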