mi2-warsaw / FSelectorRcpp

Rcpp (free of Java/Weka) implementation of FSelector entropy-based feature selection algorithms with sparse matrix support
http://mi2-warsaw.github.io/FSelectorRcpp/

Information gain equation in the documentation. #85

Closed · piotr-ole closed this issue 4 years ago

piotr-ole commented 4 years ago

I have a question about the information gain equation in the FSelectorRcpp documentation. The documentation defines information gain as:

H(Class) + H(Attribute) − H(Class, Attribute)

"where H(X) is Shannon's Entropy for a variable X and H(X, Y) is a conditional Shannon's Entropy for a variable X with a condition to Y."

But this doesn't match any definition of information gain I have found. I think H(X, Y) here is not a conditional Shannon entropy but the joint entropy.

According to Wikipedia (https://en.wikipedia.org/wiki/Information_gain_in_decision_trees, https://en.wikipedia.org/wiki/Conditional_entropy), I think it should be written as:

IG = H(Y) − H(Y|X) = H(Y) + H(X) − H(X, Y)

where H(X, Y) = H(Y, X) is the joint entropy of X and Y. Is there a mistake in the docs/implementation, or am I getting it wrong?
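A quick numerical check of that identity in base R (a minimal sketch on discretized iris data, not FSelectorRcpp internals; the binning via `cut()` is arbitrary):

```r
# Minimal sketch: check that H(Y) - H(Y|X) equals
# H(Y) + H(X) - H(X, Y), where H(X, Y) is the joint entropy.
entropy <- function(p) {
  p <- p[p > 0]          # drop empty cells so log2(0) never appears
  -sum(p * log2(p))
}

x <- cut(iris$Sepal.Length, breaks = 5)  # an arbitrarily discretized attribute
y <- iris$Species                        # the class

h_x  <- entropy(prop.table(table(x)))
h_y  <- entropy(prop.table(table(y)))
h_xy <- entropy(prop.table(table(x, y)))  # joint entropy H(X, Y)

h_y_given_x <- h_xy - h_x                 # chain rule: H(Y|X) = H(X, Y) - H(X)

ig_conditional <- h_y - h_y_given_x       # H(Y) - H(Y|X)
ig_joint       <- h_y + h_x - h_xy        # H(Y) + H(X) - H(X, Y)
all.equal(ig_conditional, ig_joint)       # TRUE: the two forms agree
```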

MarcinKosinski commented 4 years ago

Hi @piotr-ole, thanks for the message. Indeed, this should be joint entropy. I'm not sure how I ended up with "conditional" 3 years ago when I made this commit: https://github.com/mi2-warsaw/FSelectorRcpp/commit/13783932b3a124c2a3b7a31e1cb81bcf2c3edd73

Feel free to provide a PR with the updated documentation if you like. If not, I can update that in a couple of days!

MarcinKosinski commented 4 years ago

I think I have the very same mistake here: https://github.com/MarcinKosinski/MarcinKosinski.github.io/blob/master/_source/2017-01-11-Entropy-Based-Image-Binarization.Rmd#L186

pat-s commented 3 years ago

@MarcinKosinski An addition to this documentation discussion:

I think it would help to link the reference for Shannon's entropy (Shannon 1948) on the help page. Also, it is currently not fully clear that H(X) expands into Shannon's entropy as shown here.
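For reference, a sketch of the expansion being requested, following the definitions in the Wikipedia links above (log base 2 is the common convention; any base works up to a constant factor):

```latex
H(X) = -\sum_{x} p(x) \log_2 p(x)              % Shannon entropy (Shannon 1948)
H(X, Y) = -\sum_{x, y} p(x, y) \log_2 p(x, y)  % joint entropy
IG(\mathrm{Class}, \mathrm{Attribute})
  = H(\mathrm{Class}) + H(\mathrm{Attribute}) - H(\mathrm{Class}, \mathrm{Attribute})
```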

I was asked to provide more references in a paper because a reviewer questioned the formula for information gain (I assume they wanted to see Shannon's entropy spelled out in some form).
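For anyone cross-checking the docs against the package, the README's formula interface can be used directly (a sketch; the package discretizes numeric attributes internally, so exact values depend on that binning):

```r
library(FSelectorRcpp)

# type = "infogain" (the default) should correspond to
# H(Class) + H(Attribute) - H(Class, Attribute),
# with H(Class, Attribute) being the joint entropy.
information_gain(Species ~ ., data = iris)
```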