A Lightweight Decision Tree Framework supporting regular algorithms: ID3, C4.5, CART, CHAID and Regression Trees; some advanced techniques: Gradient Boosting, Random Forest and Adaboost w/categorical features support for Python
After running the C4.5 algorithm I get negative values for feature importance, which I find a bit confusing, since feature importance is essentially calculated as the amount of entropy in the parent node that the current split can explain, i.e. "organize" into more homogeneous child nodes. A negative feature importance would mean the split is actually creating entropy (making the nodes less homogeneous), which makes no sense. On top of that, these features with negative importances are being used in rules.py.
Additionally, there is support for this in the literature: "Every time a node is split on variable, the combined impurity for the two descendent nodes is less than the parent node." (Unbiased Measurement of Feature Importance in Tree-Based Methods, on split improvement)
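For clarity, this is roughly the calculation I have in mind. It is my own minimal sketch of split improvement (information gain), not the library's implementation, and the function names are just illustrative; it shows why I expect the value to be non-negative whenever the gain is measured on the same rows the split partitions.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_improvement(parent, children):
    """Information gain: parent entropy minus the size-weighted child entropy."""
    n = len(parent)
    weighted_child_entropy = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted_child_entropy

# Toy example: any partition of the parent's rows gives a gain >= 0.
parent = ["yes", "yes", "no", "no", "no", "yes"]
children = [["yes", "yes", "yes"], ["no", "no", "no"]]
print(split_improvement(parent, children))  # 1.0 here; never negative
```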
Any chance you could explain this behavior?
DISCLAIMER: I can't share the data, as it is private.