igrigorik / decisiontree

ID3-based implementation of the ML Decision Tree algorithm

Why are gains normalized at each node? #18

Closed nicomahler closed 10 years ago

nicomahler commented 10 years ago

Thank you for your great gem, Ilya!

We are currently working on a project where we need to evaluate the importance of each attribute in a given decision tree. This is done by summing, over all nodes of the tree, the gain attributed to each attribute.

We observe that our results differ noticeably from those produced by the rpart package in R and by scikit-learn in Python. The difference is explained by the fact that the decisiontree gem normalizes the gain at each node (here and here).

Hence our question: is there any reason for normalizing the gain at each node in the decisiontree gem?

If not, we would like to submit a pull request that computes a non-normalized gain and, perhaps later, adds a function to compute the importance of each attribute.
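For reference, the two quantities under discussion can be sketched as follows. This is not the gem's actual code, just a minimal illustration: plain information gain, and a gain-ratio style normalization (dividing the gain by the split information, as in C4.5), which is one common way a per-node normalization is done.

```ruby
# Sketch only, not the decisiontree gem's implementation.

# Shannon entropy (in bits) of a list of class labels.
def entropy(labels)
  counts = labels.tally
  n = labels.size.to_f
  -counts.values.sum { |c| p = c / n; p * Math.log2(p) }
end

# Plain (non-normalized) information gain of splitting `labels`
# into the given partitions.
def information_gain(labels, partitions)
  n = labels.size.to_f
  entropy(labels) - partitions.sum { |part| (part.size / n) * entropy(part) }
end

# C4.5-style gain ratio: gain normalized by the split information,
# i.e. the entropy of the partition sizes themselves.
def gain_ratio(labels, partitions)
  n = labels.size.to_f
  split_info = -partitions.sum { |part| p = part.size / n; p * Math.log2(p) }
  information_gain(labels, partitions) / split_info
end
```

Summing the non-normalized `information_gain` contribution of each attribute over the tree is what rpart and scikit-learn effectively do for importance, which is why a per-node normalization changes the resulting rankings.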

igrigorik commented 10 years ago

(image: book excerpt on normalizing gain in decision trees)

Courtesy of http://books.google.com/books?id=GlKIIR78OxkC&pg=PA56&lpg=PA56&dq=decision+tree+and+normalizing+gain&source=bl&ots=0_5IfR-eTT&sig=WotuR6rtcA-Dq51VpF9dts7dWXM&hl=en&sa=X&ei=lv9HU42hK-OU8QHTl4HwDg&ved=0CE8Q6AEwAw#v=onepage&q&f=false
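Assuming the excerpt's point is the standard C4.5 argument, the motivation for normalizing is that plain gain favors attributes with many distinct values. A worked example of that effect (my illustration, not the gem's code):

```ruby
# Worked numbers, assuming a gain-ratio style normalization.
# Dataset: 4 records with labels [yes, yes, no, no] => entropy = 1.0 bit.
#
# Attribute A (an ID-like attribute, unique value per record)
# splits into 4 pure singletons:
#   gain       = 1.0
#   split_info = log2(4) = 2.0
id_ratio = 1.0 / Math.log2(4)      # => 0.5
#
# Attribute B (binary, perfectly separates the classes):
#   gain       = 1.0
#   split_info = log2(2) = 1.0
binary_ratio = 1.0 / Math.log2(2)  # => 1.0

# Plain gain ties A and B; the normalized score prefers B,
# penalizing the many-valued ID-like attribute.
```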