scikit-learn-contrib / boruta_py

Python implementations of the Boruta all-relevant feature selection method.
BSD 3-Clause "New" or "Revised" License

Access Z-scores of individual variables #37

Closed prubbens closed 5 years ago

prubbens commented 6 years ago

Is it possible to access the individual Z-scores of the variables, for example to reproduce the visualization shown in Fig. 2 of the original paper?

prubbens commented 6 years ago

I am currently using the imp_history variable (I added it as an attribute of the base Boruta object). I was wondering: is this the history of importances for every variable, or the difference between each variable's importance and that of its shadow variables?

kkpsiren commented 6 years ago

I'm also struggling with this. It would be awesome to have access to the Z-scores.

France1 commented 6 years ago

It seems that imp_history holds the feature importances reported by the estimator (i.e. the random forest) for the real variables only, not for the shadow ones:

cur_imp is returned by _add_shadows_get_imps, which returns [imp_real, imp_sha], where each imp corresponds to estimator.feature_importances_.

Then cur_imp[0] = imp_real is appended to imp_history.
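To make the bookkeeping described above concrete, here is a minimal sketch of it. It is not boruta_py's code: the function add_shadows_get_imps, the synthetic data, and the fixed ten iterations are assumptions made for illustration, and the sketch leaves out any accept/reject logic. It only shows real and shadow importances being split after each fit, with the real ones collected into a history array.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def add_shadows_get_imps(X, y, estimator, rng):
    """Hypothetical stand-in for the step described above (not the library's code)."""
    # Shadow features: permute each column independently, which breaks any
    # link with y while keeping the column's marginal distribution.
    X_sha = np.column_stack([rng.permutation(col) for col in X.T])
    estimator.fit(np.hstack((X, X_sha)), y)
    imp = estimator.feature_importances_
    n_real = X.shape[1]
    # The first n_real entries belong to real features, the rest to shadows.
    return imp[:n_real], imp[n_real:]


# Synthetic data (assumption): with shuffle=False the first 3 columns are informative.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
rng = np.random.default_rng(0)

imp_history = []      # real-feature importances, one row per iteration
sha_max_history = []  # strongest shadow importance per iteration
for i in range(10):
    forest = RandomForestClassifier(n_estimators=100, random_state=i)
    imp_real, imp_sha = add_shadows_get_imps(X, y, forest, rng)
    imp_history.append(imp_real)
    sha_max_history.append(imp_sha.max())

imp_history = np.asarray(imp_history)          # shape (n_iterations, n_features)
sha_max_history = np.asarray(sha_max_history)  # shape (n_iterations,)
```

A box plot with one box per column of imp_history, with sha_max_history as a reference level, would give a picture in the spirit of Fig. 2. Note that these are raw importances collected across iterations, not Z-scores.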

I guess it should be possible to calculate confidence intervals by looping through the individual trees of the random forest, as described here: http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html
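The per-tree loop suggested above might look like the following sketch, which uses only scikit-learn's public estimators_ and feature_importances_ attributes. The data set and the name z_like are assumptions for illustration. The ratio of mean to standard deviation mirrors the form of the Z-score in the Boruta paper, but it is built from impurity-based importances rather than per-tree accuracy loss, so it should be read as an analogue and not as the same quantity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (assumption): with shuffle=False the first 3 columns are informative.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# One row of importances per fitted tree: shape (n_trees, n_features).
tree_imps = np.array([tree.feature_importances_ for tree in forest.estimators_])

mean_imp = tree_imps.mean(axis=0)  # average importance across trees
std_imp = tree_imps.std(axis=0)    # spread across trees, usable for error bars

# Z-score-like ratio per feature; features with zero spread are left at 0.
z_like = np.divide(mean_imp, std_imp,
                   out=np.zeros_like(mean_imp), where=std_imp > 0)

# Print features from highest to lowest ratio.
for j in np.argsort(z_like)[::-1]:
    print(f"feature {j}: mean={mean_imp[j]:.3f} "
          f"std={std_imp[j]:.3f} z_like={z_like[j]:.2f}")
```

Applied to a forest fitted on real plus shadow columns, the same loop would yield per-tree spreads for the shadow features as well.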