stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0

Issue #181 - fix feature importance when col_sample < 1 #183

Closed samshipengs closed 3 years ago

samshipengs commented 3 years ago

This PR attempts to fix the issue described in #181: when col_sample is used (i.e. < 1), feature importance is only retrieved for the columns that were selected when building each tree. Feature importance should be retrieved for all columns.

Approach: self.col_idxs contains the column indices that were sampled when building each tree, so we can get the feature importance of these sampled columns and then map it back onto the full set of features.
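To illustrate the mapping step, here is a minimal NumPy sketch. It is not the code in this PR: the helper names `expand_importances` and `average_importances` are hypothetical, and it assumes that a column which was not sampled for a given tree contributes an importance of 0 for that tree, and that per-tree importances are combined by a plain average.

```python
import numpy as np


def expand_importances(sub_importances, col_idx, n_features):
    """Scatter one tree's importances back to full feature positions.

    Hypothetical helper. Assumption: columns that were not sampled for
    this tree get an importance of 0.
    """
    full = np.zeros(n_features)
    # col_idx[i] is the original column that sub_importances[i] belongs to.
    full[np.asarray(col_idx)] = sub_importances
    return full


def average_importances(per_tree_importances, per_tree_col_idxs, n_features):
    """Average the expanded importances over all trees (assumed aggregation)."""
    total = np.zeros(n_features)
    for imp, idx in zip(per_tree_importances, per_tree_col_idxs):
        total += expand_importances(imp, idx, n_features)
    return total / len(per_tree_importances)


# Two trees, each built on 2 of 5 columns.
per_tree_importances = [np.array([0.7, 0.3]), np.array([1.0, 0.0])]
per_tree_col_idxs = [[3, 0], [1, 3]]

full_first_tree = expand_importances(per_tree_importances[0], per_tree_col_idxs[0], 5)
averaged = average_importances(per_tree_importances, per_tree_col_idxs, 5)
```

With this approach the returned array always has one entry per original feature, regardless of how many columns each tree was built on.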

Test: I did not write unit tests; if they are strongly preferred, please let me know and kindly provide some guidelines. For basic testing, I ran experiments with col_sample values ranging from very low (e.g. 0.1) up to 0.9 and 1.0, in combination with n_estimators ranging from 10 to 1000. The code finished without error, and the resulting feature importances match my intuition for my dataset.