Open mtomaszewski95 opened 4 years ago
Feature importances based on node/split statistics are rather flawed (see e.g. this paper). Therefore, I'm hesitant to implement this feature. In particular, you can already compute permutation-based feature importance via ELI5. It is more expensive to compute, but has better properties.
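For readers unfamiliar with the permutation approach mentioned above, here is a minimal, library-free sketch of the idea: shuffle one column at a time and record how much the model's score drops. The names `permutation_importance`, `predict`, and `neg_mse` are illustrative only and are not the ELI5 or scikit-survival API. For a survival model, the score would presumably be a concordance index rather than the toy metric used here.

```python
import random


def permutation_importance(predict, score, X, y, n_repeats=5, seed=0):
    """Mean drop in score when each column of X is shuffled (hypothetical helper)."""
    rng = random.Random(seed)
    baseline = score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy "model" that only looks at column 0 (stand-in for a fitted estimator).
def predict(X):
    return [2.0 * row[0] for row in X]


# Higher is better, so use negative mean squared error as the score.
def neg_mse(y_true, y_pred):
    return -sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]
importances = permutation_importance(predict, neg_mse, X, y)
print(importances)  # column 0 gets a positive value, column 1 gets zero
```

The cost is visible in the loops: every feature needs `n_repeats` full prediction passes, which is why this is slower than reading statistics off an already-fitted tree.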
My vote would be for adding the feature, at the very least for compatibility with scikit-learn.
sklearn now has permutation-based feature importance built in, which is the much better option.
Yes, thanks! I understand your point of view, and that there are alternative ways to compute importance.
Still, even if it's not an ideal algorithm, it can still be nice to have. Some things presume feature_importances_ is available (e.g. RFECV), and not having it might add a little friction for new scikit-survival users already familiar with scikit-learn. It's also a lot faster, which can be helpful during early iteration.
Thanks for the package and thanks for considering! :)
I also have a use case where I am only interested in which features are used or not used. For that, the feature importances based on node/split statistics could do the job and would be quick to calculate. In contrast, calculating permutation feature importances takes much longer.
Thanks a lot for this package and your work.
Feature importances based on split criteria have been requested in the past. Unfortunately, the way sklearn implements feature importance in the tree-growing algorithm doesn't work with the log-rank criterion used to grow the survival tree. The log-rank criterion measures the quality of a split, whereas sklearn's feature importance assumes the criterion measures the impurity of a node.
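To make the mismatch concrete, here is a rough sketch of how impurity-based importance is usually computed: each split contributes the sample-weighted decrease in node impurity to the feature it splits on. The `Node` class and `impurity_importances` function are hypothetical and written from scratch for illustration; they are not sklearn's or scikit-survival's internals. The point is that the formula needs an impurity value for every node, which a split-quality statistic such as log-rank does not provide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    n_samples: int
    impurity: float                 # per-node value the formula depends on
    feature: Optional[int] = None   # None marks a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def impurity_importances(root, n_features):
    """Sum weighted impurity decreases per feature, then normalize to 1."""
    totals = [0.0] * n_features
    stack = [root]
    while stack:
        node = stack.pop()
        if node.feature is None:
            continue  # leaves contribute nothing
        # Weighted impurity of the parent minus that of its two children.
        decrease = (
            node.n_samples * node.impurity
            - node.left.n_samples * node.left.impurity
            - node.right.n_samples * node.right.impurity
        ) / root.n_samples
        totals[node.feature] += decrease
        stack.extend([node.left, node.right])
    norm = sum(totals)
    return [t / norm for t in totals] if norm > 0 else totals


# Toy tree: the root splits on feature 0, its left child on feature 1;
# feature 2 is never used.
tree = Node(
    100, 0.5, feature=0,
    left=Node(60, 0.25, feature=1, left=Node(30, 0.0), right=Node(30, 0.0)),
    right=Node(40, 0.0),
)
importances = impurity_importances(tree, n_features=3)
print(importances)  # roughly [0.7, 0.3, 0.0]
```

A feature that is never chosen for a split ends up with an importance of exactly zero, which is the "used or not used" information asked for earlier in the thread.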
Implement feature_importances_ in sksurv.ensemble.RandomSurvivalForest. Examples: