microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License
16.51k stars, 3.82k forks

Feature Cycling as an option instead of Random Feature Sampling #4066

Closed: JoshuaC3 closed this issue 3 years ago

JoshuaC3 commented 3 years ago

Summary

Add an option so that the model selects features cyclically instead of sampling them at random.

See here for the initial discussion of LightGBM and EBMs.

Motivation

Model explainability is becoming ever more important in the ML space. LightGBM could adopt some of the methods used by Explainable Boosting Machines (EBMs) to make models more interpretable. One characteristic of EBMs is that they build shallow, single-feature trees (already possible in LightGBM by tuning parameters). However, EBMs boost these trees in a cyclic fashion, visiting the features one after another in a fixed order.

For example, for a model with 3 features:

Tree 1 - feature 1
Tree 2 - feature 2
Tree 3 - feature 3
Tree 4 - feature 1
Tree 5 - feature 2
Tree 6 - feature 3
...
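The round-robin order amounts to assigning tree `t` (0-based) to feature `t mod n_features`. A minimal Python sketch of that schedule; `cyclic_schedule` is a hypothetical helper for illustration, not a LightGBM API:

```python
from itertools import cycle, islice


def cyclic_schedule(n_features, n_trees):
    """Feature index (0-based) used by each boosting round, in round-robin order.

    Hypothetical helper for illustration; not part of LightGBM.
    """
    return list(islice(cycle(range(n_features)), n_trees))


# Print with 1-based labels to match the tree/feature listing
for t, f in enumerate(cyclic_schedule(3, 6), start=1):
    print(f"Tree {t} - feature {f + 1}")
```

For 3 features and 6 trees this prints the same six lines as the listing in the issue.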

This ensures the model gains information from collinear features that may be equally important. By contrast, with a small number of deeper trees, the model can easily become biased towards (attribute most of the gain to) the first few features that happen to be randomly selected.
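To make the collinearity argument concrete, here is a toy sketch in plain Python, not LightGBM: gradient boosting with squared loss and single-split stumps, where each round either uses only the next feature in the cycle (`cyclic=True`) or greedily takes the best split over all features (`cyclic=False`, a simplified stand-in for non-cyclic selection, with no random sampling). The helpers `best_stump` and `boost` are hypothetical, and the second feature is an exact copy of the first.

```python
def best_stump(x, r):
    """Best single split of feature values x for residuals r (squared loss).

    Returns (gain, threshold, left_value, right_value), where gain is the
    reduction in squared error versus predicting a single constant.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    total = sum(r)
    best = (0.0, None, 0.0, 0.0)
    left_sum = 0.0
    for k in range(1, n):
        left_sum += r[order[k - 1]]
        if x[order[k - 1]] == x[order[k]]:
            continue  # cannot split between equal feature values
        right_sum = total - left_sum
        n_left, n_right = k, n - k
        gain = left_sum**2 / n_left + right_sum**2 / n_right - total**2 / n
        if gain > best[0]:
            threshold = (x[order[k - 1]] + x[order[k]]) / 2
            best = (gain, threshold, left_sum / n_left, right_sum / n_right)
    return best


def boost(X, y, n_rounds, lr=0.5, cyclic=True):
    """Toy boosting of single-feature stumps; X is a list of feature columns.

    Hypothetical sketch, not LightGBM. Returns the total gain credited
    to each feature.
    """
    n_features, n = len(X), len(y)
    pred = [0.0] * n
    gain_per_feature = [0.0] * n_features
    for t in range(n_rounds):
        # Residuals = negative gradient of 0.5 * (y - pred)**2
        r = [y[i] - pred[i] for i in range(n)]
        # Cyclic: only feature t mod n_features; greedy: every feature
        candidates = [t % n_features] if cyclic else range(n_features)
        best_f, best = None, (0.0, None, 0.0, 0.0)
        for f in candidates:
            stump = best_stump(X[f], r)
            if stump[0] > best[0]:  # strict '>': ties keep the lower index
                best_f, best = f, stump
        if best_f is None:
            continue  # no split reduces the loss this round
        gain, threshold, left_value, right_value = best
        gain_per_feature[best_f] += gain
        for i in range(n):
            step = left_value if X[best_f][i] <= threshold else right_value
            pred[i] += lr * step
    return gain_per_feature


x1 = [0, 1, 2, 3, 4, 5, 6, 7]
X = [x1, list(x1)]  # feature 2 is an exact copy of feature 1
y = [0, 0, 0, 0, 1, 1, 1, 1]
print("greedy gain per feature:", boost(X, y, 6, cyclic=False))
print("cyclic gain per feature:", boost(X, y, 6, cyclic=True))
```

With duplicated features, the greedy run credits all of the gain to the first copy (ties always go to the lower index), whereas the cyclic run splits it between both copies. In this toy case the total gain is identical; only its attribution changes.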

Description

The feature would make LightGBM models more interpretable and their results more comparable to those of EBMs, allowing users to make informed decisions about the trade-off between interpretability and model performance.

Additionally, in certain cases, the model may be more robust at inference time if collinear features are missing.

References

A great conceptual video explanation.

InterpretML: A Unified Framework for Machine Learning Interpretability
InterpretML: A toolkit for understanding machine learning models
InterpretML's Explainable Boosting Machine

StrikerRUS commented 3 years ago

Closed in favor of tracking it in #2302. We decided to keep all feature requests in one place.

Contributions of this feature are welcome! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.