slavakurilyak opened this issue 6 years ago
Yeah, I have been thinking about this idea over the last few weeks.
But we are predicting with a different XGBoost model in every iteration (daily or minute-level), so we need a strategy for ranking feature importance across all of those models. Perhaps we could average the feature importances, as in the sketch below?
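A minimal sketch of that averaging idea, assuming each iteration hands back a fitted `xgboost.Booster`; `daily_boosters` and `feature` names are hypothetical placeholders for whatever the pipeline actually produces:

```python
from collections import defaultdict


def mean_feature_importance(boosters, importance_type="gain"):
    """Average one importance metric over a list of fitted xgboost Boosters."""
    scores_per_feature = defaultdict(list)
    for booster in boosters:
        # get_score() returns {feature_name: importance} for this model;
        # features never used in a split are simply absent from the dict.
        for feature, score in booster.get_score(importance_type=importance_type).items():
            scores_per_feature[feature].append(score)
    # Divide by the total number of models, so a feature absent from some
    # models implicitly contributes zero importance for those models.
    n = len(boosters)
    return {f: sum(scores) / n for f, scores in scores_per_feature.items()}


# Usage (hypothetical):
# ranking = mean_feature_importance(daily_boosters)
# for feature, score in sorted(ranking.items(), key=lambda kv: -kv[1]):
#     print(f"{feature}: {score:.3f}")
```

Treating features absent from a given model as zero keeps the averages comparable across models that happened to use different feature subsets.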
Here's a modelling idea to consider:
Inspiration: MLWiz, 2017
@bukosabino Let's analyze feature importances. Skater's implementation (see below) is based on an "information theoretic criteria, measuring the entropy in the change of predictions, given a perturbation of a given feature. The intuition is that the more a model’s decision criteria depend on a feature, the more we’ll see predictions change as a function of perturbing a feature." (Skater, 2018)
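To make that concrete, here is a generic sketch of the perturbation idea, using the common permutation-importance variant rather than Skater's exact entropy-based criterion; `model` (anything with a sklearn-style `predict`), `X` (a NumPy feature array), and `y` are placeholders:

```python
import numpy as np
from sklearn.metrics import accuracy_score


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Score drop after shuffling each feature: bigger drop = more reliance."""
    rng = np.random.default_rng(seed)
    baseline = accuracy_score(y, model.predict(X))
    importances = {}
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perturbed = X.copy()
            # Shuffle column j in place, breaking its relationship with y.
            rng.shuffle(X_perturbed[:, j])
            drops.append(baseline - accuracy_score(y, model.predict(X_perturbed)))
        importances[j] = float(np.mean(drops))
    return importances
```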
Goals
As a machine learning developer, I want to explain the output of any machine learning model, so that I can better interpret predictions and classifications.
As a machine learning developer, I want to analyze feature importances, so that I can perform diagnostics on the machine learning model and better understand the degree to which a predictive model relies on a particular feature.
Consider
Consider using SHAP (SHapley Additive exPlanations), which claims "better agreement with human intuition through a user study, exponential improvements in run time, improved clustering performance, and better identification of influential features" (arXiv, 2018); see the first sketch after this list
Consider using XGBoost's built-in plot_importance() function, which takes an importance_type parameter (either "weight", "gain", or "cover"), or its get_fscore() method; see the second sketch after this list
Consider using the skater library, developed by datascienceinc, for model interpretation
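For the SHAP bullet, a minimal sketch assuming a fitted XGBoost model `model` and a pandas DataFrame of features `X` (both placeholders):

```python
import shap

# TreeExplainer implements the fast Tree SHAP algorithm for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Summary plot ranks features by mean |SHAP value| across the dataset.
shap.summary_plot(shap_values, X)
```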
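For the built-in XGBoost tools, another minimal sketch; `dtrain` (an `xgboost.DMatrix` of training data) is a placeholder:

```python
import matplotlib.pyplot as plt
import xgboost

booster = xgboost.train({"objective": "binary:logistic"}, dtrain, num_boost_round=100)

# get_fscore() is shorthand for get_score(importance_type="weight"):
# the number of times each feature is used in a split.
print(booster.get_fscore())

# plot_importance() accepts importance_type="weight", "gain", or "cover".
xgboost.plot_importance(booster, importance_type="gain")
plt.show()
```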
Inspiration
Source: New frontiers: Marcos Lopez de Prado on Machine Learning for finance, 2018