xbooster is a scorecard-format classification framework for logistic regression with XGBoost. It lets you convert an XGBoost logistic regression into a logarithmic (points-based) scoring system.

In addition, it provides a suite of interpretability tools for understanding the model's behavior, which can be instrumental in model testing and expert validation.

xbooster also supports scorecard deployment via SQL.
Install the package using pip:

```bash
pip install xbooster
```
Here's a quick example of how to use xbooster to construct a scorecard for an XGBoost model:

```python
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from xbooster.constructor import XGBScorecardConstructor

# Load data and split into train/test sets
url = "https://github.com/xRiskLab/xBooster/raw/main/examples/data/credit_data.parquet"
dataset = pd.read_parquet(url)

features = [
    "external_risk_estimate",
    "revolving_utilization_of_unsecured_lines",
    "account_never_delinq_percent",
    "net_fraction_revolving_burden",
    "num_total_cc_accounts",
    "average_months_in_file",
]
target = "is_bad"

X, y = dataset[features], dataset[target]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train the XGBoost model
best_params = {
    "n_estimators": 100,
    "learning_rate": 0.55,
    "max_depth": 1,
    "min_child_weight": 10,
    "grow_policy": "lossguide",
    "early_stopping_rounds": 5,
}
model = xgb.XGBClassifier(**best_params, random_state=62)
# early_stopping_rounds requires a validation set
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)

# Initialize the scorecard constructor and build the scorecard
scorecard_constructor = XGBScorecardConstructor(model, X_train, y_train)
scorecard_constructor.construct_scorecard()

# Print the scorecard
print(scorecard_constructor.scorecard)
```
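For intuition, each scorecard row typically summarizes one bin of a feature: its record count, event and non-event counts, and a Weight of Evidence (WOE) value. A minimal hand-rolled sketch of that summary with toy data (the `bin` labels and counts are hypothetical, not xbooster internals):

```python
import numpy as np
import pandas as pd

# Hypothetical binned feature with bad-loan labels (toy data)
df = pd.DataFrame({
    "bin": ["low", "low", "low", "high", "high", "high", "high"],
    "is_bad": [1, 1, 0, 0, 0, 0, 1],
})

# Count records and events (bads) per bin
grouped = df.groupby("bin")["is_bad"].agg(count="size", events="sum")
grouped["non_events"] = grouped["count"] - grouped["events"]

# Weight of Evidence per bin: ln(% of non-events / % of events)
grouped["woe"] = np.log(
    (grouped["non_events"] / grouped["non_events"].sum())
    / (grouped["events"] / grouped["events"].sum())
)
print(grouped)
```

Bins where bads are over-represented get negative WOE, and vice versa; the scorecard aggregates this kind of evidence per leaf of the booster.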
Next, we can create scorecard points and evaluate the Gini score on the test set:
```python
from sklearn.metrics import roc_auc_score

# Create scoring points
xgb_scorecard_with_points = scorecard_constructor.create_points(
    pdo=50, target_points=600, target_odds=50
)

# Make predictions using the scorecard
credit_scores = scorecard_constructor.predict_score(X_test)
# Higher scores mean lower risk, so negate before computing AUC
gini = roc_auc_score(y_test, -credit_scores) * 2 - 1
print(f"Test Gini score: {gini:.2%}")
```
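Conceptually, the `pdo`, `target_points`, and `target_odds` arguments follow the classic points-to-double-the-odds scaling: a factor of `pdo / ln(2)` converts log-odds into points, and an offset anchors `target_odds` at `target_points`. A minimal sketch of that arithmetic (not xbooster's internal code):

```python
import math

def log_odds_to_points(
    log_odds: float,
    pdo: float = 50,
    target_points: float = 600,
    target_odds: float = 50,
) -> float:
    """Classic scorecard scaling: points = offset + factor * ln(odds)."""
    factor = pdo / math.log(2)
    offset = target_points - factor * math.log(target_odds)
    return offset + factor * log_odds

# At the target odds (50:1), the score is exactly 600 points
print(round(log_odds_to_points(math.log(50))))  # 600
# Doubling the odds adds exactly `pdo` points
print(round(log_odds_to_points(math.log(100)) - log_odds_to_points(math.log(50))))  # 50
```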
We can also visualize the score distribution across events and non-events:

```python
from xbooster import explainer

explainer.plot_score_distribution(
    y_test,
    credit_scores,
    n_bins=30,
    figsize=(8, 3),
    dpi=100
)
```
We can further examine feature importances. Below, we visualize global feature importances using points as the metric:

```python
from xbooster import explainer

explainer.plot_importance(
    scorecard_constructor,
    metric='Points',
    method='global',
    normalize=True,
    figsize=(3, 3)
)
```
Alternatively, we can calculate local feature importances, which are especially useful for boosters with a depth greater than 1:

```python
explainer.plot_importance(
    scorecard_constructor,
    metric='Likelihood',
    method='local',
    normalize=True,
    color='#ffd43b',
    edgecolor='#1e1e1e',
    figsize=(3, 3)
)
```
Finally, we can generate the scorecard in SQL format:

```python
sql_query = scorecard_constructor.generate_sql_query(table_name='my_table')
print(sql_query)
```
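A deployed points card typically scores each row by summing per-feature points, much like chained `CASE WHEN` expressions in SQL. A hand-rolled sketch of that logic with hypothetical bins and points (not the actual query xbooster generates):

```python
# Hypothetical points card: per feature, a list of (upper_bound, points) bins
points_card = {
    "external_risk_estimate": [(60, 10), (75, 35), (float("inf"), 60)],
    "average_months_in_file": [(48, 15), (96, 30), (float("inf"), 45)],
}

def score_record(record: dict) -> int:
    """Sum the points of the first bin each feature value falls into."""
    total = 0
    for feature, bins in points_card.items():
        for upper_bound, points in bins:
            if record[feature] < upper_bound:
                total += points
                break
    return total

# 70 falls in the (60, 75) bin -> 35; 120 falls in the open top bin -> 45
print(score_record({"external_risk_estimate": 70, "average_months_in_file": 120}))  # 80
```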
## xbooster.constructor - XGBoost Scorecard Constructor

A class for generating a scorecard from a trained XGBoost model. The methodology is inspired by the NVIDIA GTC Talk "Machine Learning in Retail Credit Risk" by Paul Edwards.

- `extract_leaf_weights() -> pd.DataFrame`:
  - Returns `pd.DataFrame`: DataFrame containing the extracted leaf weights.
- `extract_decision_nodes() -> pd.DataFrame`:
  - Returns `pd.DataFrame`: DataFrame containing the extracted split (decision) nodes.
- `construct_scorecard() -> pd.DataFrame`:
  - Returns `pd.DataFrame`: The constructed scorecard.
- `create_points(pdo=50, target_points=600, target_odds=19, precision_points=0, score_type='XAddEvidence') -> pd.DataFrame`:
  - `pdo` (int, optional): The points to double the odds. Default is 50.
  - `target_points` (int, optional): The standard scorecard points. Default is 600.
  - `target_odds` (int, optional): The standard scorecard odds. Default is 19.
  - `precision_points` (int, optional): The decimal precision of the points. Default is 0.
  - `score_type` (str, optional): The log-odds to use for the points card. Default is 'XAddEvidence'.
  - Returns `pd.DataFrame`: The points card.
- `predict_score(X: pd.DataFrame) -> pd.Series`:
  - `X` (`pd.DataFrame`): Features of the dataset.
  - Returns `pd.Series`: Predicted scores.
- `sql_query` (property):
  - Returns `str`: The SQL query for deploying the scorecard.
- `generate_sql_query(table_name: str = "my_table") -> str`:
  - `table_name` (str): The name of the input table in SQL.
  - Returns `str`: The final SQL query for deploying the scorecard.

## xbooster.explainer - XGBoost Scorecard Explainer

This module provides functionalities for explaining XGBoost scorecards, including methods to extract split information, build interaction splits, visualize tree structures, plot feature importances, and more.

- `extract_splits_info(features: str) -> list`:
  - `features` (str): A string containing split information.
- `build_interactions_splits(scorecard_constructor: Optional[XGBScorecardConstructor] = None, dataframe: Optional[pd.DataFrame] = None) -> pd.DataFrame`:
  - `scorecard_constructor` (Optional[XGBScorecardConstructor]): The XGBoost scorecard constructor.
  - `dataframe` (Optional[pd.DataFrame]): The dataframe containing split information.
- `split_and_count(scorecard_constructor: Optional[XGBScorecardConstructor] = None, dataframe: Optional[pd.DataFrame] = None, label_column: Optional[str] = None) -> pd.DataFrame`:
  - `scorecard_constructor` (Optional[XGBScorecardConstructor]): The XGBoost scorecard constructor.
  - `dataframe` (Optional[pd.DataFrame]): The dataframe containing features and labels.
  - `label_column` (Optional[str]): The label column in the dataframe.
- `plot_importance(scorecard_constructor: Optional[XGBScorecardConstructor] = None, metric: str = "Likelihood", normalize: bool = True, method: Optional[str] = None, dataframe: Optional[pd.DataFrame] = None, **kwargs: Any) -> None`:
  - `scorecard_constructor` (Optional[XGBScorecardConstructor]): The XGBoost scorecard constructor.
  - `metric` (str): Metric to plot ("Likelihood" (default), "NegLogLikelihood", "IV", or "Points").
  - `normalize` (bool): Whether to normalize the importance values (default: True).
  - `method` (Optional[str]): The method to use for plotting the importance ("global" or "local").
  - `dataframe` (Optional[pd.DataFrame]): The dataframe containing features and labels.
  - `fontfamily` (str): The font family to use for the plot (default: "Monospace").
  - `fontsize` (int): The font size to use for the plot (default: 12).
  - `dpi` (int): The DPI of the plot (default: 100).
  - `title` (str): The title of the plot (default: "Feature Importance").
  - `**kwargs` (Any): Additional Matplotlib parameters.
- `plot_score_distribution(y_true: pd.Series = None, y_pred: pd.Series = None, n_bins: int = 25, scorecard_constructor: Optional[XGBScorecardConstructor] = None, **kwargs: Any)`:
  - `y_true` (pd.Series): The true labels.
  - `y_pred` (pd.Series): The predicted labels.
  - `n_bins` (int): Number of bins for the histogram (default: 25).
  - `scorecard_constructor` (Optional[XGBScorecardConstructor]): The XGBoost scorecard constructor.
  - `**kwargs` (Any): Additional Matplotlib parameters.
- `plot_local_importance(scorecard_constructor: Optional[XGBScorecardConstructor] = None, metric: str = "Likelihood", normalize: bool = True, dataframe: Optional[pd.DataFrame] = None, **kwargs: Any) -> None`:
  - `scorecard_constructor` (Optional[XGBScorecardConstructor]): The XGBoost scorecard constructor.
  - `metric` (str): Metric to plot ("Likelihood" (default), "NegLogLikelihood", "IV", or "Points").
  - `normalize` (bool): Whether to normalize the importance values (default: True).
  - `dataframe` (Optional[pd.DataFrame]): The dataframe containing features and labels.
  - `fontfamily` (str): The font family to use for the plot (default: "Arial").
  - `fontsize` (int): The font size to use for the plot (default: 12).
  - `boxstyle` (str): The rounding box style to use for the plot (default: "round").
  - `title` (str): The title of the plot (default: "Local Feature Importance").
  - `**kwargs` (Any): Additional parameters to pass to the matplotlib function.
- `plot_tree(tree_index: int, scorecard_constructor: Optional[XGBScorecardConstructor] = None, show_info: bool = True) -> None`:
  - `tree_index` (int): Index of the tree to plot.
  - `scorecard_constructor` (Optional[XGBScorecardConstructor]): The XGBoost scorecard constructor.
  - `show_info` (bool): Whether to show additional information (default: True).
  - `**kwargs` (Any): Additional Matplotlib parameters.

Contributions are welcome! For bug reports or feature requests, please open an issue.
For code contributions, please open a pull request.
Current version: 0.2.2
Recent changes: updates to `explainer.py` and `_utils.py`, including improved kwargs handling in the `explainer` module and other minor changes.

This project is licensed under the MIT License - see the LICENSE file for details.