jinlow / forust

A lightweight gradient boosted decision tree package.
https://jinlow.github.io/forust/
Apache License 2.0

Speed up Shapley calculations on Windows #91


jinlow commented 7 months ago

For some reason, Shapley contribution calculations run much slower on Windows than on Linux.

import forust
import xgboost as xgb
import seaborn as sns
import numpy as np

# Load the Titanic dataset and keep only the numeric columns as features.
df = sns.load_dataset("titanic")
X = df.select_dtypes("number").drop(columns=["survived"]).astype("float32")
y = df["survived"]

model = xgb.XGBClassifier(
    objective="binary:logistic",
    max_depth=15,
    n_estimators=1000,
)
model.fit(X, y)

# Convert the fitted XGBoost model into a forust model.
loaded = forust._from_xgboost_model(model)
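Because the slowdown only shows up on one operating system, it may help to attach environment details alongside the timings. The helper below is a hypothetical sketch (`environment_report` is not part of forust or XGBoost); it uses only the Python standard library to record the OS, CPU count, Python version, and installed package versions.

```python
# Hypothetical helper, not part of forust: collect details that matter
# when comparing timings across operating systems.
import os
import platform
import sys
from importlib import metadata


def environment_report(packages=("forust", "xgboost", "numpy", "pandas")):
    """Return a dict with OS, CPU, Python, and package version details."""
    report = {
        "os": f"{platform.system()} {platform.release()}",
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is missing from this environment.
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running this on both the Linux and Windows machines makes it easier to rule out version or core-count differences before looking at the code itself.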

Then, running on Linux:

%%timeit
loaded.predict_contributions(X, method="Shapley")


And then XGBoost, for comparison:

%%timeit
model.get_booster().predict(
    xgb.DMatrix(X), pred_contribs=True, approx_contribs=False
)


And then on Windows.
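To repeat the comparison as a plain script on both operating systems, outside IPython's `%%timeit` magic, a minimal sketch using the standard `timeit` module could look like this. `best_time_per_call` is a hypothetical helper name; the commented usage lines assume the `loaded`, `model`, `X`, and `xgb` objects from the setup above.

```python
import timeit


def best_time_per_call(fn, *, number=1, repeat=5):
    """Call fn() `number` times per trial for `repeat` trials.

    Returns the fastest trial's time divided by `number`, in seconds.
    """
    trials = timeit.Timer(fn).repeat(repeat=repeat, number=number)
    return min(trials) / number


# Usage with the objects from the setup above (requires forust and xgboost):
# forust_s = best_time_per_call(
#     lambda: loaded.predict_contributions(X, method="Shapley"))
# xgb_s = best_time_per_call(
#     lambda: model.get_booster().predict(
#         xgb.DMatrix(X), pred_contribs=True, approx_contribs=False))
# print(f"forust: {forust_s:.3f} s, xgboost: {xgb_s:.3f} s")

# Self-contained demo with a stand-in workload:
demo = best_time_per_call(lambda: sum(range(100_000)), number=10, repeat=3)
print(f"{demo * 1e3:.3f} ms per call")
```

This reports the fastest trial rather than the mean and standard deviation that `%%timeit` prints. The `timeit` documentation recommends the minimum because slower trials usually reflect interference from other processes rather than the code being measured.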