Can ExplainableBoostingRegressor be used as a feature engineering tool?

interpretml / interpret

Fit interpretable models. Explain blackbox machine learning.

https://interpret.ml/docs

MIT License


Can ExplainableBoostingRegressor be used as a feature engineering tool? #474

Open jckkvs opened 10 months ago

jckkvs commented 10 months ago

Can ExplainableBoostingRegressor or ExplainableBoostingClassifier be used as a transformer to extract interactions between explanatory variables, for example the interaction between X1 and X3?

paulbkoch commented 10 months ago

Hi @jckkvs -- The ExplainableBoostingClassifier and ExplainableBoostingRegressor are not transformers, so you cannot do this directly, but you can achieve it through other means.

The hard part about interactions is narrowing the possible interactions down to just the most important ones. We have exposed the "measure_interactions" function, which returns an interaction strength that we use internally to choose pairs. Here's a link to the docs on measure_interactions: https://interpret.ml/docs/measure_interactions.html. Generally, you would want to first train a model on the individual features to extract as much information from them as possible before moving to the pairs. measure_interactions accepts an init_score parameter for this, so that pair strengths are measured against what the main-effects model has not already explained.
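To make the "mains first, then pairs" idea concrete, here is a rough numpy-only sketch. It does not call measure_interactions and is not the algorithm interpret uses internally. It bins each feature by quantiles, backfits one lookup table per feature as a crude stand-in for a mains-only EBM, and then scores every pair by how much a per-cell mean on the residuals reduces squared error. The helper names (`quantile_bins`, `fit_main_effects`, `pair_strength`) and the synthetic data are hypothetical.

```python
import itertools

import numpy as np


def quantile_bins(x, n_bins):
    """Map a 1-D array to integer bins 0..n_bins-1 using interior quantile cut points."""
    cuts = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(cuts, x, side="right")


def fit_main_effects(B, y, n_bins, n_rounds=20):
    """Backfit one per-bin-mean table per feature; return the additive predictions."""
    d = B.shape[1]
    cols = np.arange(d)
    intercept = y.mean()
    tables = np.zeros((d, n_bins))
    for _ in range(n_rounds):
        for j in range(d):
            pred = intercept + tables[cols, B].sum(axis=1)
            # Target for feature j: the residual with j's current contribution added back.
            partial = y - pred + tables[j, B[:, j]]
            sums = np.bincount(B[:, j], weights=partial, minlength=n_bins)
            counts = np.bincount(B[:, j], minlength=n_bins)
            tables[j] = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
    return intercept + tables[cols, B].sum(axis=1)


def pair_strength(bi, bj, resid, n_bins):
    """Reduction in residual sum of squares from fitting one mean per 2-D bin cell."""
    cell = bi * n_bins + bj
    sums = np.bincount(cell, weights=resid, minlength=n_bins * n_bins)
    counts = np.bincount(cell, minlength=n_bins * n_bins)
    means = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
    return np.sum(resid ** 2) - np.sum((resid - means[cell]) ** 2)


# Synthetic data: main effects on columns 0 and 1, plus an interaction between
# columns 0 and 2 (X1 and X3 if you count from 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
y = X[:, 0] + X[:, 1] + 2.0 * X[:, 0] * X[:, 2] + rng.normal(scale=0.1, size=2000)

n_bins = 8
B = np.column_stack([quantile_bins(X[:, j], n_bins) for j in range(X.shape[1])])
resid = y - fit_main_effects(B, y, n_bins)  # analogous in spirit to passing init_score
scores = {
    (i, j): pair_strength(B[:, i], B[:, j], resid, n_bins)
    for i, j in itertools.combinations(range(X.shape[1]), 2)
}
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Scoring pairs on residuals rather than on y matters here. Without the mains step, the strong main effects of columns 0 and 1 would inflate the scores of every pair that contains either of them.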

Once the interactions are chosen, we then bin them using quantiles. You can use the same binning algorithm as used in EBMs by using the EBMPreprocessor. It isn't part of the public API, but you can find it here: https://github.com/interpretml/interpret/blob/e6f38ea195aecbbd9d28c7183a83c65ada16e1ae/python/interpret-core/interpret/utils/_preprocessor.py#L74
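Because EBMPreprocessor is private, the sketch below re-implements plain quantile binning in numpy and wraps it in a small transformer-like class for chosen pairs, which is roughly what the original question asked for. `PairQuantileBinner` is hypothetical and not part of interpret. It only mimics scikit-learn's fit/transform convention, and its cut points will not exactly match EBMPreprocessor's rules.

```python
import numpy as np


class PairQuantileBinner:
    """Hypothetical transformer: quantile-bins chosen feature pairs into interaction codes."""

    def __init__(self, pairs, n_bins=8):
        self.pairs = pairs
        self.n_bins = n_bins

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        qs = np.linspace(0, 1, self.n_bins + 1)[1:-1]
        # Interior quantile cut points, learned per column on the training data.
        self.cuts_ = [np.quantile(X[:, j], qs) for j in range(X.shape[1])]
        return self

    def _bin(self, x, j):
        return np.searchsorted(self.cuts_[j], x, side="right")

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        cols = []
        for i, j in self.pairs:
            # Each (bin_i, bin_j) cell becomes one integer code in [0, n_bins**2).
            cols.append(self._bin(X[:, i], i) * self.n_bins + self._bin(X[:, j], j))
        return np.column_stack(cols)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
codes = PairQuantileBinner(pairs=[(0, 2)], n_bins=4).fit_transform(X)
print(codes.shape, codes.min(), codes.max())
```

The codes are categorical, so a downstream linear model would normally one-hot or target-encode them rather than use them as numbers.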

jckkvs commented 9 months ago

@paulbkoch Thank you. I apologize for the vagueness of my question due to my lack of understanding.

In my previous question, I mentioned interactions between variables, but for the simpler case where interactions are not considered, I've come to understand the following:

The EBMPreprocessor(binning="quantile") that you mentioned has the effect of transforming various distribution shapes of X into a uniform distribution.
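A quick numpy check of this intuition follows. It is plain quantile binning, not EBMPreprocessor itself. A strongly skewed input ends up with roughly equal counts per quantile bin, so the bin index is approximately uniform over its discrete values. Equal-width bins on the same data do not behave this way. The exponential input and the bin count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=3.0, size=10_000)  # strongly right-skewed input
n_bins = 10

# Quantile bins: cut points at the 10%, 20%, ..., 90% quantiles.
cuts = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
counts = np.bincount(np.searchsorted(cuts, x, side="right"), minlength=n_bins)

# Equal-width bins over the same range, for contrast.
width_cuts = np.linspace(x.min(), x.max(), n_bins + 1)[1:-1]
width_counts = np.bincount(np.searchsorted(width_cuts, x, side="right"), minlength=n_bins)

print("quantile bins:   ", counts)
print("equal-width bins:", width_counts)
```

The uniformity only holds for continuous data. Heavily tied values can produce duplicate cut points, which leaves some bins empty and others overfull.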

I will deepen my understanding of interactions by reviewing the documentation and code you provided. Thank you.

jckkvs commented 9 months ago

@paulbkoch

I understand that the EBMPreprocessor(binning="quantile") essentially performs the same function as sklearn.preprocessing.QuantileTransformer(output_distribution='uniform'). (Of course, I anticipate some minor differences depending on the settings of n_quantiles and the dataset in use.)

Is my understanding correct?

Below is a simple script to check this understanding.

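```python
from sklearn.datasets import make_regression
from sklearn.preprocessing import QuantileTransformer
# EBMPreprocessor lives in a private module, so this import path may change between versions
from interpret.utils._preprocessor import EBMPreprocessor
import matplotlib.pyplot as plt

X, y = make_regression(random_state=0)  # 100 samples, 100 features by default

# n_quantiles may not exceed n_samples (sklearn otherwise lowers it and warns)
transformers = [QuantileTransformer(n_quantiles=100), EBMPreprocessor()]
X_transformed_ = []
for transformer in transformers:
    transformer.fit(X, y)
    X_transformed_.append(transformer.transform(X))

# One point per (sample, feature) value: QuantileTransformer float vs EBM bin index
plt.scatter(X_transformed_[0].ravel(), X_transformed_[1].ravel(), s=4)
plt.xlabel("QuantileTransformer output")
plt.ylabel("EBMPreprocessor bin")
plt.show()
```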

paulbkoch commented 9 months ago

Same idea in terms of quantiles, although QuantileTransformer returns floats, whereas EBMPreprocessor returns binned integer values and also handles missing values.
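The difference can be illustrated with a numpy-only sketch. The float output below interpolates each value's position between fitted quantiles, which is similar in spirit to QuantileTransformer(output_distribution='uniform') but not its exact implementation. The integer output uses quantile bins and reserves bin 0 for missing values. That reservation is an assumed convention for illustration; I have not confirmed that EBMPreprocessor encodes missing values this way.

```python
import numpy as np

n_quantiles, n_bins = 100, 10
rng = np.random.default_rng(2)
x = rng.lognormal(size=500)
x[::50] = np.nan  # inject some missing values
ok = ~np.isnan(x)

# Float output in [0, 1]: interpolate between fitted quantiles, keep NaN as NaN.
refs = np.linspace(0, 1, n_quantiles)
qs = np.nanquantile(x, refs)
u = np.full_like(x, np.nan)
u[ok] = np.interp(x[ok], qs, refs)

# Integer output: bins 1..n_bins for observed values, 0 reserved for missing (assumption).
cuts = np.nanquantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
b = np.zeros(x.shape, dtype=np.int64)
b[ok] = np.searchsorted(cuts, x[ok], side="right") + 1

# Discretizing the float output mostly reproduces the bin index for observed values.
approx_bins = np.minimum(np.floor(u[ok] * n_bins).astype(int), n_bins - 1)
agreement = np.mean(approx_bins == b[ok] - 1)
print(f"float -> bin agreement: {agreement:.3f}")
```

Disagreements are confined to values that sit near a bin boundary, where the two interpolation schemes round differently.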