interpretml / interpret

Fit interpretable models. Explain blackbox machine learning.
https://interpret.ml/docs
MIT License

Feature Request: Cooperative Learning via Custom Objective or Kind #486

Open fsaforo1 opened 6 months ago

fsaforo1 commented 6 months ago

@interpret-ml @paulbkochms

First off, thanks for building this amazing tool!

The Request

I am interested in exploring the implementation of cooperative learning in EBMs through a specialized loss objective. This objective would allow EBMs to learn from an ensemble of additive models, each corresponding to different feature sets, and encourage these models to work in a cooperative manner.

Practical Example: Air Quality and Public Health Modeling

Scenario: Environmental scientists are tasked with assessing the influence of air pollution on public health within urban settings. They collate data from diverse streams:

Meteorological patterns are known to modulate pollutant dispersal and concentrations, which in turn have direct consequences on health outcomes. Socioeconomic factors further modulate a population's exposure and susceptibility to pollution-related health risks.

Proposed Objective Function

I propose a cooperative loss objective to be optimized, as follows, considering the first two views for simplicity:

$$ \min_{f, g} \frac{1}{2} \sum_i \left(y_i - \sum_j f_j(A_{ij}) - \sum_k g_k(B_{ik})\right)^2 + \frac{\rho}{2} \sum_i \left(\sum_j f_j(A_{ij}) - \sum_k g_k(B_{ik})\right)^2 $$

Where:

Implication of the $\rho$ Parameter: The parameter $\rho$ is essential for tuning the degree of cooperation between the different data views:

What the $\rho$ parameter could be doing during training: The $\rho$ parameter essentially sets the period during training in which what has been learned from the different views can be combined. For example:
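As a rough illustration of the objective above, the sketch below evaluates the two terms for fixed per-view predictions and a few values of $\rho$. The function name `cooperative_loss` and the toy numbers are assumptions made for this example; `f_pred` and `g_pred` stand in for $\sum_j f_j(A_{ij})$ and $\sum_k g_k(B_{ik})$.

```python
import numpy as np


def cooperative_loss(y, f_pred, g_pred, rho):
    """Evaluate 0.5 * sum((y - f - g)^2) + 0.5 * rho * sum((f - g)^2)."""
    y = np.asarray(y, dtype=float)
    f_pred = np.asarray(f_pred, dtype=float)
    g_pred = np.asarray(g_pred, dtype=float)
    # Fit term: how well the sum of the two views reproduces the target
    fit_term = 0.5 * np.sum((y - f_pred - g_pred) ** 2)
    # Agreement term: how strongly the two views disagree with each other
    agreement_term = 0.5 * rho * np.sum((f_pred - g_pred) ** 2)
    return fit_term + agreement_term


# Toy data (hypothetical): the views sum to y exactly but disagree with each other
y = np.array([2.0, 4.0])
f_pred = np.array([1.5, 1.0])
g_pred = np.array([0.5, 3.0])

for rho in (0.0, 0.5, 1.0):
    print(rho, cooperative_loss(y, f_pred, g_pred, rho))
```

With $\rho = 0$ only the fit term remains, so the views are free to disagree; larger $\rho$ penalizes the same disagreement more heavily.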

Some Thoughts on Potential Implementation

import numpy as np
from interpret.glassbox import ExplainableBoostingRegressor
from scipy.optimize import minimize
from sklearn.base import BaseEstimator, RegressorMixin

class CooperativeMultiViewEBMRegressor(BaseEstimator, RegressorMixin):
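    def __init__(self, rho=0.5, view1_params=None, view2_params=None):
        self.rho = rho
        # Default to None rather than {} so instances never share one mutable dict
        self.view1_params = {} if view1_params is None else view1_params
        self.view2_params = {} if view2_params is None else view2_params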
        self.model1 = ExplainableBoostingRegressor(**self.view1_params)
        self.model2 = ExplainableBoostingRegressor(**self.view2_params)
        self.weights_ = None

    def fit(self, X1, X2, y):
        # Fit the individual EBMs to each view
        self.model1.fit(X1, y)
        self.model2.fit(X2, y)

        # Initial prediction to calculate initial weights
        pred1 = self.model1.predict(X1)
        pred2 = self.model2.predict(X2)

        # Define the cooperative loss function
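        def cooperative_loss(weights):
            # Weighted contribution of each view to the combined prediction
            f = weights[0] * pred1
            g = weights[1] * pred2
            # The agreement penalty has to depend on the weights; otherwise it is
            # a constant and has no effect on the optimization
            agreement_term = 0.5 * self.rho * np.sum((f - g)**2)
            return 0.5 * np.sum((y - f - g)**2) + agreement_term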

        # Initial weights (evenly distributed)
        initial_weights = np.array([0.5, 0.5])

        # Minimize the cooperative loss to find the best weights
        result = minimize(cooperative_loss, initial_weights, method='L-BFGS-B', bounds=[(0, 1), (0, 1)])
        self.weights_ = result.x

        return self

    def predict(self, X1, X2):
        # Predict using the individual EBMs
        pred1 = self.model1.predict(X1)
        pred2 = self.model2.predict(X2)

        # Combine predictions using the optimized weights
        return self.weights_[0] * pred1 + self.weights_[1] * pred2
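If the agreement penalty is applied to the weighted view predictions (taking $f = w_1 \cdot \text{pred1}$ and $g = w_2 \cdot \text{pred2}$ in the proposed objective), the weight-fitting step is quadratic in the weights and can be solved as an augmented least-squares problem instead of with a general-purpose optimizer. The sketch below is one way to do that under stated assumptions: the helper name `fit_cooperative_weights` is hypothetical, the weights are left unconstrained (the `(0, 1)` bounds are dropped), `rho` is non-negative, and random arrays stand in for the EBM predictions.

```python
import numpy as np


def fit_cooperative_weights(pred1, pred2, y, rho):
    """Minimize ||y - w1*pred1 - w2*pred2||^2 + rho * ||w1*pred1 - w2*pred2||^2."""
    pred1 = np.asarray(pred1, dtype=float)
    pred2 = np.asarray(pred2, dtype=float)
    y = np.asarray(y, dtype=float)
    s = np.sqrt(rho)  # assumes rho >= 0
    # Top rows encode the fit term, bottom rows encode the agreement term
    design = np.vstack([
        np.column_stack([pred1, pred2]),
        np.column_stack([s * pred1, -s * pred2]),
    ])
    target = np.concatenate([y, np.zeros_like(y)])
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return weights


# Synthetic stand-ins for the per-view predictions (not real EBM output)
rng = np.random.default_rng(0)
pred1 = rng.normal(size=100)
pred2 = rng.normal(size=100)
y = pred1 + pred2 + 0.1 * rng.normal(size=100)

for rho in (0.0, 1.0, 10.0):
    print(rho, fit_cooperative_weights(pred1, pred2, y, rho))
```

This only re-weights two already-fitted models; it does not change how each EBM is boosted, which is what a custom objective inside the EBM would be needed for.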

Some Practical Rationale for Cooperative Learning

Utilizing cooperative learning, researchers can harness various data views for enhanced predictions and insights. In the context of our air quality problem, objectives include:

paulbkoch commented 6 months ago

Hi @fsaforo1 -- This is very interesting. To be honest, I don't fully understand the implications of this cooperative objective vs training all the features in a single model. If you're interested in meeting with us and discussing it, send us an email at interpret@microsoft.com

Just a couple of quick thoughts that come to mind:

1) Unless you are using an identity link function, you probably want to apply the link function to the predictions returned from self.model.predict, then find the optimized weights in the additive domain. During predict you'll want to apply the link function again, re-weight the predictions, then apply the inverse link function.

2) merge_ebms does indeed currently require identical feature sets, but it does not require identical additive terms. One quick hack to make this work would be to build the two EBMs using a superset of the features. You can use the "exclude" parameter of the __init__ function to exclude the B terms from the A model that you build, and vice versa. You'll also need to exclude all the possible interaction terms from the features that you don't want to cross contaminate. There's another tricky aspect: when you merge EBMs where some of the terms are missing in the other model, it currently assumes the term values on the other EBM are essentially zero, which means averaging will decrease their contribution, whereas for this merge you want them to maintain their full strength. You can fix this issue by scaling the models prior to merging by a factor of 2.0, given they share the same 'y' in this example. (see: https://interpret.ml/docs/ExplainableBoostingClassifier.html#interpret.glassbox.ExplainableBoostingClassifier.scale)

3) We also expose a "measure_interactions" function that allows you to customize interaction detection. This might be useful if you want to customize the interaction detection to allow pairs across the A/B feature separation. You can then re-train your models while specifying the interactions explicitly. https://interpret.ml/docs/measure_interactions.html
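To make the first point concrete, here is a minimal sketch of combining two models' outputs in the additive domain. It assumes a logit link (binary classification) purely for illustration; the helper names `logit`, `expit`, and `combine_probabilities` are defined here and are not part of interpret's API, and the probabilities are made-up numbers.

```python
import numpy as np


def logit(p):
    """Link function: map probabilities to additive scores (log-odds)."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    return np.log(p / (1 - p))


def expit(z):
    """Inverse link: map additive scores back to probabilities."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))


def combine_probabilities(prob1, prob2, weights):
    # Apply the link so the re-weighting happens in the additive domain
    score1 = logit(prob1)
    score2 = logit(prob2)
    combined = weights[0] * score1 + weights[1] * score2
    # Apply the inverse link to return to the prediction domain
    return expit(combined)


# Hypothetical predicted probabilities from two per-view models
prob1 = np.array([0.9, 0.2, 0.6])
prob2 = np.array([0.7, 0.4, 0.5])
print(combine_probabilities(prob1, prob2, weights=(0.5, 0.5)))
```

Averaging the probabilities directly would generally give a different result than averaging the log-odds, which is why the weights are best fit and applied on the additive scale.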

paulbkoch commented 6 months ago

@fsaforo1, you might find this other thread regarding reweighing terms interesting https://github.com/interpretml/interpret/issues/460

hoangthienan95 commented 6 months ago

@fsaforo1 You might be interested in this package for multi-view/multi-modal data: https://mvlearn.github.io/. Maybe you can use an EBM as the model for each of the 3 views you mentioned, then train those 3 EBMs using mvlearn in a way that accounts for complementary views with differing statistical properties.