scikit-learn-contrib / MAPIE

A scikit-learn-compatible module to estimate prediction intervals and control risks based on conformal predictions.
https://mapie.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

MapieQuantileRegressor with prefit model from Keras/Tensorflow #448

Closed dani-vu closed 6 days ago

dani-vu commented 6 months ago

I want to apply CQR with a customized LSTM model created with TensorFlow. However, MapieQuantileRegressor does not seem to support TensorFlow models. Is there a workaround, or am I missing something?

Thanks!

LacombeLouis commented 6 months ago

Hey @dani-vu, thank you for the issue. I believe that if you use cv="prefit", you should be able to use MapieQuantileRegressor by simply packaging your models as described in issue #340 (see the sketch below). Note that you need to fit all three models and provide them as follows:

    estimators_: List[RegressorMixin]
        - [0]: Estimator with quantile value of alpha/2
        - [1]: Estimator with quantile value of 1 - alpha/2
        - [2]: Estimator with quantile value of 0.5
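
For illustration, here is a minimal, self-contained sketch of that packaging, using scikit-learn's GradientBoostingRegressor as a stand-in for any prefit quantile model (a wrapped Keras model would be passed the same way); the data and variable names are placeholders for this example:

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from mapie.regression import MapieQuantileRegressor

X, y = make_regression(n_samples=1000, n_features=4, noise=10.0, random_state=0)
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

alpha = 0.1
# one prefit model per quantile level, in the order listed above:
# [alpha/2, 1 - alpha/2, 0.5]
models = [
    GradientBoostingRegressor(loss="quantile", alpha=q).fit(X_fit, y_fit)
    for q in (alpha / 2, 1 - alpha / 2, 0.5)
]

mapie = MapieQuantileRegressor(estimator=models, cv="prefit")
mapie.fit(X_calib, y_calib)                  # calibration data only
y_pred, y_intervals = mapie.predict(X_test)  # point predictions and interval bounds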

Don't hesitate if you have any other questions!

jawadhussein462 commented 2 weeks ago

Hello,

We’re closing this issue due to inactivity, as we haven’t received a response in over a month. If you still need assistance or have more information to provide, please feel free to reopen the issue or create a new one.

Thank you!

manjavacas commented 1 week ago

Hi!

I reopen this issue as I am dealing with the same problem for a simple pre-trained Keras regression model.

It's not quite clear to me what those three estimators consist of, or whether they would require retraining my model.

Could you kindly provide some guidelines on how to use MapieQuantileRegressor with a pre-trained Keras model? I haven't found much information about this anywhere.

This is an example script I've developed for the California Housing dataset:

import pandas as pd

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.optimizers import Adam

from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

from mapie.regression import MapieQuantileRegressor

################## PREPARE DATA ##################

data = fetch_california_housing()
X, y = data.data, data.target

scaler = StandardScaler()
X = scaler.fit_transform(X)

X_train, X_test_cal, y_train, y_test_cal = train_test_split(X, y, test_size=0.3, random_state=42)
X_test, X_cal, y_test, y_cal = train_test_split(X_test_cal, y_test_cal, test_size=0.5, random_state=42)

print('Train: ', len(X_train))
print('Test: ', len(X_test))
print('Calibration: ', len(X_cal))

######################## TRAIN AND SAVE MODEL ########################

nn_model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(32, activation='relu'),
    Dense(1)
])

nn_model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')

nn_model.fit(X_train, y_train, epochs=20, batch_size=32,
             validation_split=0.2, verbose=0)

nn_model.save('model.keras')

####################### LOAD AND WRAP MODEL ########################

class TrainedKerasRegressorWrapper(BaseEstimator, RegressorMixin):
    """Minimal scikit-learn wrapper around an already-trained Keras model."""
    def __init__(self, model):
        self.model = model

    def fit(self, X, y):
        # the underlying model is already trained, so fitting is a no-op
        return self

    def predict(self, X):
        return self.model.predict(X).flatten()

    def __sklearn_is_fitted__(self):
        # tell scikit-learn's validation utilities that this estimator is fitted
        return True

loaded_model = load_model('model.keras')

model = TrainedKerasRegressorWrapper(loaded_model)

######################## QUANTILE REGRESSION #######################

model_list = [model_1, model_2, model_3]  # <-- How can I get these models?

mapie_regressor = MapieQuantileRegressor(
    estimator=model_list, cv='prefit')

mapie_regressor.fit(X_cal, y_cal)

predictions, intervals = mapie_regressor.predict(X_test)

lower_intervals = intervals[:, 0]
upper_intervals = intervals[:, 1]

results = pd.DataFrame({
    'Prediction': predictions.flatten(),
    'Lower Interval': lower_intervals.flatten(),
    'Upper Interval': upper_intervals.flatten(),
    'Amplitude': upper_intervals.flatten() - lower_intervals.flatten(),
    'Actual Value': y_test
})

results.head()

Thank you! :-)

Valentin-Laurent commented 1 week ago

Hello @manjavacas.

Let's say you set alpha = 0.1. The MapieQuantileRegressor uses 3 models:

- one fitted on the alpha/2 = 0.05 quantile (the lower bound),
- one fitted on the 1 - alpha/2 = 0.95 quantile (the upper bound),
- one fitted on the 0.5 quantile (the median), used for the point predictions.

This way, you hope that y_true will fall between the interval bounds 1 - 0.1 = 90% of the time (95% - 5%).

To get the two quantile models, you need to fit them using the pinball loss, a loss that takes a parameter tau: tau = alpha/2 = 0.05 for the lower-bound model and tau = 1 - alpha/2 = 0.95 for the upper-bound model.

To understand how to create a pinball loss, you can check this link for example: https://stackoverflow.com/questions/43151694/define-pinball-loss-function-in-keras-with-tensorflow-backend
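
For reference, and assuming a Keras/TensorFlow setup, a minimal sketch of such a pinball loss could look like the following (make_pinball_loss is just an illustrative helper, not a Keras or MAPIE function):

import tensorflow as tf

def make_pinball_loss(tau):
    # pinball / quantile loss: under-predictions are weighted by tau,
    # over-predictions by (1 - tau)
    def loss(y_true, y_pred):
        error = y_true - y_pred
        return tf.reduce_mean(tf.maximum(tau * error, (tau - 1.0) * error))
    return loss

alpha = 0.1
lower_loss = make_pinball_loss(alpha / 2)      # tau = 0.05
upper_loss = make_pinball_loss(1 - alpha / 2)  # tau = 0.95

Each of these can then be passed as the loss argument of model.compile(...) when training the corresponding quantile model.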

Let me know if you need more information.

manjavacas commented 6 days ago

Thank you very much for your reply @Valentin-Laurent!

I think I've managed to implement it successfully :-)

[image attachment: predictions plot]

Now another question has come to me: is it advisable for the models used to predict the quantiles to have the same architecture as the one used to make the actual point predictions? (i.e., suppose I can't pre-train my model, but I can train a dedicated model for quantile estimation.)

Thanks again!

PS: For anyone interested:

import tensorflow as tf

def pinball_loss(y_true, y_pred, tau=.5):
    # pinball (quantile) loss: under-predictions are weighted by tau, over-predictions by (1 - tau)
    error = y_true - y_pred
    return tf.reduce_mean(tf.maximum(tau * error, (tau - 1) * error))

def train_and_save_model(loss_fn, file_name):
    model = Sequential([
        Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
        Dense(32, activation='relu'),
        Dense(1)
    ])
    model.compile(optimizer='adam', loss=loss_fn)
    model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=.2, verbose=0)
    model.save(file_name)
    return model

alpha = .1

# train and save the three models (they are reloaded from disk below)
model_list = [
    train_and_save_model(lambda y_true, y_pred: pinball_loss(y_true, y_pred, tau=1 - alpha / 2), 'model_up.keras'),  # upper bound: tau = 0.95
    train_and_save_model(lambda y_true, y_pred: pinball_loss(y_true, y_pred, tau=alpha / 2), 'model_low.keras'),     # lower bound: tau = 0.05
    train_and_save_model('mse', 'model.keras')                                                                       # point-prediction model (the pre-trained MSE model)
]

model_files = ['model_low.keras', 'model_up.keras', 'model.keras']
wrapped_models = []

for file in model_files:
    loaded_model = load_model(file, compile=False)
    wrapped_model = TrainedKerasRegressorWrapper(loaded_model)
    wrapped_models.append(wrapped_model)

mapie_regressor = MapieQuantileRegressor(
    estimator=wrapped_models, cv='prefit')

# ... (MAPIE regressor predictions)

Valentin-Laurent commented 6 days ago

Hello @manjavacas, I'm glad you managed to implement it successfully :)

To answer your follow-up question: there is no need for the quantile models to have the same architecture as your pretrained model. In my opinion, ultimately, the better your models are able to predict quantiles, the better your intervals will be (in terms of adaptivity and width).

Let's ask @vincentblot28 or @thibaultcordier to confirm.

manjavacas commented 6 days ago

Yep, I suppose that is not a disadvantage, quite the opposite.

On the other hand, I understand that if my 'real' model fits the target well (its average value, which is close to the 0.5 quantile), the same architecture should also work well for predicting the other quantiles...

Thanks 👍🏻

vincentblot28 commented 6 days ago

Hello @manjavacas, indeed, at the end of the day, the better your model, the better your prediction intervals. However, you should keep in mind that conformal prediction estimates the uncertainty of your model (the one you use to make point predictions).

The case of quantile regression is a little different: the idea is to take 2 quantile regressors to give you a first "insight" into the size of your prediction intervals, and then add a layer of conformal predictions on top to provide coverage guarantees.

In this case, your point prediction model can be very different from your quantile regressors; however, the size of your prediction intervals won't necessarily relate to the uncertainty of your point predictor (your prediction may even fall outside of your interval in some extreme cases).

Conclusion: if you're only interested in the prediction intervals, you can totally have two different model architectures; however, if you want to quantify the uncertainty of your predictive model, then it is advisable to use the same architecture.
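
To make that "layer of conformal predictions" a bit more concrete, here is a rough NumPy sketch of the conformalized quantile regression (CQR) calibration step. It is a conceptual illustration rather than MAPIE's exact implementation, and all names are placeholders:

import numpy as np

def cqr_conformalize(q_low_cal, q_high_cal, y_cal, q_low_test, q_high_test, alpha=0.1):
    # conformity scores: by how much each calibration point falls outside
    # the raw quantile band [q_low, q_high]
    scores = np.maximum(q_low_cal - y_cal, y_cal - q_high_cal)
    n = len(y_cal)
    # finite-sample corrected (1 - alpha) empirical quantile of the scores
    level = min(1.0, (1 - alpha) * (n + 1) / n)
    q_hat = np.quantile(scores, level, method="higher")
    # widen (or shrink, if q_hat < 0) both bounds by the same correction
    return q_low_test - q_hat, q_high_test + q_hat

Note that the point predictor never appears in this step, which is why the resulting intervals reflect the quantile models' uncertainty rather than the point model's, and why a point prediction can occasionally fall outside its own interval.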

manjavacas commented 6 days ago

Perfect, it's clear to me and now I understand the differences. Thanks!

All solved on my side ✅