Closed dani-vu closed 6 days ago
Hey @dani-vu,
Thank you for the issue. I believe that if you use cv="prefit", you should be able to use MapieQuantileRegressor by simply packaging your models as in issue #340. Note that you need to fit all three models and provide them as follows:
estimators_: List[RegressorMixin]
- [0]: Estimator with quantile value of alpha/2
- [1]: Estimator with quantile value of 1 - alpha/2
- [2]: Estimator with quantile value of 0.5
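For a concrete alpha, the ordering above can be sketched in plain Python. This is only an illustration: quantile_levels and order_estimators are hypothetical helper names, not part of MAPIE, and the strings stand in for already-fitted models.

```python
def quantile_levels(alpha):
    """Quantile levels in the order listed above: lower, upper, median."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    return [alpha / 2, 1 - alpha / 2, 0.5]


def order_estimators(lower, upper, median):
    """Hypothetical helper: pack three already-fitted models in that order."""
    return [lower, upper, median]


levels = quantile_levels(0.1)
# Placeholders stand in for the three fitted models.
estimators = order_estimators("lower_model", "upper_model", "median_model")
print(levels)
print(estimators)
```

With alpha = 0.1, the three levels are approximately 0.05, 0.95 and 0.5, so the model trained for the 5% quantile goes first and the median model goes last.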
Don't hesitate to ask if you have any other questions!
Hello,
We’re closing this issue due to inactivity, as we haven’t received a response in over a month. If you still need assistance or have more information to provide, please feel free to reopen the issue or create a new one.
Thank you!
Hi!
I'm reopening this issue as I am dealing with the same problem for a simple pre-trained Keras regression model.
I am not quite clear on what those three estimators consist of and whether they would require retraining my model.
Could you please provide me with some guidelines on how to use MapieQuantileRegressor with a pre-trained Keras model? I haven't found much more information anywhere.
This is an example script I've developed for the California Housing dataset:
import pandas as pd
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.optimizers import Adam
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler
from mapie.regression import MapieQuantileRegressor
################## PREPARE DATA ##################
data = fetch_california_housing()
X, y = data.data, data.target
scaler = StandardScaler()
X = scaler.fit_transform(X)
X_train, X_test_cal, y_train, y_test_cal = train_test_split(X, y, test_size=0.3, random_state=42)
X_test, X_cal, y_test, y_cal = train_test_split(X_test_cal, y_test_cal, test_size=0.5, random_state=42)
print('Train: ', len(X_train))
print('Test: ', len(X_test))
print('Calibration: ', len(X_cal))
######################## TRAIN AND SAVE MODEL ########################
nn_model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(32, activation='relu'),
    Dense(1)
])
nn_model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
nn_model.fit(X_train, y_train, epochs=20, batch_size=32,
             validation_split=0.2, verbose=0)
nn_model.save('model.keras')
####################### LOAD AND WRAP MODEL ########################
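class TrainedKerasRegressorWrapper(BaseEstimator, RegressorMixin):
    """Expose an already-trained Keras model through the scikit-learn API."""

    def __init__(self, model):
        self.model = model

    def fit(self, X, y):
        # The model is already trained, so fitting is a no-op.
        return self

    def predict(self, X):
        # Keras returns shape (n, 1); flatten to (n,).
        return self.model.predict(X).flatten()

    def __sklearn_is_fitted__(self):
        return True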
loaded_model = load_model('model.keras')
model = TrainedKerasRegressorWrapper(loaded_model)
######################## QUANTILE REGRESSION #######################
model_list = [model_1, model_2, model_3]  # <-- How can I get these models?
mapie_regressor = MapieQuantileRegressor(
    estimator=model_list, cv='prefit')
mapie_regressor.fit(X_cal, y_cal)
predictions, intervals = mapie_regressor.predict(X_test)
lower_intervals = intervals[:, 0]
upper_intervals = intervals[:, 1]
results = pd.DataFrame({
    'Prediction': predictions.flatten(),
    'Lower Interval': lower_intervals.flatten(),
    'Upper Interval': upper_intervals.flatten(),
    'Amplitude': upper_intervals.flatten() - lower_intervals.flatten(),
    'Actual Value': y_test
})
results.head()
Thank you! :-)
Hello @manjavacas.
Let's say you set alpha = 0.1. The MapieQuantileRegressor uses 3 models: your point-prediction model (the quantile 0.5 slot), a lower-quantile model (5%), and an upper-quantile model (95%).
This way, you expect y_true to fall between the interval bounds 1 - 0.1 = 90% of the time (95% - 5%).
To get those last 2 models, you need to fit them using the pinball loss, a loss that takes a parameter tau:
- lower quantile: tau = 0.1/2 = 0.05 = 5%
- upper quantile: tau = 1 - 0.1/2 = 0.95 = 95%
To understand how to create a pinball loss, you can check this link for example: https://stackoverflow.com/questions/43151694/define-pinball-loss-function-in-keras-with-tensorflow-backend
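To see why the pinball loss with a given tau targets the corresponding quantile, here is a small framework-free sketch. It is a toy illustration in plain Python rather than a Keras loss, and best_constant is a hypothetical helper that simply tries each data point as a constant prediction.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss over paired targets and predictions."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        error = y - p
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += max(tau * error, (tau - 1) * error)
    return total / len(y_true)


def best_constant(y_true, tau):
    """Data point that minimizes the pinball loss when used as a constant prediction."""
    return min(y_true, key=lambda c: pinball_loss(y_true, [c] * len(y_true), tau))


y = [float(i) for i in range(1, 100)]  # toy targets 1.0 .. 99.0
low = best_constant(y, tau=0.05)
high = best_constant(y, tau=0.95)
print(low, high)
```

On these toy targets, the constant that minimizes the loss sits near the bottom of the data for tau = 0.05 and near the top for tau = 0.95, which is the behaviour the two quantile models are trained to reproduce per input.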
Let me know if you need more information.
Thank you very much for your reply @Valentin-Laurent!
I think I've managed to implement it successfully :-)
Now another question has occurred to me: is it advisable that the models used to predict the quantiles have the same architecture as those used to make the actual predictions? (i.e., let's suppose I can't pre-train my model but I can train a proper model for quantile estimation)
Thanks again!
P.S. For anyone interested:
import tensorflow as tf

def pinball_loss(y_true, y_pred, tau=.5):
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    error = y_true - y_pred
    return tf.reduce_mean(tf.maximum(tau * error, (tau - 1) * error))
def train_and_save_model(loss_fn, file_name):
    model = Sequential([
        Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
        Dense(32, activation='relu'),
        Dense(1)
    ])
    model.compile(optimizer='adam', loss=loss_fn)
    model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=.2, verbose=0)
    model.save(file_name)
    return model
alpha = .1
model_list = [
    # Lower quantile (tau = alpha/2), upper quantile (tau = 1 - alpha/2), then the point model.
    train_and_save_model(lambda y_true, y_pred: pinball_loss(y_true, y_pred, tau=alpha / 2), 'model_low.keras'),
    train_and_save_model(lambda y_true, y_pred: pinball_loss(y_true, y_pred, tau=1 - alpha / 2), 'model_up.keras'),
    train_and_save_model('mse', 'model.keras')
]
model_files = ['model_low.keras', 'model_up.keras', 'model.keras']
wrapped_models = []
for file in model_files:
    # compile=False avoids having to deserialize the custom loss.
    loaded_model = load_model(file, compile=False)
    wrapped_model = TrainedKerasRegressorWrapper(loaded_model)
    wrapped_models.append(wrapped_model)
mapie_regressor = MapieQuantileRegressor(
    estimator=wrapped_models, cv='prefit')
# ... (MAPIE regressor predictions)
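Before handing the three models to MAPIE, it can help to check that each quantile model sits at roughly the level it was trained for. The sketch below is one possible sanity check in plain Python; empirical_level is a hypothetical helper, and the toy lists stand in for held-out targets and model predictions.

```python
def empirical_level(y_true, y_pred):
    """Fraction of targets that fall at or below the model's prediction."""
    below = sum(1 for y, p in zip(y_true, y_pred) if y <= p)
    return below / len(y_true)


# Toy stand-ins for held-out targets and the quantile models' predictions.
y_holdout = [float(i) for i in range(1, 11)]  # 1.0 .. 10.0
upper_preds = [9.5] * len(y_holdout)          # pretend output of the upper model
lower_preds = [1.5] * len(y_holdout)          # pretend output of the lower model

upper_level = empirical_level(y_holdout, upper_preds)
lower_level = empirical_level(y_holdout, lower_preds)
print(lower_level, upper_level)
```

For alpha = 0.1, a reasonably trained pair should land near 0.05 and 0.95 on held-out data. An upper model that lands near 0.45 instead would suggest it was trained with tau = (1 - alpha)/2 rather than 1 - alpha/2.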
Hello @manjavacas, I'm glad you managed to implement it successfully :)
To answer your follow-up question: there is no need for the quantile models to have the same architecture as your pretrained model. In my opinion, ultimately, the better your models are able to predict quantiles, the better your intervals will be (in terms of adaptivity and width).
Let's ask @vincentblot28 or @thibaultcordier to confirm.
Yep, I suppose that is not a disadvantage, quite the opposite.
On the other hand, I understand that if my 'real' model fits the target well (average value, close to 0.5 quantile), the same architecture will work well for predicting other quantiles...
Thanks 👍🏻
Hello @manjavacas, indeed, at the end of the day, the better your model, the better your prediction intervals. However, you should keep in mind that conformal predictions estimate the uncertainty of your model (the one you use to make point predictions).
The case of quantile regression is a little different, as the idea is to take two quantile regressors to give you a first "insight" into the size of your prediction intervals, and then add a layer of conformal prediction to give coverage guarantees.
In this case, your point prediction model can be very different from your quantile regressors. However, the size of your prediction interval won't necessarily relate to the uncertainty of your point predictor (your prediction may even be outside of your interval in some extreme cases).
Conclusion: if you're only interested in the prediction intervals, you can totally have two different model architectures. However, if you want to quantify the uncertainty of your predictive model, then it is advisable to have the same architecture.
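The two-step idea (quantile regressors first, then a conformal layer) can be sketched in plain Python. This follows the commonly described conformalized quantile regression recipe and is not MAPIE's actual code; cqr_correction and cqr_interval are hypothetical names, and the constant quantile predictions are toy values.

```python
import math


def cqr_correction(y_cal, q_low_cal, q_high_cal, alpha):
    """Margin to add to both bounds, computed from calibration data."""
    n = len(y_cal)
    # Conformity score: how far each target falls outside [q_low, q_high]
    # (negative when it is inside).
    scores = sorted(
        max(lo - y, y - hi) for y, lo, hi in zip(y_cal, q_low_cal, q_high_cal)
    )
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]


def cqr_interval(q_low, q_high, correction):
    """Widen (or shrink, if negative) the raw quantile interval."""
    return q_low - correction, q_high + correction


# Toy calibration set: 20 targets, constant quantile predictions.
y_cal = [float(i) for i in range(1, 21)]
q_low_cal = [5.0] * 20
q_high_cal = [16.0] * 20

correction = cqr_correction(y_cal, q_low_cal, q_high_cal, alpha=0.1)
interval = cqr_interval(5.0, 16.0, correction)
print(correction, interval)
```

Note that the point prediction never enters this computation, which is why an interval built this way says little about a point model with a different architecture.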
Perfect, it's clear to me and now I understand the differences. Thanks!
All solved on my side ✅
I want to apply CQR with a customized LSTM model created with TensorFlow. However, it does not support TensorFlow models. Is there a workaround or am I missing something?
Thanks!