scikit-learn-contrib / MAPIE

A scikit-learn-compatible module to estimate prediction intervals and control risks based on conformal predictions.
https://mapie.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
1.2k stars 99 forks

Lower bounds > Upper Bounds for Conformalized Quantile Regression #450

Closed francescomandruvs closed 1 month ago

francescomandruvs commented 1 month ago

Describe the bug: I cannot share the exact code because it runs on a proprietary dataset. However, I noticed that for less than 0.01% of my test dataset, MAPIE produced lower bounds that are greater than the corresponding upper bounds. I can share something that should serve as proof even without the data:

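import numpy as np
from mapie.regression import MapieQuantileRegressor  # import path may differ across MAPIE versions

# `model` must be a regressor that supports a quantile loss; alpha=0.2 targets 80% coverage.
mapie = MapieQuantileRegressor(model, method="quantile", cv="split", alpha=0.2)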
mapie.fit(
    X_train, y_train,
    X_calib=X_calib, y_calib=y_calib,
    random_state=1337
)
y_pred, y_pis = mapie.predict(X_test)

np.any(y_pis[:, 0, :] > y_pis[:, 1, :])

> True

Is this something that can happen? During fitting I got a warning: UserWarning: WARNING: The predictions are ill-sorted. Is this related to this "bug"?

LacombeLouis commented 1 month ago

Hey @francescomandruvs, thank you for raising this issue. Indeed, the warning is meant to draw your attention to exactly this. As noted on page 10 of the original CQR paper [2], when estimating a lower and an upper quantile by two separate quantile regressions, there is no guarantee that the lower estimate will actually be smaller than the upper estimate. This is known as the quantile crossing problem, and it is discussed in the paper by Bassett and Koenker [1].

[1] Bassett Jr, Gilbert, and Roger Koenker. "An empirical quantile function for linear models with iid errors." Journal of the American Statistical Association 77.378 (1982): 407-415.

[2] Romano, Yaniv, Evan Patterson, and Emmanuel Candes. "Conformalized quantile regression." Advances in neural information processing systems 32 (2019).
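For anyone who wants to measure how often the bounds cross and repair them after the fact, here is a minimal sketch. It assumes `y_pis` has the shape `(n_samples, 2, n_alpha)` used in the snippet above. The helper names `crossing_mask` and `sort_bounds` are made up for illustration and are not part of MAPIE, and the toy numbers are invented. Sorting the two bounds is a simple post-hoc workaround, not something the library is claimed to do.

```python
import numpy as np


def crossing_mask(y_pis):
    """Boolean mask of samples whose lower bound exceeds the upper bound."""
    # Axis 1 holds the bounds: index 0 = lower, index 1 = upper.
    return np.any(y_pis[:, 0, :] > y_pis[:, 1, :], axis=1)


def sort_bounds(y_pis):
    """Return a copy in which every interval satisfies lower <= upper."""
    # Sorting along the bounds axis swaps the two values only where they cross.
    return np.sort(y_pis, axis=1)


# Toy intervals standing in for the output of mapie.predict (invented numbers).
y_pis = np.array([
    [[1.0], [3.0]],  # well ordered
    [[5.0], [4.5]],  # crossed: lower > upper
    [[2.0], [2.0]],  # zero width, but not crossed
])

mask = crossing_mask(y_pis)
fixed = sort_bounds(y_pis)

print("fraction of crossed intervals:", mask.mean())
print("crossed after sorting:", crossing_mask(fixed).any())
```

A crossed interval is empty, so swapping its bounds can only enlarge it and cannot reduce empirical coverage. Whether that is the right remedy depends on the use case; inspecting the underlying quantile model on the affected rows may be more informative.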

francescomandruvs commented 1 month ago

Hi @francescomandruvs, I do not know whether this is a MAPIE-specific problem, but I implemented conformalized quantile regression manually and also saw cases where the lower bounds were greater than the upper bounds at certain timestamps. I am also curious how this is possible...
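To illustrate how a manual implementation can end up with crossed bounds, here is a rough sketch of the symmetric conformal step described in the CQR paper [2]. The function names and all numbers are invented for illustration, and MAPIE's internals may differ. The point is that the conformal step shifts both raw quantile predictions by the same amount, so a raw crossing wider than twice that amount is still there afterwards.

```python
import math


def cqr_correction(q_lo, q_hi, y, alpha):
    """Conformal correction computed from calibration data (symmetric CQR)."""
    # Conformity score: how far y falls outside [q_lo, q_hi] (negative if inside).
    scores = sorted(max(lo - yi, yi - hi) for lo, hi, yi in zip(q_lo, q_hi, y))
    n = len(scores)
    # Finite-sample rank ceil((n + 1) * (1 - alpha)), capped at n.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[k - 1]


def cqr_interval(q_lo, q_hi, correction):
    """Move both raw quantile predictions outward by the same correction."""
    return q_lo - correction, q_hi + correction


# Invented calibration set: raw quantile predictions and observed targets.
cal_lo = [0.0] * 9
cal_hi = [1.0] * 9
cal_y = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 1.25, 1.5]

q = cqr_correction(cal_lo, cal_hi, cal_y, alpha=0.25)

# A well-ordered raw prediction stays well ordered.
ok_lo, ok_hi = cqr_interval(0.0, 1.0, q)

# A raw prediction that crosses by more than 2 * q remains crossed.
bad_lo, bad_hi = cqr_interval(2.0, 1.0, q)

print("correction:", q)
print("ordered interval:", (ok_lo, ok_hi))
print("still crossed:", bad_lo > bad_hi)
```

In this sketch the correction only widens or narrows the interval around whatever the base quantile models predict. It does not reorder the two predictions, which is why the crossing has to be addressed in the quantile models themselves or by post-processing.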

What model are you inserting into the MapieQuantileRegressor?

I think @LacombeLouis answered us quite well :)