Design - Improve MAPIE and mlFlow interaction

simon-hirsch commented 5 months ago

Hi! This is a bit of a general question / suggestion. I have trouble working with MAPIE and mlflow for experiment / model tracking. That is a bit of a pity, because it limits the usability of an otherwise nice library.

Is your feature request related to a problem? Please describe.

The model.predict() output of Tuple[Array, Tuple[Array, Array]] is not very self-explanatory and is a bit cumbersome in downstream processing, especially with mlflow experiment tracking / deployment.

Suggestion / possible solution (but very open for discussion)

A relatively straightforward solution would be to return the model output as a dict, e.g. {"mean": Array, "lower": Array, "upper": Array}. That way it is clear what is what, and this ought to be accepted by mlflow's infer_signature() (I've monkey-patched my estimator to check this). To avoid breaking changes, one could add an output_format parameter to the estimator class.
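
A minimal sketch of such a conversion, assuming a single alpha and the current (n_samples,) / (n_samples, 2, 1) output layout. The helper name to_named_output is hypothetical and not part of MAPIE, and the arrays below are synthetic stand-ins for what the estimator returns. The resulting dict of 1-D arrays is the kind of object one would then try handing to infer_signature() as the model output.

```python
import numpy as np


def to_named_output(y_pred, y_pis):
    """Convert a (point, intervals) pair into a dict of 1-D arrays.

    Hypothetical helper, not part of MAPIE. Assumes y_pred has shape
    (n_samples,) and y_pis has shape (n_samples, 2, 1), i.e. one alpha.
    """
    y_pred = np.asarray(y_pred)
    y_pis = np.asarray(y_pis)
    if y_pis.ndim != 3 or y_pis.shape[1] != 2 or y_pis.shape[2] != 1:
        raise ValueError(f"expected shape (n_samples, 2, 1), got {y_pis.shape}")
    return {
        "mean": y_pred,
        "lower": y_pis[:, 0, 0],  # index 0 on axis 1 is the lower bound
        "upper": y_pis[:, 1, 0],  # index 1 on axis 1 is the upper bound
    }


# Synthetic stand-in for the output of mapie_regressor.predict(X, alpha=0.2)
y_pred = np.array([1.0, 2.0, 3.0])
y_pis = np.stack([y_pred - 0.5, y_pred + 0.5], axis=1)[:, :, np.newaxis]

named = to_named_output(y_pred, y_pis)
```

Keeping the conversion outside the estimator avoids monkey patching, at the cost of one extra call in the tracking code.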

Did somebody find other ways to work well with MAPIE and mlflow apart from monkey patching? Appreciate any input :)

Cheers, Simon

LacombeLouis commented 5 months ago

Hey @simon-hirsch, thank you for this issue; it seems like your monkey patch works around the problem for the moment! This is not something we had taken into account. We do have a very specific structure for the output of conformal predictions. Also note that for some models, you can pass multiple alphas to model.predict(), which changes the last dimension of the interval array. For example:

print(mapie_regressor.predict(X_test, alpha=0.2)[0].shape)
print(mapie_regressor.predict(X_test, alpha=0.2)[1].shape)

# output
(250,)
(250, 2, 1)

and

print(mapie_regressor.predict(X_test, alpha=[0.2, 0.3])[0].shape)
print(mapie_regressor.predict(X_test, alpha=[0.2, 0.3])[1].shape)

# output
(250,)
(250, 2, 2)

This is a comment we will take into account for future changes, so thank you!
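
To make the layout above concrete, here is a small sketch of how the interval array can be sliced per alpha. It uses a synthetic NumPy array with the same (n_samples, 2, n_alpha) shape rather than a fitted estimator; the half-widths and the intervals dict are made up for illustration.

```python
import numpy as np

n_samples, alphas = 250, [0.2, 0.3]

# Synthetic stand-in with the same layout as the second element returned by
# mapie_regressor.predict(X_test, alpha=[0.2, 0.3]): (n_samples, 2, n_alpha)
rng = np.random.default_rng(0)
center = rng.normal(size=n_samples)
half_widths = np.array([1.0, 0.8])  # made-up widths, one per alpha
y_pis = np.stack(
    [center[:, None] - half_widths, center[:, None] + half_widths], axis=1
)

# Axis 1 selects the bound (0 = lower, 1 = upper); axis 2 selects the alpha.
intervals = {
    alpha: {"lower": y_pis[:, 0, k], "upper": y_pis[:, 1, k]}
    for k, alpha in enumerate(alphas)
}
```

Each entry of intervals then holds two arrays of shape (n_samples,), which is easier to log or name than the raw 3-D array.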

jawadhussein462 commented 1 week ago

Hello,

This issue will be addressed with the release of MAPIE v1.

The output of model.predict(), currently structured as Tuple[Array, Tuple[Array, Array]], will be split across two distinct methods:

model.predict() for point predictions, with output shape (n_samples,)
model.predict_set() for interval predictions, with output shape (n_samples, 2)

simon-hirsch commented 1 week ago

Cool, looking forward to it. Do you also plan to support multiple sets at once, i.e. something along the lines of estimator.predict_sets(X, widths=[0.5, 0.75, 0.9]) with output shape (n, 6)?
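
As a rough sketch of what an (n, 6) output could look like, the snippet below flattens an (n, 2, k) interval array into (n, 2 * k) columns. Everything here is an assumption for illustration: flatten_intervals is a hypothetical helper, the lower/upper-per-width column ordering is one possible choice, and predict_sets with a widths argument is only the suggestion above, not an existing API.

```python
import numpy as np


def flatten_intervals(y_pis, widths):
    """Hypothetical helper: reshape (n, 2, k) intervals into (n, 2 * k).

    Columns are ordered lower/upper per width; this ordering is an assumption.
    """
    y_pis = np.asarray(y_pis)
    n, two, k = y_pis.shape
    if two != 2 or k != len(widths):
        raise ValueError(f"expected shape (n, 2, {len(widths)}), got {y_pis.shape}")
    # Move the width axis before the bound axis, then merge the last two axes.
    flat = y_pis.transpose(0, 2, 1).reshape(n, 2 * k)
    columns = [f"{bound}_{w}" for w in widths for bound in ("lower", "upper")]
    return flat, columns


widths = [0.5, 0.75, 0.9]

# Synthetic intervals: 4 samples, symmetric bounds that grow with the width.
y_pis = np.zeros((4, 2, 3))
y_pis[:, 0, :] = -np.arange(1, 4)  # lower bounds
y_pis[:, 1, :] = np.arange(1, 4)   # upper bounds

flat, columns = flatten_intervals(y_pis, widths)
```

Returning the column labels alongside the array keeps the flat layout self-describing, which matters once it is logged or turned into a DataFrame.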

scikit-learn-contrib / MAPIE

Design - Improve MAPIE and mlFlow interaction #454