FLAML model API not fully compatible with sklearn API

MichaelMarien commented 2 years ago

First of all - thanks a lot for the great library.

I used FLAML successfully to find a suitable model for my classification use case. Since I need calibrated probabilities, I decided to clone the best estimator found and pass it to sklearn's CalibratedClassifierCV:

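from flaml import AutoML
from sklearn.base import clone
from sklearn.calibration import CalibratedClassifierCV

automl = AutoML()
automl.fit(X, y, task='classification')  # X, y: training features and labels
model_to_calibrate = CalibratedClassifierCV(clone(automl.model))  # unfitted copy of the best estimator
model_to_calibrate.fit(X, y)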

Unfortunately, I run into a TypeError: predict_proba() got an unexpected keyword argument 'X'. I traced it to line 601 of sklearn's calibration.py, where X is passed explicitly as a keyword argument: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/calibration.py#L601. This clashes with the predict_proba signature in FLAML at https://github.com/microsoft/FLAML/blob/main/flaml/model.py#L221, which names the parameter X_test instead of X.

Is there a reason for the different parameter name? If not, I'd be happy to make a PR changing the signature from X_test to X to align fully with sklearn.
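To make the clash concrete, here is a minimal sketch in plain Python that reproduces the same kind of error without FLAML or sklearn. LegacyEstimator, KeywordXAdapter and call_like_calibration are hypothetical names invented for this illustration; they only mimic the parameter naming described above and are not taken from either library.

```python
class LegacyEstimator:
    """Hypothetical stand-in for an estimator whose parameter is named X_test."""

    def predict_proba(self, X_test):
        # Return a dummy two-class probability row for every input row.
        return [[0.5, 0.5] for _ in X_test]


class KeywordXAdapter:
    """Hypothetical wrapper that exposes the parameter under the name X."""

    def __init__(self, estimator):
        self.estimator = estimator

    def predict_proba(self, X):
        # Forward positionally, so the wrapped parameter name no longer matters.
        return self.estimator.predict_proba(X)


def call_like_calibration(estimator, X):
    # Mimics a caller that passes X by keyword, as in the sklearn line cited above.
    return estimator.predict_proba(X=X)


rows = [[1.0], [2.0]]

try:
    call_like_calibration(LegacyEstimator(), rows)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

adapted = call_like_calibration(KeywordXAdapter(LegacyEstimator()), rows)
print(error_message)
print(adapted)
```

The adapter only shows the mechanism. It does not implement get_params, so it could not be passed through sklearn.base.clone as written, which is one reason renaming the parameter in FLAML itself is the cleaner fix.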

sonichi commented 2 years ago

@MichaelMarien Please go ahead and make the PR. Thanks!

BTW, in your use case you may generally want to apply automl._transformer.transform() to both your training and test data before calling model_to_calibrate.fit() and predict_proba(). If FLAML did not actually apply any transformation to your data, you don't need this step.
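The order of operations suggested here could be sketched roughly as follows. The helper calibrate_with_transform and the two stand-in classes are hypothetical and exist only so the example runs on its own. With FLAML one would presumably pass automl._transformer as the transformer and CalibratedClassifierCV(clone(automl.model)) as the calibrator, assuming transform() accepts the feature data as its only argument.

```python
def calibrate_with_transform(transformer, calibrator, X_train, y_train, X_test):
    """Fit the calibrator on transformed training data, then score transformed test data."""
    # Apply the same transformation to both splits before the model sees them.
    X_train_t = transformer.transform(X_train)
    X_test_t = transformer.transform(X_test)
    calibrator.fit(X_train_t, y_train)
    return calibrator.predict_proba(X_test_t)


class DoublingTransformer:
    """Hypothetical stand-in for a fitted data transformer."""

    def transform(self, X):
        return [[2 * value for value in row] for row in X]


class RecordingCalibrator:
    """Hypothetical stand-in that records what it was given."""

    def fit(self, X, y):
        self.seen_during_fit = X
        return self

    def predict_proba(self, X):
        self.seen_during_predict = X
        return [[0.5, 0.5] for _ in X]


calibrator = RecordingCalibrator()
probabilities = calibrate_with_transform(
    DoublingTransformer(), calibrator, [[1], [2]], [0, 1], [[3]]
)
print(calibrator.seen_during_fit, calibrator.seen_during_predict, probabilities)
```

Note that _transformer is a private attribute, so its name and behavior may differ between FLAML versions.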

MichaelMarien commented 2 years ago

Thanks for the swift reply, I'll give the PR a go!

sonichi commented 2 years ago

Thank you! BTW, what application do you use flaml for?

MichaelMarien commented 2 years ago

Optimising marketing actions. I'm looking at it from a causal point of view (also trying to integrate it with econML), and I need to train a lot of models for the different actions (treatments) and retrain them quite often. Moreover, I have some constraints on how complex the models can be and how slow they can be at inference. FLAML addressed part of the problem nicely :)

sonichi commented 2 years ago

Thanks for sharing the feedback and for your contribution. Feel free to join gitter to chat.

FLAML model API not fully compatible with sklearn API #400