skrub-data / skrub

Prepping tables for machine learning
https://skrub-data.org/
BSD 3-Clause "New" or "Revised" License

FIX handle polars transformer output in scikit-learn < 1.4 #941

Closed jeromedockes closed 3 months ago

jeromedockes commented 3 months ago

I think this may be a better option than #940

When the input is a polars dataframe and the transformer given to OnEachColumn or OnSubFrame is a scikit-learn transformer that exposes the set_output API, we would like to call set_output(transform='polars') so that it returns a polars dataframe. However, in scikit-learn < 1.4 the 'polars' option is not available, only 'pandas'. So we have a few options:

The approach in this PR does not complicate the code much, and it avoids complicating the combinations of versions that we allow. It may also result in better error messages for custom estimators that don't support the set_output API.

In addition to the scikit-learn < 1.4 handling and the pandas -> polars conversion, this PR improves the checks on the transformer's output type.
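For illustration, the version-dependent choice described above could look roughly like the sketch below. This is not the code in this PR: `parse_major_minor` and `request_dataframe_output` are hypothetical names, and the scikit-learn version is passed in as a string so the snippet runs without any third-party import.

```python
import re


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.4.0' or '1.4.dev0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Cannot parse version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def request_dataframe_output(transformer, sklearn_version: str):
    """Ask `transformer` for dataframe output (hypothetical helper).

    Returns the container that was requested: 'polars' when the given
    scikit-learn version is >= 1.4, 'pandas' for older versions, and None
    when the transformer does not expose set_output at all.
    """
    if not hasattr(transformer, "set_output"):
        # Custom estimators without the set_output API are left untouched;
        # the caller has to check the type of whatever they return.
        return None
    if parse_major_minor(sklearn_version) >= (1, 4):
        container = "polars"
    else:
        # Older scikit-learn only knows 'pandas'. When the input was a
        # polars dataframe, the caller would convert the pandas result
        # back afterwards, for example with polars.from_pandas.
        container = "pandas"
    transformer.set_output(transform=container)
    return container
```

With a real transformer this would be called as `request_dataframe_output(StandardScaler(), sklearn.__version__)`. Returning the requested container lets the caller know whether a pandas -> polars conversion is still needed on the transform output.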