[FEA] Support `feature_names_in_` attribute

tvdboom commented 11 months ago

Is your feature request related to a problem? Please describe.

For cuml estimators to work as drop-in replacements for sklearn, they should expose the same attributes. One that is often used in pipelines is `feature_names_in_`, which contains the names of the features seen during fit (when the data is provided as a `pd.DataFrame` or `cudf.DataFrame`).

Describe the solution you'd like

All cuml estimators should expose the `feature_names_in_` attribute after fit. Currently, only `n_features_in_` is supported.

from sklearn.datasets import load_breast_cancer
from cuml.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True, as_frame=True)

scaler = StandardScaler().fit(X)
print(scaler.n_features_in_)  # Works
print(scaler.feature_names_in_)  # AttributeError

Implementing this could potentially help with #5564
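
For illustration, the requested behaviour could be sketched roughly as below. This is not cuml's or scikit-learn's actual implementation: `FeatureNamesMixin`, `_record_feature_names`, `DemoScaler`, and `FakeFrame` are hypothetical names made up for this example. The sketch assumes DataFrame-like inputs expose their labels through `.columns` (as pandas and cudf do) and mirrors the sklearn convention of only defining the attribute when every column name is a string. sklearn stores the names in an object-dtype NumPy array; a plain list is used here to keep the example dependency-free.

```python
class FeatureNamesMixin:
    """Hypothetical mixin; not part of cuml or scikit-learn."""

    def _record_feature_names(self, X):
        # DataFrame-like inputs (pandas, cudf) expose column labels via `.columns`.
        columns = getattr(X, "columns", None)
        names = list(columns) if columns is not None else []
        # Only define the attribute when every column name is a string.
        if names and all(isinstance(name, str) for name in names):
            self.feature_names_in_ = names
        elif hasattr(self, "feature_names_in_"):
            # Refit on unnamed data: drop stale names from an earlier fit.
            del self.feature_names_in_


class DemoScaler(FeatureNamesMixin):
    """Toy estimator that only records input metadata (assumption: no real scaling)."""

    def fit(self, X):
        self._record_feature_names(X)
        return self


class FakeFrame:
    """Minimal stand-in for a DataFrame, used only to keep the demo self-contained."""

    def __init__(self, columns):
        self.columns = columns


named = DemoScaler().fit(FakeFrame(["mean radius", "mean texture"]))
unnamed = DemoScaler().fit([[1.0, 2.0], [3.0, 4.0]])

print(named.feature_names_in_)
print(hasattr(unnamed, "feature_names_in_"))
```

With this approach, fitting on array-like input without column names leaves the attribute undefined, so `scaler.feature_names_in_` would still raise `AttributeError` in that case, which matches what sklearn users expect.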

jinsolp commented 5 months ago

Hello! Would you like to move forward with this issue? Or will it be okay if I start working on the feature?

tvdboom commented 5 months ago

Feel free to work on it!

rapidsai/cuml #5677