lemma-osu / sknnr

scikit-learn compatible estimators for various kNN imputation methods
https://sknnr.readthedocs.io
0 stars 1 forks source link

Make estimators compatible with `pandas` dataframes? #15

Closed aazuspan closed 1 year ago

aazuspan commented 1 year ago

While trying to implement #2, I discovered another snag: Euclidean and Mahalanobis cannot be fit by passing a dataframe as X. The errors seem to originate in the transformers, but I haven't dug any deeper than that.

from sklearn.datasets import load_linnerud
from sklearn_knn import Euclidean

X, y = load_linnerud(return_X_y=True)
X_df = pd.DataFrame(X)

# Fits fine with array
Euclidean().fit(X, y)

# Fails with dataframe [ValueError: Length of values (20) does not match length of index (3)]
Euclidean().fit(X_df, y)

While sklearn definitely doesn't claim to be interoperable with pandas, my past experience and a quick test (below) indicates that most if not all regressors do support dataframes transparently.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

X, y = load_iris(return_X_y=True)
X_df = pd.DataFrame(X)
# All the regressors below support dataframes
regressors = [
    RandomForestRegressor(),
    GradientBoostingRegressor(),
    KNeighborsRegressor(),
    LinearRegression(),
    MLPRegressor(),
    SVR(),
]

for reg in regressors:
    reg.fit(X_df, y)

My inclination is that we should too, but let me know what you think.

grovduck commented 1 year ago

Yes, I absolutely agree - that should be expected.

Looks like the bug is coming from something I've done in MyStandardScaler. StandardScaler works fine with X_df.

grovduck commented 1 year ago

I had expected this issue to close once #19 was merged, but apparently that only happens with the default branch (currently main) - see here. It looks like one can write a workflow, but at present that's beyond my Github skills.

@aazuspan, if you want to tackle this, I can leave this open. Otherwise, I'm happy to close it.

aazuspan commented 1 year ago

Aha, I was wondering why these weren't closing automatically! Thanks for the link.

I think manually closing is probably easy enough for now, so I'll go ahead and close this.