paris-saclay-cds / ramp-workflow

Toolkit for building predictive workflows on top of pydata (pandas, scikit-learn, pytorch, keras, etc.).
https://paris-saclay-cds.github.io/ramp-docs/
BSD 3-Clause "New" or "Revised" License

Support for pandas dataframes in workflows #304

Open h2o64 opened 2 years ago

h2o64 commented 2 years ago

Currently, the Classifier and Regressor workflows don't accept pandas dataframes as inputs.

Indeed, in train_submission the arrays are indexed directly with arrays of row indices, which on a DataFrame selects columns instead of rows, leading to the following error message:

KeyError: "None of [Int64Index([ 256,  127,  753,  439,  825, 1167,  786, 1689, 1615,  675,\n            ...\n             721, 1064,  696, 1122,  632, 1103,  406, 1029, 1750,  975],\n           dtype='int64', length=1451)] are in the [columns]"
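
A minimal sketch reproducing this behaviour outside of ramp-workflow, using only pandas and NumPy. The toy data, the column names and the `train_is` variable are assumptions made up for illustration; they are not taken from the ramp-workflow code.

```python
import numpy as np
import pandas as pd

# Toy data; column names and sizes are illustrative only.
X_df = pd.DataFrame({"a": np.arange(5.0), "b": np.arange(5.0) * 10})
train_is = np.array([3, 0, 4])  # hypothetical row indices from a CV split

# Indexing a DataFrame with an integer array looks the values up
# among the column labels, so this raises a KeyError.
try:
    X_df[train_is]
    raised = False
except KeyError:
    raised = True

# Positional row selection on a DataFrame goes through .iloc.
X_train_df = X_df.iloc[train_is]

# The same indexing on a NumPy array selects rows, as intended.
X_train_np = X_df.to_numpy()[train_is]
```

Here `X_train_df` and `X_train_np` hold the same three rows, which suggests that either converting to NumPy before indexing or using `.iloc` would avoid the error.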

albertcthomas commented 2 years ago

Usually the feature extractor takes a pandas DataFrame as input, while the classifier/regressor takes a NumPy array.
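
A rough sketch of that split, assuming a two-stage setup where the feature extractor turns the DataFrame into a NumPy array before the classifier sees it. The `FeatureExtractor` and `Classifier` classes below, their method names, and the toy nearest-centroid logic are all hypothetical and written only for illustration; they are not the ramp-workflow implementation.

```python
import numpy as np
import pandas as pd


class FeatureExtractor:
    """Hypothetical feature extractor: DataFrame in, NumPy array out."""

    def fit(self, X_df, y):
        return self

    def transform(self, X_df):
        # Keep only numeric columns and drop the pandas index.
        return X_df.select_dtypes("number").to_numpy()


class Classifier:
    """Hypothetical nearest-centroid classifier working on NumPy arrays."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One centroid per class, stacked in the order of classes_.
        self.centroids_ = np.stack(
            [X[y == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, X):
        # Distance from every sample to every centroid.
        dists = np.linalg.norm(
            X[:, None, :] - self.centroids_[None, :, :], axis=2
        )
        return self.classes_[dists.argmin(axis=1)]


# Toy data, illustrative only.
X_df = pd.DataFrame(
    {"f1": [0.0, 0.1, 5.0, 5.1], "f2": [0.0, 0.2, 5.0, 4.9], "name": list("abcd")}
)
y = np.array([0, 0, 1, 1])
train_is = np.array([2, 0, 3, 1])  # hypothetical train indices

X_array = FeatureExtractor().fit(X_df, y).transform(X_df)
# Row indexing is safe here because X_array is a NumPy array.
clf = Classifier().fit(X_array[train_is], y[train_is])
y_pred = clf.predict(X_array)
```

With this arrangement the index-array selection happens on the NumPy output of the feature extractor, so the column-lookup problem from the original report does not arise.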