scikit-learn-contrib / sklearn-pandas

Pandas integration with sklearn

DataFrameMapper with df_out should preserve categorical data #81

Open mratsim opened 7 years ago

mratsim commented 7 years ago

With df_out enabled, the DataFrameMapper transformation removes the "category" dtype from dataframe columns.

Categorical status can be checked with: hasattr(df['categorical_column'], 'cat')
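A minimal sketch of that check, using only pandas. The example frame and the column names `color` and `size` are made up for illustration. Rebuilding the frame from a plain NumPy array stands in for a transformation that goes through arrays; whether DataFrameMapper loses the dtype in exactly this way is an assumption, not something confirmed in this thread.

```python
import pandas as pd

# Hypothetical example data: one categorical column, one numeric column.
df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})
df["color"] = df["color"].astype("category")

# The check from the issue: the .cat accessor only exists on categorical columns.
before_has_cat = hasattr(df["color"], "cat")

# An equivalent, more explicit check on the dtype itself.
before_is_categorical = isinstance(df["color"].dtype, pd.api.types.CategoricalDtype)

# Assumption: round-tripping through a NumPy array imitates the dtype loss.
# NumPy has no categorical dtype, so the column comes back as plain objects.
rebuilt = pd.DataFrame(df.values, columns=df.columns)
after_has_cat = hasattr(rebuilt["color"], "cat")

print(before_has_cat, before_is_categorical, after_has_cat)
```

The `hasattr` form is handy for a quick interactive check, while the `isinstance` form states the intent more directly in library code.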

Some classifiers, such as LightGBM, can automatically detect categorical columns in a dataframe and handle them very efficiently without one-hot encoding.

MarcusJones commented 5 years ago

Just ran into this one. I wrote a custom transformer specifically to convert columns to the 'category' dtype, and had no idea why it wasn't working until I found this issue. Would this be difficult to implement?
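Until something like this is supported, one possible workaround is to re-apply the categorical dtypes after the transformation. The sketch below is not part of sklearn-pandas: `restore_categories` is a hypothetical helper, and it assumes the output frame keeps the same column names as the input. The `flattened` frame only simulates a mapper output that has lost its dtypes.

```python
import pandas as pd


def restore_categories(original, transformed):
    """Re-apply categorical dtypes from `original` to matching columns.

    Hypothetical helper: only columns present in both frames under the
    same name are touched. Values that are not among the original
    categories become NaN after the cast.
    """
    out = transformed.copy()
    for name in original.columns.intersection(out.columns):
        dtype = original[name].dtype
        if isinstance(dtype, pd.api.types.CategoricalDtype):
            out[name] = out[name].astype(dtype)
    return out


# Hypothetical input with one categorical column.
original = pd.DataFrame(
    {"color": pd.Categorical(["red", "blue", "red"]), "size": [1, 2, 3]}
)

# Stand-in for a transformed frame whose category dtype was lost.
flattened = pd.DataFrame(original.values, columns=original.columns)

restored = restore_categories(original, flattened)
print(restored.dtypes)
```

Reusing the original dtype object, rather than calling `astype("category")` again, keeps the same category set and ordering as the input column.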