scikit-learn-contrib / category_encoders

A library of sklearn compatible categorical variable encoders
http://contrib.scikit-learn.org/category_encoders/
BSD 3-Clause "New" or "Revised" License

Pandas FutureWarning: The default dtype for empty Series will be 'object' #358

Closed: ftrojan closed this issue 2 years ago.

ftrojan commented 2 years ago

Expected Behavior

No warning when using the ordinal encoder with an empty dataframe.

Actual Behavior

category_encoders/ordinal.py:329: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.

The same warning is raised at lines 294 and 331 of ordinal.py.
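The warning text names its own remedy: give the empty Series an explicit dtype. Below is a minimal pandas-only sketch, not the encoder's code, showing that explicitly typed empty Series are created silently even when FutureWarning is promoted to an error. In the pandas 1.x versions that emit this warning, the trigger is an empty Series built without a dtype, such as `pd.Series()` or `pd.Series([])`. pandas 2.0 made `object` the default.

```python
import warnings

import pandas as pd

with warnings.catch_warnings():
    # Promote FutureWarning to an error so any remaining trigger would fail loudly.
    warnings.simplefilter("error", FutureWarning)
    # Passing dtype explicitly means pandas never falls back to its default,
    # so no version of pandas has a reason to warn here.
    empty_float = pd.Series(dtype="float64")
    empty_object = pd.Series(dtype="object")

print(empty_float.dtype, empty_object.dtype)  # float64 object
```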

Steps to Reproduce the Problem

  1. Use the ordinal encoder with an empty dataframe.


PaulWestenthanner commented 2 years ago

Hi @ftrojan

Could you please specify your pandas and numpy versions? Coincidentally, I've been looking into our dependency versions over the last couple of days and noticed some incompatibilities between newer numpy versions (>=1.20) and older pandas versions (<1.0.5). I've created a separate issue for this and explained there how I plan to move this library to higher dependency versions: https://github.com/scikit-learn-contrib/category_encoders/issues/359
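A minimal sketch for collecting the versions requested here from within Python. The `versions` dictionary is only an illustrative name.

```python
import platform

import numpy as np
import pandas as pd

# Gather the interpreter and library versions a maintainer usually asks for
# when triaging dependency-related warnings.
versions = {
    "python": platform.python_version(),
    "pandas": pd.__version__,
    "numpy": np.__version__,
}
for name, version in versions.items():
    print(f"{name}: {version}")
```

pandas also provides `pd.show_versions()`, which prints a fuller report that includes installed optional dependencies.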

PaulWestenthanner commented 2 years ago

I just checked, and this warning is unrelated to that issue. I think we can safely ignore it: in the ordinal encoder the data for every Series is specified, so we never initialize an empty Series except when the input data itself is empty, and in that case we don't care about data types.
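If the warning is indeed harmless for empty input, a caller who wants a clean log could filter only this specific message around the call, rather than silencing all FutureWarnings. This is a sketch under that assumption. `fit_quietly` and `noisy_fit` are hypothetical names, and `noisy_fit` just re-emits the warning text so the example behaves the same on any pandas version.

```python
import warnings

import pandas as pd


def fit_quietly(fit, *args, **kwargs):
    """Call `fit`, ignoring only FutureWarnings whose message starts with
    pandas' empty-Series dtype text. Other warnings still propagate."""
    with warnings.catch_warnings():
        # `message` is a regex matched against the start of the warning text.
        warnings.filterwarnings(
            "ignore",
            message="The default dtype for empty Series",
            category=FutureWarning,
        )
        return fit(*args, **kwargs)


def noisy_fit(X):
    # Hypothetical stand-in for an encoder fit that hits the pandas warning.
    warnings.warn(
        "The default dtype for empty Series will be 'object' instead of "
        "'float64' in a future version.",
        FutureWarning,
    )
    return len(X)


result = fit_quietly(noisy_fit, pd.DataFrame())
print(result)  # 0
```

Filtering by message keeps unrelated FutureWarnings visible. This matters here because the maintainer's comment above anticipates other dependency-related changes.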

ftrojan commented 2 years ago

My pandas version is 1.4.2 and numpy is 1.22.3.

ftrojan commented 2 years ago

I plan to open a pull request, since the fix seems relatively straightforward. Expect it within about two weeks, as my free time allows.
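One possible shape for such a fix, together with a regression test. This is a sketch, not the actual patch, and it assumes the fix amounts to passing `dtype` explicitly wherever a possibly empty Series is built. `build_empty_mapping` is a hypothetical stand-in for the ordinal.py code paths, not a category_encoders function.

```python
import warnings

import pandas as pd


def build_empty_mapping(categories):
    """Hypothetical stand-in: map each category to an ordinal starting at 1.
    dtype is passed explicitly, so empty input never relies on pandas'
    default dtype for empty Series."""
    return pd.Series(
        range(1, len(categories) + 1), index=list(categories), dtype="int64"
    )


def test_empty_input_emits_no_future_warning():
    with warnings.catch_warnings():
        # Any FutureWarning raised inside this block fails the test.
        warnings.simplefilter("error", FutureWarning)
        mapping = build_empty_mapping([])
    assert mapping.empty
    assert mapping.dtype == "int64"


if __name__ == "__main__":
    test_empty_input_emits_no_future_warning()
    print("ok")
```

Promoting the warning to an error inside the test means a future regression fails CI instead of only adding noise to the log.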