scikit-learn-contrib / category_encoders

A library of sklearn compatible categorical variable encoders
http://contrib.scikit-learn.org/category_encoders/
BSD 3-Clause "New" or "Revised" License

Pandas copy-on-write doesn't work properly #422

Closed s-banach closed 1 year ago

s-banach commented 1 year ago

Here is a basic error message you will get when running (probably) any of the encoders after setting pd.options.mode.copy_on_write = True.

ChainedAssignmentError: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
When using the Copy-on-Write mode, such inplace method never works to update the original DataFrame or Series, because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' instead, to perform the operation inplace on the original object.
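The rewrite suggested in the message can be sketched as follows. This is a minimal illustration, not code from category_encoders: the DataFrame and its column names are made up, and fillna stands in for whichever inplace method an encoder actually calls.

```python
import pandas as pd

# Opt in to Copy-on-Write, as in the report above.
pd.options.mode.copy_on_write = True

# Hypothetical data, for illustration only.
df = pd.DataFrame({"size": ["S", None, "L"], "price": [1.0, 2.0, 3.0]})

# Chained form that triggers ChainedAssignmentError under Copy-on-Write,
# because df["size"] is an intermediate object:
# df["size"].fillna("unknown", inplace=True)

# Form suggested by the message: call the method on the DataFrame itself
# and pass a {column: value} mapping.
df.fillna({"size": "unknown"}, inplace=True)

print(df["size"].tolist())
```

Only the "size" column is touched here; "price" is left as it was.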

A quick way to fix this mostly automatically would be to run ruff with the PD rules enabled, in particular PD002, which flags uses of inplace=True and can rewrite many of them.

s-banach commented 1 year ago

Copy-on-Write is probably going to be the default in pandas 3.0, so this should be treated as a legitimate issue.

Fixing this is typically as simple as replacing df[col].method(inplace=True) with df[col] = df[col].method().
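As a rough sketch of that replacement (the column name and fill value are hypothetical, and fillna is only an example of a method that accepts inplace):

```python
import pandas as pd

pd.options.mode.copy_on_write = True

# Hypothetical data, for illustration only.
df = pd.DataFrame({"color": ["red", None, "blue"]})

# Before: chained inplace call, which does not update df under Copy-on-Write.
# df["color"].fillna("missing", inplace=True)

# After: compute the new column and assign it back to the DataFrame.
df["color"] = df["color"].fillna("missing")

print(df["color"].tolist())
```

The reassignment form behaves the same with or without Copy-on-Write enabled, so it should be safe to apply regardless of the pandas version in use.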

PaulWestenthanner commented 1 year ago

Thanks for reporting this. I can fix it once I have some time, or you can create a PR if you want to.