scikit-learn-contrib / category_encoders

A library of sklearn compatible categorical variable encoders
http://contrib.scikit-learn.org/category_encoders/
BSD 3-Clause "New" or "Revised" License
2.4k stars 393 forks

Fix basen_to_integer when column name contains regex metachar #393

Closed pimlock closed 1 year ago

pimlock commented 1 year ago

Fixes https://github.com/scikit-learn-contrib/category_encoders/issues/392

Proposed Changes

Wraps the column name in re.escape() when matching columns from the dataset during inverse_transform in BaseNEncoder (in basen_to_integer).

Without re.escape(), any regex metacharacter in a column name is interpreted by the regex engine, which leads to invalid results or an exception being thrown.
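
For illustration, here is a minimal sketch of the failure mode. It is not the library's actual code: `matching_columns`, the `escape` flag, and the column names are made up for this example, and it assumes encoded columns are named `<col>_<digits>`.

```python
import re


def matching_columns(col, out_cols, escape=True):
    """Hypothetical helper: return columns named '<col>_<digits>'."""
    # Escaping makes the column name match literally instead of as a regex.
    prefix = re.escape(str(col)) if escape else str(col)
    pattern = prefix + r"_\d+"
    return [c for c in out_cols if re.match(pattern, str(c))]


out_cols = ["a.b_0", "a.b_1", "axb_0", "axb_1"]

# Unescaped: "." matches any character, so columns of "axb" are picked up too.
unescaped = matching_columns("a.b", out_cols, escape=False)

# Escaped: only the columns that really belong to "a.b" are returned.
escaped = matching_columns("a.b", out_cols, escape=True)

# Unescaped: an unbalanced "(" makes the pattern invalid and raises re.error.
try:
    matching_columns("price(", ["price(_0"], escape=False)
    raised = False
except re.error:
    raised = True
```

The first case corresponds to the "invalid results" above (columns from another feature are silently included), and the second to the exception.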

PaulWestenthanner commented 1 year ago

nicely spotted, thanks for this contribution!