BaseNEncoder.inverse_transform() should work correctly with column names that contain regex metacharacters, such as "my_column (test)" or "test [123]". The characters ()[] are currently interpreted as a regex capturing group and character class, but they should be treated as literals.
Calling inverse_transform() when the input column name contains a regex metacharacter (e.g. parentheses) raises an exception:
Traceback (most recent call last):
  File "site-packages/IPython/core/interactiveshell.py", line 3397, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-92-c30af6a1928b>", line 10, in <cell line: 10>
    inversed = enc.inverse_transform(transformed)
  File "site-packages/category_encoders/basen.py", line 268, in inverse_transform
    X = self.basen_to_integer(X, self.cols, self.base)
  File "site-packages/category_encoders/basen.py", line 358, in basen_to_integer
    insert_at = out_cols.index(col_list[0])
IndexError: list index out of range
Steps to Reproduce the Problem
from category_encoders import BaseNEncoder
import pandas as pd
col_name = "A (test)"
X = pd.DataFrame(data={col_name: ["A", "B", "A", "C"]})
enc = BaseNEncoder(cols=[col_name]).fit(X)
transformed = enc.transform(X)
# raises IndexError: list index out of range
inversed = enc.inverse_transform(transformed)
Expected Behavior
BaseNEncoder.inverse_transform() should work correctly with column names that contain regex metacharacters, such as "my_column (test)" or "test [123]". The characters ()[] are currently interpreted as a regex capturing group and character class, but they should be treated as literals.
See: https://github.com/scikit-learn-contrib/category_encoders/blob/1def42827df4a9404553f41255878c45d754b1a0/category_encoders/basen.py#L269
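A minimal sketch of why escaping would help, assuming the encoded columns are located by matching names of the form <col>_<digits> against a pattern built from the original column name. The helper matching_columns and its escape flag are hypothetical and exist only to illustrate the effect of re.escape; this is not the library's actual code.

```python
import re


def matching_columns(col, out_cols, escape=True):
    """Hypothetical helper: return the columns named '<col>_<digits>'.

    With escape=False the column name is used as a raw regex, so
    metacharacters such as ()[] change the meaning of the pattern.
    With escape=True the name is matched literally.
    """
    prefix = re.escape(str(col)) if escape else str(col)
    pattern = re.compile(prefix + r"_\d+")
    return [c for c in out_cols if pattern.match(str(c))]


# Column names shaped like the encoder output for "A (test)" (assumed naming).
out_cols = ["A (test)_0", "A (test)_1", "A (test)_2"]

# Unescaped: "(test)" is parsed as a capturing group, so nothing matches
# and indexing the first element of the result would raise IndexError.
unescaped = matching_columns("A (test)", out_cols, escape=False)

# Escaped: the parentheses are treated as literal characters.
escaped = matching_columns("A (test)", out_cols, escape=True)

print(unescaped)
print(escaped)
```

With the unescaped pattern the match list is empty, which is consistent with the IndexError raised at col_list[0] in the traceback. The same applies to names such as "test [123]", where the brackets would otherwise be read as a character class.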
Actual Behavior
Calling inverse_transform() when the input column name contains a regex metacharacter (e.g. parentheses) raises IndexError: list index out of range, as shown in the traceback above.
Specifications