Memory increase of WOEEncoder for category_encoders version >=2.0.0

Open · Piecer-plc opened 2 years ago

Comment: This happens because WOEEncoder relies on ordinal encoding, and OrdinalEncoder copies its input data: https://github.com/scikit-learn-contrib/category_encoders/blob/6a13c14919d56fed8177a173d4b3b82c5ea2fef5/category_encoders/ordinal.py#L186
(When) do we actually need to copy inputs?
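The effect of that copy can be illustrated with a minimal sketch using plain pandas (no category_encoders involved): deep-copying the input DataFrame, as the linked line in ordinal.py does, roughly doubles the memory held while the encoder works on the data.

```python
import numpy as np
import pandas as pd

# Sketch only: a stand-in DataFrame, not the train.zip dataset from the report.
df = pd.DataFrame({"cat": np.random.randint(0, 100, size=1_000_000)})

original = int(df.memory_usage(deep=True).sum())
copied = df.copy()  # deep copy, analogous to the copy inside ordinal.py
total = original + int(copied.memory_usage(deep=True).sum())

# Holding both the input and its copy costs ~2x the original footprint.
print(round(total / original, 2))
```

This is only a model of the copy itself; the actual fit() overhead reported above also includes the encoder's intermediate structures.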
Hi, I noticed another memory issue with WOEEncoder. I submitted a similar bug before in #335; the difference between the two bugs is the encoder method used and the dataset. To distinguish between the two encoder APIs, I am filing a new bug report.

Expected Behavior

Similar memory usage across category_encoders versions.
Actual Behavior
According to the experiment results, when the category_encoders version is >=2.0.0, the memory usage of weight_enc.fit(train[weight_encode], train['target']) increases from 58 MB to 206 MB.

Steps to Reproduce the Problem
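A minimal sketch of how such a peak-memory measurement can be taken with the standard-library tracemalloc module. The DataFrame and the fit_like function here are stand-ins (the real report uses the train.zip dataset and ce.WOEEncoder().fit(...)); fit_like just mimics the input copy that the encoder performs.

```python
import tracemalloc

import numpy as np
import pandas as pd

# Stand-in data: the actual report uses train.zip with a 'target' column.
train = pd.DataFrame({
    "cat": np.random.randint(0, 50, size=500_000),
    "target": np.random.randint(0, 2, size=500_000),
})

def fit_like(X: pd.DataFrame) -> pd.DataFrame:
    # Mimics the X.copy() inside OrdinalEncoder; a real run would call
    # category_encoders' WOEEncoder().fit(X, y) here instead.
    return X.copy()

tracemalloc.start()
result = fit_like(train)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak during fit: {peak / 1e6:.1f} MB")
```

Running the same measurement under each category_encoders version is what produced the 58 MB vs. 206 MB comparison above.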
Step 1: Download the dataset: train.zip
Step 2: Install category_encoders.
Step 3: Change the category_encoders version and record the memory usage.
Specifications
Version: 2.3.0, 2.2.2, 2.1.0, 2.0.0, 1.3.0
Platform/OS: Ubuntu 16.04
CPU: Intel(R) Core(TM) i9-9900K
GPU: TITAN V