[Question; need help; support request] Possible to join multiple CountEncoders after parallel (multiprocessing) fitting?

scikit-learn-contrib / category_encoders

A library of sklearn compatible categorical variable encoders

BSD 3-Clause "New" or "Revised" License

2.39k stars, 393 forks

This is not supported out of the box. Are you planning to use the CountEncoder with normalize=True? Would it be possible to fit on a random subset only? I'd expect the results to be similar to those from the whole dataset. If you want to use the full dataset, you need to implement something yourself. If you fit multiple CountEncoders, make sure they all use the same OrdinalEncoder (the CountEncoder first fits an OrdinalEncoder to encode e.g. "foo", "bar" to 1, 2 and hence standardize the input). You'd want to pass that fitted OrdinalEncoder in the init rather than fit it in the fit function. Writing a combine function that adds up the counts should then be rather straightforward.
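To illustrate the "combine function that adds up the counts" idea, here is a minimal sketch in plain Python. It deliberately does not touch category_encoders internals: the helper names `count_chunk`, `combine_counts`, `fit_counts` and `transform` are hypothetical, and counting the raw category values directly is an assumption that sidesteps the need for a shared OrdinalEncoder. Mapping unseen categories to 0 is likewise an assumption, not necessarily what CountEncoder does.

```python
from collections import Counter


def count_chunk(values):
    """Count category occurrences in one chunk of a single column."""
    return Counter(values)


def combine_counts(partial_counts):
    """Add up per-chunk counts into one global count mapping."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)  # update() adds counts for matching keys
    return total


def fit_counts(chunks, map_fn=map):
    """Count each chunk, then merge the partial results.

    map_fn defaults to the builtin map (sequential). A parallel map such
    as multiprocessing.Pool.map could be passed instead (assumption: the
    chunks are picklable and the pool is created by the caller).
    """
    return combine_counts(map_fn(count_chunk, chunks))


def transform(values, counts, normalize=False):
    """Replace each category by its count, or by its relative frequency.

    Unseen categories map to 0 here (an assumption made for this sketch).
    """
    total = sum(counts.values())
    if normalize and total > 0:
        return [counts.get(v, 0) / total for v in values]
    return [counts.get(v, 0) for v in values]


if __name__ == "__main__":
    chunks = [["foo", "bar", "foo"], ["bar", "baz"], ["foo"]]
    counts = fit_counts(chunks)
    print(transform(["foo", "bar", "baz"], counts))
```

Because the partial results are keyed by the raw category values, the merged counts equal what a single pass over the full column would produce, regardless of how the data was split.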

[Question; need help; support request] Possible to join multiple CountEncoders after parallel (multiprocessing) fitting? #440