Closed kerrickstaley closed 3 years ago
@janmotl @wdm0006 this is ready for review, could you take a look?
Looking back at old PRs, this looks good to me, but it looks like the test suite failed. The logs aren't available anymore; could you pull in any recent changes from master and push again to re-run? Thanks.
Looks like the basen encoding tests are passing fine, but there's an issue in master for the GLM encoding that is unrelated to this PR. Going to go ahead and merge this, thanks for the patience.
BaseNEncoder used an incorrect formula for calculating the number of required bits in the output. If there are `nvals` distinct values and we reserve one encoding to represent "missing or unknown", then the correct number of bits is `ceil(log(nvals + 1, base))`. However, the code was previously using the formula `ceil(log(nvals, base)) + 1`.
Fixes https://github.com/scikit-learn-contrib/category_encoders/issues/264
Proposed Changes
Compute the number of output bits as `ceil(log(nvals + 1, base))`.
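To make the difference between the two formulas concrete, here is a minimal sketch that compares them for a few inputs. The function names (`ceil_log`, `old_num_digits`, `new_num_digits`) are hypothetical and are not taken from the category_encoders source. The logarithm is computed with integer arithmetic rather than `math.log`, on the assumption that avoiding floating-point rounding at exact powers of the base is desirable in an illustration.

```python
def ceil_log(n: int, base: int) -> int:
    """Return the smallest k such that base**k >= n, i.e. ceil(log(n, base)).

    Uses integer arithmetic so exact powers of the base are handled reliably.
    """
    k = 0
    capacity = 1  # number of distinct codes representable with k digits
    while capacity < n:
        capacity *= base
        k += 1
    return k


def old_num_digits(nvals: int, base: int) -> int:
    """Previous formula from the PR description: ceil(log(nvals, base)) + 1."""
    return ceil_log(nvals, base) + 1


def new_num_digits(nvals: int, base: int) -> int:
    """Corrected formula: ceil(log(nvals + 1, base)).

    The "+ 1" inside the logarithm reserves one code for "missing or unknown".
    """
    return ceil_log(nvals + 1, base)


if __name__ == "__main__":
    # Compare both formulas for a few (nvals, base) pairs.
    for nvals, base in [(3, 2), (4, 2), (7, 2), (8, 3)]:
        print(nvals, base, old_num_digits(nvals, base), new_num_digits(nvals, base))
```

With 3 distinct values in base 2, four codes are needed in total (three values plus the reserved one), which fit in 2 bits; the old formula asks for 3. The two formulas agree for some inputs, such as 4 values in base 2, which is why the extra column only shows up for certain cardinalities.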