Closed glevv closed 1 year ago
Thank you @GLevV for this, some comments: the scaler is fixed (the most common is `StandardScaler`, what you propose is `MinMaxScaler`). Maybe you could include a `scale_X` parameter, which can be `True` or `False`, and by default this would use the `StandardScaler`, which is the most common scaler? The best preprocessing really depends on the dataset, so it should not be fixed in the algorithm. Otherwise LGTM, thanks.
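A minimal sketch of the suggested parameter, assuming a scikit-learn-style transformer (`StumpSampler` is used here only as a placeholder name; the other parameters and the feature-map internals are elided):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler


class StumpSampler(BaseEstimator, TransformerMixin):
    """Hypothetical sketch: optional internal scaling via a boolean flag."""

    def __init__(self, n_components=100, scale_X=True, random_state=None):
        self.n_components = n_components
        self.scale_X = scale_X  # if True, apply StandardScaler before sampling
        self.random_state = random_state

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        if self.scale_X:
            self._scaler = StandardScaler().fit(X)
            X = self._scaler.transform(X)
        # ... fit the random stump features on the (possibly scaled) X ...
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        if self.scale_X:
            X = self._scaler.transform(X)
        # ... map X into the random feature space ...
        return X
```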
They are completely different kernels, with different methods of computing them. The stump kernel was presented in *Support Vector Machinery for Infinite Ensemble Learning*, but it can be hard to compute exactly, so an MC approximation was proposed in *Uniform Approximation of Functions with Random Bases* (the same paper where the MC approximation of the RBF kernel, `RBFSampler`, is described). I think `StumpKernelSampler`/`StumpSampler` would be a shorter and more consistent name (similar to `RBFSampler`).
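For illustration, a minimal sketch of a Monte Carlo feature map built from random decision stumps, in the spirit of `RBFSampler`. The sampling distribution and normalization below are simplified assumptions for the sketch, not the exact construction from either paper:

```python
import numpy as np


def random_stump_features(X, n_components=500, rng=None):
    """Map X (n_samples, n_features) to random decision-stump features.

    Each feature is sign(s * (x[d] - t)) for a random coordinate d,
    a random threshold t drawn from the data range, and a random sign s.
    The inner product of two feature vectors is then a Monte Carlo
    estimate of a stump kernel.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    dims = rng.integers(0, n_features, size=n_components)  # coordinate per stump
    lo, hi = X.min(axis=0), X.max(axis=0)
    thresholds = rng.uniform(lo[dims], hi[dims])           # threshold in data range
    signs = rng.choice([-1.0, 1.0], size=n_components)     # random direction
    Z = np.sign(signs * (X[:, dims] - thresholds))
    return Z / np.sqrt(n_components)                       # so Z @ Z.T estimates the kernel
```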
As for the scaling, I think it is possible to remove it altogether and let users build their own pipelines (stating clearly in the docs that this method requires scaling). That would be consistent with other kernel methods/approximations (`RBFSampler` also requires scaling to give a proper approximation) and with the original formulation in the paper.
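The user-side pipeline would then follow the pattern already standard for kernel approximations in scikit-learn, with the scaler as an explicit step (shown here with `RBFSampler`, since the stump sampler is what this PR would add):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, random_state=0)

# Scaling lives in the pipeline, not inside the kernel approximation.
clf = make_pipeline(
    StandardScaler(),
    RBFSampler(gamma=1.0, n_components=100, random_state=0),
    SGDClassifier(random_state=0),
)
clf.fit(X, y)
```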
Closed due to inactivity
MC approximation of AdaBoost stump kernel #119