Open ageron opened 5 years ago
I think there's a problem in the fourth code block of your "Steps/Code to Reproduce" section: you are printing the log_reg predictions a second time instead of the SVC results. After correcting it I got:
SVC score: 1.0
SVC predict: ['a' 'b' 'b' 'a']
which is the same as what the voting classifier's SVC is predicting. Does this explain the issue?
If not, I did find this, which could potentially be what's happening: https://github.com/scikit-learn/scikit-learn/issues/11263
Hi @samwaterbury, good catch, thanks. I fixed the code above. However, the issue remains: the sub-estimators expect labels as integers, not strings. It does not seem related to #11263.
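To make the point about integer labels concrete, here is a minimal sketch. The toy data and the estimator names ("log_reg", "svc") are assumptions chosen for illustration, not the code from the original report. It compares the classes seen by a sub-estimator fitted inside a VotingClassifier with those of an equivalent SVC fitted on its own.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Toy data, assumed for illustration only (not the data from the report).
X = np.array([[0.0, 0.0], [1.0, 1.0], [1.2, 0.9], [0.1, -0.1]])
y = np.array(["a", "b", "b", "a"])

voting = VotingClassifier(
    estimators=[("log_reg", LogisticRegression()), ("svc", SVC())]
)
voting.fit(X, y)

# The ensemble itself exposes the original string labels.
print("voting classes:", voting.classes_)

# The fitted sub-estimator was trained on the encoded integer labels.
inner_svc = voting.named_estimators_["svc"]
print("inner SVC classes:", inner_svc.classes_)
print("inner SVC predict dtype:", inner_svc.predict(X).dtype)

# An equivalent SVC trained outside the ensemble keeps the strings.
outer_svc = SVC().fit(X, y)
print("outer SVC classes:", outer_svc.classes_)
```

Because the inner estimator only knows the integer codes, comparing its predictions against the original string labels will not match, which is consistent with the behaviour described in this issue.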
This is done to facilitate alignment in predict_proba etc., not for efficiency (though there are conceivable benefits there).
I don't see it as a big problem, though:
Changing it would be hard to do without breaking backwards compatibility.
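The alignment argument can be sketched as follows. This is an illustration of the idea, not scikit-learn's actual implementation, and the probability values are hypothetical. Once labels are encoded as 0..n_classes-1, column j of every sub-estimator's predict_proba output refers to the same class, so the columns can be averaged and the winning column index decoded back to the original label.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Encode the string labels once; the codes double as column indices.
le = LabelEncoder().fit(["a", "b", "b", "a"])

# Hypothetical predict_proba outputs from two sub-estimators
# (rows are samples, columns are encoded classes 0 and 1).
proba_1 = np.array([[0.9, 0.1], [0.2, 0.8]])
proba_2 = np.array([[0.6, 0.4], [0.4, 0.6]])

# Columns line up, so soft voting reduces to an element-wise mean.
avg_proba = np.mean([proba_1, proba_2], axis=0)

# The winning column index is the encoded class; decode it to a label.
winner_codes = np.argmax(avg_proba, axis=1)
labels = le.inverse_transform(winner_codes)
print(labels)
```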
It's worth pointing out that VotingClassifier gives access to its internal label encoder via the attribute le_; however, this is not documented.
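As a rough sketch of how the le_ attribute can be used (toy data and estimator names are assumptions for illustration), the encoder can decode a sub-estimator's integer predictions back to the original labels, or encode the true labels before scoring the sub-estimator directly:

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Toy data, assumed for illustration only.
X = np.array([[0.0, 0.0], [1.0, 1.0], [1.2, 0.9], [0.1, -0.1]])
y = np.array(["a", "b", "b", "a"])

voting = VotingClassifier(
    estimators=[("log_reg", LogisticRegression()), ("svc", SVC())]
)
voting.fit(X, y)

inner_svc = voting.named_estimators_["svc"]

# Decode the sub-estimator's integer predictions to the original labels.
encoded_pred = inner_svc.predict(X)
decoded_pred = voting.le_.inverse_transform(encoded_pred)
print("SVC predict:", decoded_pred)

# Encode the true labels the same way before scoring the sub-estimator.
y_encoded = voting.le_.transform(y)
score = inner_svc.score(X, y_encoded)
print("SVC score:", score)
```

Either direction works; decoding predictions is usually more convenient when the results are compared against other classifiers trained on the original labels.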
Thanks @jnothman and @samwaterbury. I agree, it's not a big issue, probably just a sentence or two to add in the documentation, including the le_ tip. I won't be available in the next ~3 weeks, but I can take care of this if it's not done by then.
Description
The VotingClassifier transforms the labels before training the sub-estimators, so if you try to use them directly for predictions or scoring, you get unexpected results. IMHO, this should either be fixed (but I'm guessing it's a performance optimization) or at least the documentation should warn about this fact.
Steps/Code to Reproduce
Expected Results
I would expect the sub-estimators to produce the same score and predictions as an equivalent classifier trained outside of the VotingClassifier.
Actual Results
Versions