microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

monotone constraints for multiclass objective #3912

Open gsimko-lyft opened 3 years ago

gsimko-lyft commented 3 years ago

It's undocumented whether monotonicity constraints are supported for the multiclass objective, but based on some small tests it looks like the constraints are only applied to the first class.

If that's true, would it be possible to add separate constraints for each class?
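
For reference, a minimal sketch of the kind of small test I mean (synthetic data; `monotone_constraints` and the multiclass objective are real LightGBM parameters, everything else is illustrative):

```python
# Sketch: multiclass objective combined with monotone constraints.
# As of the version discussed here, this combination is accepted silently.
import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = rng.integers(0, 3, size=500)  # 3 classes

params = {
    "objective": "multiclass",
    "num_class": 3,
    # +1 = increasing in feature 0, 0 = unconstrained in feature 1
    "monotone_constraints": [1, 0],
}
booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=20)

# Raw scores have shape (n_samples, num_class); probabilities are their softmax.
scores = booster.predict(X, raw_score=True)
```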

jameslamb commented 3 years ago

@CharlesAuguste could you comment on this whenever you have time?

CharlesAuguste commented 3 years ago

@jameslamb I agree with @gsimko-lyft that monotone constraints don't work for this objective. I think we should disable monotonicity constraints for this specific objective, unless you have a better idea.

The main issue here is what we mean by monotonicity. I think in this case end users may be expecting monotonicity of the argmax of the output probabilities (= the predicted class) with respect to the input. However, in the implementation, monotonicity means that the output of each tree is monotone vs. the input. With most objectives we end up summing trees, which is fine because summing preserves monotonicity. But with this classification objective, we compute scores for every input and every class, and then apply a softmax to each input's scores to get the class probabilities. The scores themselves are monotone, because they are the outputs of the trees. But this doesn't seem right, for 2 reasons: the guarantee applies to the raw scores rather than to the probabilities (or the predicted class) the end user actually sees, and the softmax mixes the scores of all classes, so monotone scores give no monotonic guarantee on the output probabilities.

I don't see a way to fix it, because even if we came up with constraints that made sense for the scores in the trees, the softmax will break the monotonicity. So I think that we should disable the monotone constraints for this objective.

@gsimko-lyft The monotonicity constraints were implemented with the regression objective in mind, not really classification objectives, because in many classification problems you can't really compare the classes. If you are able to order the classes, then you may be able to get away with a regression objective and use the monotone constraints there (a sketch below).
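
A rough sketch of that regression workaround, assuming the classes can be mapped to an ordered numeric target (the data and the mapping here are illustrative):

```python
# Sketch: treat ordered classes as a numeric target and use a regression
# objective, where monotone constraints are well defined.
import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
# Ordered "classes" 0 < 1 < 2, encoded directly as a numeric target.
y = np.clip(np.round(X[:, 0] + 0.1 * rng.normal(size=500)), 0, 2)

params = {
    "objective": "regression",
    "monotone_constraints": [1, 0],  # prediction increases with feature 0
}
booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=50)

# Predictions are monotone in feature 0; rounding back to a class label
# preserves that monotonicity.
preds = booster.predict(X)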

gsimko-lyft commented 3 years ago

@CharlesAuguste Softmax is a monotonic function so performing a softmax on top of all the other monotonicity-preserving steps should still maintain monotonicity. Am I missing something?

Wrt. classification, in my use case I'm more interested in the predicted probabilities than in the (binary) classification. Since I need a probabilistic interpretation, I could work with regression and apply a sigmoid/softmax on the output, but effectively that's the same as working with the binary classification objective. Did you have something else in mind?

CharlesAuguste commented 3 years ago

@gsimko-lyft Here is how I think the multiclass objective function works. It is not one I know very well, so let me know if I am wrong somewhere. Say we want to predict classes in {1, …, n} from a 1D variable x (to keep things simple). Then at each boosting step we create n trees, one for each of the scores s_1(x), …, s_n(x). The only thing we know how to do right now is make these trees monotone vs. x. With the current implementation, I think all the trees will have the same monotonicity vs. x (though I don't think it would be too hard to change that if we wanted to).

Now with these trees, we can compute the probability of belonging to class i using the softmax function, P(class = i | x) = exp(s_i(x)) / Σ_j exp(s_j(x)). When x increases, s_i(x) increases/decreases (depending on the monotone constraint), but all the other s_j(x) will also increase/decrease. So I don't think this gives any particular monotonic property to P(class = i | x), and I don't think it gives any particular property to the max/argmax of the probabilities either. Does that make sense?
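
To make that concrete, a small numeric check (an illustrative construction, not LightGBM itself): both per-class scores below are monotone increasing in x, yet the softmax probability of class 1 rises and then falls.

```python
import numpy as np

x = np.linspace(0.0, 2.0, 201)
s1 = np.minimum(x, 1.0) + 0.1 * x  # monotone increasing: slope 1.1, then 0.1
s2 = 0.5 * x                       # monotone increasing: slope 0.5

# Softmax probability of class 1.
p1 = np.exp(s1) / (np.exp(s1) + np.exp(s2))

d = np.diff(p1)
print(d.min() < 0 < d.max())  # True: p1 is non-monotone despite monotone scores
```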

About your use case: if you have only 2 classes, then the binary objective should work with monotone constraints. If you have more classes, and you want a simple constraint like having the probability of the first class monotone increasing vs. x (= making P(class = 1 | x) monotone increasing vs. x), then I think in theory that could be achieved by having s_1 monotone increasing and all the other scores monotone decreasing. That's currently not possible in the code, but I think it should be easy enough to change (and you can do something similar on your end with binary one-vs-all classifiers using the right constraints, plus a softmax on the outputs, as sketched below). If you want more complicated constraints (like constraints on the probabilities of each of the different classes vs. the same variable, all at the same time), then I don't really see a simple solution.
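
Here is a rough sketch of that one-vs-all idea (the constraint directions and data are illustrative; the softmax at the end is applied outside LightGBM):

```python
# One-vs-all workaround: one binary model per class, each with its own
# monotone constraint, then a softmax over the raw scores.
import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = rng.integers(0, 3, size=500)

# E.g. class 0 probability should increase with feature 0, others decrease.
constraints = {0: [1, 0], 1: [-1, 0], 2: [-1, 0]}

raw = np.empty((len(X), 3))
for k in range(3):
    params = {"objective": "binary", "monotone_constraints": constraints[k]}
    booster = lgb.train(params, lgb.Dataset(X, label=(y == k).astype(int)),
                        num_boost_round=20)
    raw[:, k] = booster.predict(X, raw_score=True)

# Softmax over the per-class raw scores; with one score increasing and the
# rest decreasing, the class-0 probability is monotone in feature 0.
probs = np.exp(raw) / np.exp(raw).sum(axis=1, keepdims=True)
```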

gsimko-lyft commented 3 years ago

Thanks for the detailed explanation @CharlesAuguste, that made a lot of sense! I didn't realize earlier that internally LightGBM (and I think XGBoost as well) builds n trees for n classes. Agreed that with this setup one needs to be careful. Building multiple one-vs-all models is an excellent suggestion; if I understand correctly it would be computationally similar and would give the pairwise constraints that are needed in my use case.

I'm wondering, is there a compelling reason for building n trees instead of building a single tree with multiple outputs? Maybe this is just a less explored setup? FWIW, it looks like there's a recent paper that asked the same question: GBDT-MO: Gradient Boosted Decision Trees for Multiple Outputs, https://arxiv.org/abs/1909.04373

CharlesAuguste commented 3 years ago

Sorry for not answering earlier. I hope you are able to make progress on the problem you are trying to solve. I am not sure there is a compelling reason for building n trees; my guess would be that it is easier to approach and more intuitive. Maybe @guolinke can tell us more about it? I read the paper you linked and it is very interesting. Their code is here: https://github.com/zzd1992/GBDTMO

StrikerRUS commented 3 years ago

@CharlesAuguste Should we update any code and/or docs to reflect the fact that monotone constraints don't work with the multiclass classification task? Also, should this be a feature request?

StrikerRUS commented 3 years ago

Gently ping @CharlesAuguste

StrikerRUS commented 3 years ago

Kindly ping @CharlesAuguste for this maintenance question https://github.com/microsoft/LightGBM/issues/3912#issuecomment-792356725.

CharlesAuguste commented 3 years ago

@StrikerRUS Sorry I didn't answer earlier; I had notification issues earlier this year. I do think we should update the code to throw an error when monotone constraints and the multiclass objective are enabled at the same time. Technically, the constraints are doing their job correctly, but as far as I understand this gives no guarantee on the output, so I don't think it would be of any use to the end user. It is also pretty confusing for end users who don't know the implementation details to see the constraints not working.
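
Until such a check lands in LightGBM itself, a user-side guard could look roughly like this (a hypothetical helper, not part of LightGBM's API):

```python
def check_params(params):
    """Reject the unsupported combination discussed in this thread."""
    # Covers both "multiclass" (softmax) and "multiclassova" objectives;
    # a nonzero entry in monotone_constraints means a constraint is active.
    if params.get("objective", "").startswith("multiclass") and any(
        c != 0 for c in params.get("monotone_constraints", [])
    ):
        raise ValueError(
            "monotone_constraints are not meaningful with the multiclass "
            "objective; see microsoft/LightGBM#3912"
        )
```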

I agree this GBDT-MO could be a new feature request!

StrikerRUS commented 3 years ago

@CharlesAuguste Thank you very much for your detailed answer!