stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0
1.62k stars · 214 forks

feat: Add estimator type with reg and clf for NGBClassifier and NGBRegressor #325

Open NeroHin opened 1 year ago

NeroHin commented 1 year ago

Fixed #324

NeroHin commented 1 year ago

#324

NeroHin commented 1 year ago

@tonyduan @alejandroschuler

alejandroschuler commented 1 year ago

hey @NeroHin, thanks for the suggestion.

Does it make sense to use ngboost in an ensemble for classification or regression alongside e.g. xgboost? The two are fundamentally different in the case of regression: ngboost is for probabilistic prediction, so it outputs an entire predictive distribution. There's not much point in using ngboost for point prediction if that's all you need, since it's only ever about as good as xgboost but much slower (see the paper for a performance comparison). In the case of standard classification (without some parametric discrete distribution, e.g. Poisson), ngboost is basically the same as xgboost except, again, slower.

So while there is nothing fundamentally "wrong" with this change, I feel like it enables a use case that doesn't really make much sense or shouldn't be encouraged. Does that make sense, or am I missing something?

NeroHin commented 1 year ago

> hey @NeroHin, thanks for the suggestion.
>
> Does it make sense to use ngboost in an ensemble for classification or regression alongside e.g. xgboost? The two are fundamentally different in the case of regression: ngboost is for probabilistic prediction, so it outputs an entire predictive distribution. There's not much point in using ngboost for point prediction if that's all you need, since it's only ever about as good as xgboost but much slower (see the paper for a performance comparison). In the case of standard classification (without some parametric discrete distribution, e.g. Poisson), ngboost is basically the same as xgboost except, again, slower.
>
> So while there is nothing fundamentally "wrong" with this change, I feel like it enables a use case that doesn't really make much sense or shouldn't be encouraged. Does that make sense, or am I missing something?

@alejandroschuler Thanks for your reply. I've seen some recent research in which ngboost performed better than xgboost (case1 case2).

In my case, I need to use the ensemble method sklearn.ensemble.VotingClassifier() to correct for imbalanced data, lowering the false positive rate and raising the true positive rate.

But when I add the NGBoost classifier to sklearn.ensemble.VotingClassifier(), it raises the error reported in issue #324. After checking the references (ref1, ref2), I confirmed the cause: NGBoost doesn't set self._estimator_type in its initializer. So I created this pull request to fix that and help anyone using ngboost with the ensemble methods in sklearn.ensemble.
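The failing check can be illustrated without ngboost at all. Below is a hypothetical minimal sketch of the kind of test `sklearn.base.is_classifier` performs (in sklearn versions of that era it inspects the `_estimator_type` attribute); the two stand-in classes are not the real NGBoost classes, they just mimic the state before and after this PR's change.

```python
def is_classifier(estimator):
    # mirrors the check in sklearn.base.is_classifier (pre-1.6 sklearn):
    # an estimator counts as a classifier iff it declares
    # _estimator_type = "classifier"
    return getattr(estimator, "_estimator_type", None) == "classifier"

class NGBClassifierBefore:  # hypothetical stand-in: attribute missing
    pass

class NGBClassifierAfter:   # hypothetical stand-in for the PR's change
    _estimator_type = "classifier"

print(is_classifier(NGBClassifierBefore()))  # False -> VotingClassifier rejects it
print(is_classifier(NGBClassifierAfter()))   # True  -> accepted into the ensemble
```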

alejandroschuler commented 1 year ago

I don't find those papers terribly convincing, and in our own paper we found that ngboost is generally only as good as xgboost or a little worse. That makes sense: ngboost isn't trying to directly compete with xgboost.

I still think it's probably a waste of time and energy to try to use ngboost in an ensemble if your goal is point prediction or classification, but I'm going to approve the PR because it doesn't hurt to have the functionality, and hopefully people will see this discussion and be dissuaded from using it :)

NeroHin commented 1 year ago

> I don't find those papers terribly convincing, and in our own paper we found that ngboost is generally only as good as xgboost or a little worse. That makes sense: ngboost isn't trying to directly compete with xgboost.
>
> I still think it's probably a waste of time and energy to try to use ngboost in an ensemble if your goal is point prediction or classification, but I'm going to approve the PR because it doesn't hurt to have the functionality, and hopefully people will see this discussion and be dissuaded from using it :)

Thanks for your comment!