stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0
1.62k stars · 214 forks

feat: Add estimator type with reg and clf for NGBClassifier and NGBRegressor #325

Open NeroHin opened 1 year ago

NeroHin commented 1 year ago

Fixed #324

NeroHin commented 1 year ago

#324

NeroHin commented 1 year ago

@tonyduan @alejandroschuler

alejandroschuler commented 1 year ago

hey @NeroHin, thanks for the suggestion.

Does it make sense to use ngboost in an ensemble for classification or regression alongside e.g. xgboost? The two are fundamentally different in the case of regression: ngboost is for probabilistic prediction, so it outputs an entire predictive distribution. There's not much point in using ngboost for point prediction if that's all you need, since it's only ever about as good as xgboost but much slower (see the paper for a performance comparison). In the case of standard classification (without some parametric discrete distribution, e.g. Poisson), ngboost is basically the same as xgboost except, again, slower.

So while there is nothing fundamentally "wrong" with this change, I feel like it enables a use case that doesn't really make much sense or shouldn't be encouraged. Does that make sense, or am I missing something?

NeroHin commented 1 year ago

> hey @NeroHin, thanks for the suggestion.
>
> Does it make sense to use ngboost in an ensemble for classification or regression alongside e.g. xgboost? The two are fundamentally different in the case of regression: ngboost is for probabilistic prediction, so it outputs an entire predictive distribution. There's not much point in using ngboost for point prediction if that's all you need, since it's only ever about as good as xgboost but much slower (see the paper for a performance comparison). In the case of standard classification (without some parametric discrete distribution, e.g. Poisson), ngboost is basically the same as xgboost except, again, slower.
>
> So while there is nothing fundamentally "wrong" with this change, I feel like it enables a use case that doesn't really make much sense or shouldn't be encouraged. Does that make sense, or am I missing something?

@alejandroschuler Thanks for your reply. I've seen some recent research in which ngboost performed better than xgboost (case1 case2).

In my case, I need to use the ensemble method sklearn.ensemble.VotingClassifier() to correct for imbalanced data, lowering the false positive rate and raising the true positive rate.

But when I add the NGBoost classifier to sklearn.ensemble.VotingClassifier(), it raises the error reported in issue #324. After checking the references (ref1, ref2), I confirmed the cause: NGBoost doesn't set self._estimator_type in its initializer. So I created this pull request to fix that and help anyone using ngboost with the ensemble methods in sklearn.ensemble.
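The failing check can be illustrated without ngboost at all. Below is a hypothetical minimal sketch of the kind of test `sklearn.base.is_classifier` performs (in sklearn versions of that era it inspects the `_estimator_type` attribute); the two stand-in classes are not the real NGBoost classes, they just mimic the state before and after this PR's change.

```python
def is_classifier(estimator):
    # mirrors the check in sklearn.base.is_classifier (pre-1.6 sklearn):
    # an estimator counts as a classifier iff it declares
    # _estimator_type = "classifier"
    return getattr(estimator, "_estimator_type", None) == "classifier"

class NGBClassifierBefore:  # hypothetical stand-in: attribute missing
    pass

class NGBClassifierAfter:   # hypothetical stand-in for the PR's change
    _estimator_type = "classifier"

print(is_classifier(NGBClassifierBefore()))  # False -> VotingClassifier rejects it
print(is_classifier(NGBClassifierAfter()))   # True  -> accepted into the ensemble
```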

alejandroschuler commented 1 year ago

I don't find those papers terribly convincing, and in our own paper we found that ngboost is generally only as good as xgboost or a little worse. That makes sense: ngboost isn't trying to directly compete with xgboost.

I still think it's probably a waste of time and energy to try to use ngboost in an ensemble if your goal is point prediction or classification, but I'm going to approve the PR because it doesn't hurt to have the functionality, and hopefully people will see this discussion and be dissuaded from using it :)

NeroHin commented 1 year ago

> I don't find those papers terribly convincing, and in our own paper we found that ngboost is generally only as good as xgboost or a little worse. That makes sense: ngboost isn't trying to directly compete with xgboost.
>
> I still think it's probably a waste of time and energy to try to use ngboost in an ensemble if your goal is point prediction or classification, but I'm going to approve the PR because it doesn't hurt to have the functionality, and hopefully people will see this discussion and be dissuaded from using it :)

Thanks for your comment!