Closed — StatMixedML closed this issue 1 year ago.
It's a little more complicated than "does ngboost work" but overall I think he makes a very important point (which I've brought up as well https://github.com/stanfordmlgroup/ngboost/issues/298#issuecomment-1268542402): if calibrated prediction intervals are all you need, then conformal inference is a simple and perfectly good approach.
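To make the "simple and perfectly good" point concrete, here is a minimal split conformal sketch on synthetic data (the model, data, and 90% level are illustrative choices, not something from this thread):

```python
# Split conformal prediction intervals: a minimal sketch on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.0, size=n)

# Split into a proper training set and a calibration set
X_tr, y_tr = X[:1000], y[:1000]
X_cal, y_cal = X[1000:], y[1000:]

model = LinearRegression().fit(X_tr, y_tr)

# Nonconformity scores: absolute residuals on the held-out calibration set
scores = np.abs(y_cal - model.predict(X_cal))

# Conformal quantile giving 90% marginal coverage
alpha = 0.1
q = np.quantile(scores, np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores))

# Prediction interval for new points: point prediction +/- q
X_new = rng.normal(size=(5, 3))
pred = model.predict(X_new)
lower, upper = pred - q, pred + q
print(np.round(np.column_stack([lower, upper]), 2))
```

The width `2*q` is the same for every `x`, which is exactly the limitation the rest of the thread gets at: the coverage guarantee is marginal, not conditional on `x`.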
The complications are as follows:
So, overall takeaways:
@alejandroschuler Thanks for your fast and detailed reply.
- building prediction intervals is not the same thing as doing conditional density estimation, which is what ngboost does. If you have the latter you get the former "for free", but not the other way around.
I fully agree with that statement. Also, since models like NGBoost, XGBoostLSS and LightGBMLSS model all moments of a distribution (mean, variance, ...) as functions of covariates, you gain a more detailed view on what actually drives variance etc., i.e., a better understanding of the Data Generating Process.
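The idea of modeling all distributional parameters as functions of covariates can be sketched without any of these libraries: below is a toy heteroscedastic Gaussian regression fit by gradient descent on the negative log-likelihood, where both the mean and the log-scale are linear in `x` (the DGP and parameterization are made up for illustration; NGBoost/XGBoostLSS use boosted trees and natural gradients instead):

```python
# Toy distributional regression: mean AND variance depend on the covariate.
# Fit by plain gradient descent on the Gaussian negative log-likelihood.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(-1, 1, size=n)
# True DGP: mean 2x, standard deviation exp(0.5x) -- variance is driven by x
y = 2.0 * x + rng.normal(scale=np.exp(0.5 * x), size=n)

# Parameters: mean = a*x + b, log-scale = c*x + d
params = np.zeros(4)
lr = 0.05
for _ in range(3000):
    a, b, c, d = params
    mu = a * x + b
    log_s = c * x + d
    s2 = np.exp(2 * log_s)
    r = y - mu
    # Per-point NLL (up to a constant): log_s + r^2 / (2 s^2)
    g_mu = -r / s2            # d NLL / d mu
    g_logs = 1 - r**2 / s2    # d NLL / d log_s
    grad = np.array([
        np.mean(g_mu * x), np.mean(g_mu),
        np.mean(g_logs * x), np.mean(g_logs),
    ])
    params -= lr * grad

print(np.round(params, 2))  # roughly the true values [2, 0, 0.5, 0]
```

The recovered `c` is the interesting part: it tells you *how* the variance moves with `x`, which is the "better understanding of the DGP" point above.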
Conformal prediction does not only output prediction intervals. Conformal predictive distributions output the whole CDF for predictions, and the whole CDF is calibrated by default, with mathematical guarantees for any underlying model, any data distribution, and any dataset size.
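A minimal split-conformal sketch of that idea (not the full Vovk construction, and the model/data are illustrative): the predicted CDF at a new point is the empirical CDF of calibration residuals, shifted by the point prediction.

```python
# Split conformal predictive system, sketched: F(y | x) is estimated by the
# fraction of calibration residuals below y - mu(x).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 2))
y = X @ np.array([1.0, 1.0]) + rng.normal(size=2000)
X_tr, y_tr, X_cal, y_cal = X[:1000], y[:1000], X[1000:], y[1000:]

model = LinearRegression().fit(X_tr, y_tr)
residuals = np.sort(y_cal - model.predict(X_cal))

def predictive_cdf(x, y_grid):
    """Calibrated CDF estimate F(y | x), evaluated on a grid of y values."""
    mu = model.predict(x.reshape(1, -1))[0]
    # Fraction of calibration residuals <= y - mu(x); the +1 in the
    # denominator is the usual conformal convention
    return np.searchsorted(residuals, y_grid - mu, side="right") / (len(residuals) + 1)

x_new = np.array([0.5, -0.5])
grid = np.linspace(-4.0, 4.0, 9)
print(np.round(predictive_cdf(x_new, grid), 2))
```

Note the resulting CDF is a step function, and the residual distribution is the same for every `x` up to a location shift; the Mondrian variants mentioned below address that by calibrating within categories of points.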
Sharing some links in case they might be of interest to NGBoost devs.
https://proceedings.mlr.press/v91/vovk18a.html
Thanks for the resources @valeman!
I'd say that if someone wants predicted CDFs, something like Mondrian conformal predictive systems is a clear choice for a first pass. That said, I think there are a few conceptual/practical wrinkles that can make a less rigorous approach (like ngboost) more attractive to users:
These are mostly practical issues rather than theoretical. I hope to see continuing progress and consolidation along those fronts. I wouldn't be opposed to retiring ngboost entirely once it's not offering any theoretical, computational, or ease-of-use benefit!
Something I forgot to mention: when a PDF is what you need you often want some guarantee that your estimate converges (pointwise) to the true PDF. It's not immediately clear to me that the marginal coverage guarantee you get from conformal prediction of the CDF gives you pointwise convergence of the PDF even if it were clear how to get a PDF from an eCDF-like object. However, with a silly parametric model you of course get this convergence and thus this should be the case for something semiparametric like NGBoost as well (if you believe the assumed shape of the conditional distributions). So that's one case where ngboost at least gives you a guarantee in an unrealistic setting, whereas a conformal approach leaves you hanging.
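The parametric version of that convergence claim is easy to demonstrate numerically (this is a toy Gaussian MLE, purely illustrative, not NGBoost itself): when the assumed family is correct, the plug-in PDF converges to the true PDF at any fixed point as the sample grows.

```python
# Pointwise PDF convergence under a correctly specified parametric model:
# plug the Gaussian MLE into the density and watch the error at a fixed
# point shrink as n grows.
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(3)
point = 1.0
true_val = normal_pdf(point, 0.0, 1.0)  # true density at the query point

errors = {}
for n in (100, 10_000, 1_000_000):
    sample = rng.normal(size=n)           # data really is N(0, 1)
    mu_hat, sigma_hat = sample.mean(), sample.std()
    errors[n] = abs(normal_pdf(point, mu_hat, sigma_hat) - true_val)

print(errors)  # absolute PDF error at the point, for increasing n
```

Of course this is exactly the "unrealistic setting": if the assumed shape is wrong, the plug-in PDF converges to the wrong density, which is the trade-off against the assumption-free conformal guarantee.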
@alejandroschuler quick question, I see NGBoost paper was published on ArXiv, was it published in some peer reviewed journal as well? I can't seem to find peer reviewed journal publication for NGBoost.
yeah it came out in ICML 2020: http://proceedings.mlr.press/v119/duan20a/duan20a.pdf
I found this article interesting: "Does NGBoost work? Evaluating NGBoost against key criteria for good probabilistic prediction", where the author compares the performance of NGBoost to conformal prediction.
I have already replied to one of the tweets: https://twitter.com/predict_addict/status/1588603934666805248
@alejandroschuler I would be interested in hearing your opinion on this.