stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0

Fix issue causing large memory consumption in pred_dist() #344

Closed mesenrj closed 5 months ago

mesenrj commented 5 months ago

Hi,

While working on a recent project, I inspected resource usage and noticed that memory consumption was far too high for .pred_dist() whenever max_iter was set to a value other than None.

Looking at the code, I found that the issue stemmed from calling .staged_pred_dist() and returning only the last value. This allocated memory unnecessarily (in my case, hundreds of GBs).

if (
    max_iter is not None
):  # get prediction at a particular iteration if asked for
    # Only the last Dist object is used, but memory is allocated for max_iter Dist objects
    dist = self.staged_pred_dist(X, max_iter=max_iter)[-1]

The conditional statement also wasn't really needed: it only checked whether max_iter is None, and pred_param() already handles that case.
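To illustrate the difference, here is a minimal sketch using a toy model rather than NGBoost itself. The `ToyBooster` class, its `stage_updates` attribute, and the `staged_pred` method are hypothetical stand-ins invented for this example; only the names `pred_param` and `max_iter` mirror the ones discussed above, and their behavior here is an assumption. The staged variant keeps one full prediction per boosting stage, while the direct variant keeps a single running result and treats `max_iter=None` as "use every stage" without a separate branch.

```python
class ToyBooster:
    """Hypothetical stand-in for a boosted model; not NGBoost's real class."""

    def __init__(self, stage_updates):
        # One additive coefficient per boosting stage (illustrative only).
        self.stage_updates = stage_updates

    def staged_pred(self, X, max_iter=None):
        # Stores a full prediction for every stage:
        # memory grows as (number of stages) * len(X).
        stages = []
        params = [0.0] * len(X)
        for update in self.stage_updates[:max_iter]:
            params = [p + update * x for p, x in zip(params, X)]
            stages.append(params)
        return stages

    def pred_param(self, X, max_iter=None):
        # Keeps only the running result: memory stays proportional to len(X).
        # Slicing with [:None] selects all stages, so no None check is needed.
        params = [0.0] * len(X)
        for update in self.stage_updates[:max_iter]:
            params = [p + update * x for p, x in zip(params, X)]
        return params


model = ToyBooster([0.5, 0.25, 0.125])
X = [1.0, 2.0]

# Wasteful: builds every intermediate stage, then discards all but the last.
wasteful = model.staged_pred(X, max_iter=2)[-1]

# Direct: same result without the intermediate list.
direct = model.pred_param(X, max_iter=2)
```

Both calls return the same values, so under these assumptions dropping the staged call changes memory use but not the prediction.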

This PR fixes the aforementioned issues.

Thanks!