While inspecting resource usage on a recent project, I noticed that memory consumption of .pred_dist() was far too high whenever max_iter was set to a value other than None.
Looking at the code, I found that the issue stemmed from .pred_dist() calling .staged_pred_dist() but returning only the last value, so every intermediate Dist object was allocated even though only the final one was used. In my case this meant hundreds of GBs of unnecessary memory allocation.
    if (
        max_iter is not None
    ):  # get prediction at a particular iteration if asked for
        dist = self.staged_pred_dist(X, max_iter=max_iter)[-1]  # <-- only the last Dist object is used, but memory is allocated for max_iter Dist objects
The conditional statement also wasn't really needed: all it checks is whether max_iter is None, and pred_param() can already handle that case.
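To make the difference concrete, here is a minimal toy sketch of the two approaches. ToyDist, ToyBooster, pred_dist_old and pred_dist_new are hypothetical stand-ins written for this illustration, not the library's actual classes or the exact patch; the only behavior assumed from the real code is that staged_pred_dist() returns one distribution object per iteration and that pred_param() accepts max_iter=None.

```python
import numpy as np


class ToyDist:
    """Hypothetical stand-in for a distribution object; counts how many are built."""

    created = 0

    def __init__(self, params):
        ToyDist.created += 1
        self.params = params


class ToyBooster:
    """Hypothetical stand-in for the boosted model (not the real class)."""

    def __init__(self, n_stages, learning_rate=0.1):
        self.n_stages = n_stages
        self.learning_rate = learning_rate

    def _n_used(self, max_iter):
        # max_iter=None means "use every stage", so callers need no branch.
        return self.n_stages if max_iter is None else min(max_iter, self.n_stages)

    def pred_param(self, X, max_iter=None):
        # Returns only the parameters after the requested number of stages.
        n = self._n_used(max_iter)
        return np.full(X.shape[0], n * self.learning_rate)

    def staged_pred_dist(self, X, max_iter=None):
        # Builds one ToyDist per stage and keeps all of them in a list.
        n = self._n_used(max_iter)
        return [
            ToyDist(np.full(X.shape[0], (i + 1) * self.learning_rate))
            for i in range(n)
        ]

    def pred_dist_old(self, X, max_iter=None):
        # Old pattern: materialize every stage, then keep only the last one.
        if max_iter is not None:
            return self.staged_pred_dist(X, max_iter=max_iter)[-1]
        return ToyDist(self.pred_param(X))

    def pred_dist_new(self, X, max_iter=None):
        # Simplified pattern: no branch, a single distribution object.
        return ToyDist(self.pred_param(X, max_iter))


X = np.zeros((5, 2))
model = ToyBooster(n_stages=100)

ToyDist.created = 0
old = model.pred_dist_old(X, max_iter=50)
n_old = ToyDist.created  # objects built by the old pattern

ToyDist.created = 0
new = model.pred_dist_new(X, max_iter=50)
n_new = ToyDist.created  # objects built by the simplified pattern
```

In this toy setup both variants return the same parameters, but the old pattern builds max_iter objects where the simplified one builds a single object. With large inputs and many iterations, that gap is where the extra memory goes.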
This PR fixes the aforementioned issues.
Thanks!