ogrisel / pygbm

Experimental Gradient Boosting Machines in Python with numba.
MIT License

Optimize score loss computation #77

Open · NicolasHug opened this issue 5 years ago

NicolasHug commented 5 years ago

Slightly related to #76

This is the second bullet point from https://github.com/ogrisel/pygbm/issues/69#issue-391170726

When early stopping (or just score monitoring) is done on the training data with the loss, we should just reuse the raw_predictions array maintained by fit() instead of recomputing the predictions.
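A minimal sketch of the idea (the loss and variable names here are illustrative, not pygbm's exact internals):

```python
import numpy as np

rng = np.random.RandomState(0)
y_train = rng.normal(size=1000)

# fit() already maintains raw_predictions incrementally: after each
# boosting iteration, the new predictor's output is added in place.
raw_predictions = np.zeros_like(y_train)
# ... boosting iterations would update raw_predictions here ...

def least_squares_loss(y_true, raw_predictions):
    # Averaged per-sample squared error (up to a constant factor,
    # depending on the loss convention).
    return np.mean(0.5 * (y_true - raw_predictions) ** 2)

# Scoring on the training loss can then read the up-to-date array
# directly, instead of calling predict(X_train) again:
train_loss = least_squares_loss(y_train, raw_predictions)
```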

Results would be slightly different from those of the current implementation, because we currently compute the loss on a subset of the training data rather than on the whole training set.

A further optimization would be, instead of calling loss_.__call__(), to compute the per-sample losses inside e.g. loss_.update_gradients_and_hessians(), reusing the intermediate values that the gradients and hessians already need. The overhead would be minimal this way, as sketched below.
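A sketch of what such a fused update could look like for least squares (the function name and signature are hypothetical, chosen for illustration; for least squares the hessians are constant, so only the gradients are updated):

```python
import numpy as np
from numba import njit

@njit
def update_gradients_and_loss(gradients, y_true, raw_predictions):
    # Least-squares convention: loss_i = 0.5 * (raw_i - y_i)**2, so
    # gradient_i = raw_i - y_i. The per-sample loss reuses the residual
    # that the gradient computation needs anyway, so the extra cost is
    # one multiply-add per sample instead of a full second pass over
    # the data in loss_.__call__().
    total_loss = 0.0
    for i in range(y_true.shape[0]):
        residual = raw_predictions[i] - y_true[i]
        gradients[i] = residual
        total_loss += 0.5 * residual * residual
    return total_loss / y_true.shape[0]

y = np.array([1.0, 2.0, 3.0])
raw = np.zeros(3)
gradients = np.empty(3)
loss = update_gradients_and_loss(gradients, y, raw)
```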