When early stopping (or just score monitoring) is done on the training data with the loss, we should just use the `raw_predictions` array from `fit()` instead of re-computing it.
Results would be slightly different from the current implementation, which computes the loss on a subset of the training data rather than on the whole training set.
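A minimal, self-contained sketch of the difference, using a toy least-squares loss with pygbm-like naming (`loss_`, `raw_predictions`); this is an illustration of the idea, not pygbm's actual code:

```python
import numpy as np

class LeastSquares:
    """Toy least-squares loss mirroring the loss_ call interface
    (a hypothetical simplification, not pygbm's actual class)."""

    def __call__(self, y_true, raw_predictions):
        # mean of the per-sample losses
        return np.mean(0.5 * (y_true - raw_predictions) ** 2)


rng = np.random.RandomState(0)
y_train = rng.normal(size=1000)
# raw_predictions as it would be maintained by the boosting loop in fit()
raw_predictions = rng.normal(scale=0.1, size=1000)

loss_ = LeastSquares()

# Current behavior (roughly): score on a subset of the training data.
subset = rng.choice(y_train.shape[0], 256, replace=False)
subset_loss = loss_(y_train[subset], raw_predictions[subset])

# Proposed behavior: reuse raw_predictions on the whole training set,
# with no extra predict pass.
full_loss = loss_(y_train, raw_predictions)

print(subset_loss, full_loss)  # close but not equal, as noted above
```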
A further optimization would be, instead of calling `loss_.__call__()`, to compute the loss with respect to each sample in e.g. `loss_.update_gradients_and_hessians()`, alongside the gradients and hessians. Overhead would be minimal this way.
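A sketch of that fused variant, again with a toy least-squares loss and a hypothetical `update_gradients_hessians_and_loss` signature (pygbm's real method does not take a `losses` buffer):

```python
import numpy as np

def update_gradients_hessians_and_loss(gradients, hessians, losses,
                                       y_true, raw_predictions):
    # Fused pass: fill gradients/hessians as before, and additionally
    # write the per-sample losses into a preallocated buffer, reusing
    # the same residual computation.
    residual = raw_predictions - y_true
    gradients[:] = residual           # d loss / d raw_prediction
    hessians[:] = 1.0                 # constant hessian for least squares
    losses[:] = 0.5 * residual ** 2   # per-sample loss, reused for scoring

rng = np.random.RandomState(0)
y = rng.normal(size=8)
raw = np.zeros_like(y)
gradients = np.empty_like(y)
hessians = np.empty_like(y)
losses = np.empty_like(y)

update_gradients_hessians_and_loss(gradients, hessians, losses, y, raw)
train_loss = losses.mean()  # training loss without a separate loss_() call
print(train_loss)
```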
Slightly related to #76
This is the second bullet point from https://github.com/ogrisel/pygbm/issues/69#issue-391170726