ypotdevin / dpgbdt

Apache License 2.0

Post process negative loss #16

Closed ypotdevin closed 2 years ago

ypotdevin commented 2 years ago

In dp_rmse.py, the lines

noise = rng.standard_cauchy()
dp_rmse = cast(float, rmse + 2 * (gamma + 1) * sens * noise / epsilon)

may yield (large) negative values, which are unreasonable for an rMSE loss. The following countermeasures may be viable:

  1. Simply skip new trees whose associated loss is negative.
  2. Keep track of recent losses to detect outlier losses (not necessarily just negative ones) and skip those.
  3. After encountering a negative loss, clip it to 0 and then raise the threshold used in the comparison prev_loss < current_loss step by step, according to a (yet to be determined) schedule, so that new trees get a chance to join the ensemble again (which would otherwise be highly unlikely).
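
To get a feeling for how often the problem occurs: the standard Cauchy distribution has CDF F(x) = 1/2 + arctan(x)/pi, so for a non-negative rmse and a positive noise scale s = 2 * (gamma + 1) * sens / epsilon, the noisy loss is negative with probability 1/2 - arctan(rmse / s)/pi. Below is a minimal sketch of that calculation together with countermeasure 1. The function names `prob_negative` and `accept_tree` are made up for illustration and do not exist in dp_rmse.py; the sketch also assumes that a tree is accepted when current_loss < prev_loss and that s > 0.

```python
import math


def prob_negative(rmse: float, scale: float) -> float:
    """Probability that rmse + scale * standard_cauchy() is negative.

    Assumes rmse >= 0 and scale > 0, where scale stands for
    2 * (gamma + 1) * sens / epsilon from the quoted lines.
    """
    return 0.5 - math.atan(rmse / scale) / math.pi


def accept_tree(prev_loss: float, current_loss: float) -> bool:
    """Countermeasure 1 (hypothetical helper): never accept a negative loss."""
    if current_loss < 0:
        return False  # negative rMSE can only be a noise artifact, skip the tree
    return current_loss < prev_loss  # assumed acceptance rule: plain improvement


# When the noise scale equals the true rmse, a quarter of all draws are negative.
print(prob_negative(1.0, 1.0))
```

The smaller epsilon is (i.e. the larger the scale), the closer this probability gets to 1/2, so simply skipping may discard a lot of trees under tight privacy budgets.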
ypotdevin commented 2 years ago

Countermeasure 3 might be enhanced by recording each rejected tree together with the threshold it would have passed (but did not, since the comparison was stricter at the time), and inserting it into the ensemble retroactively once the threshold is loose enough.
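
The bookkeeping for this could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class `DeferredEnsemble`, the linear schedule (a fixed `step` added on every call to `loosen`), and the decision to leave the threshold unchanged after an acceptance. The actual schedule is still to be determined, and the sketch ignores that inserting a tree late changes the ensemble the other losses were measured against.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class DeferredEnsemble:
    """Hypothetical container: accepted trees plus a waiting list."""

    threshold: float = 0.0  # starts at the clipped (formerly negative) loss
    step: float = 0.1       # assumed linear schedule, not a tuned value
    trees: List[Any] = field(default_factory=list)
    pending: List[Tuple[float, Any]] = field(default_factory=list)

    def offer(self, tree: Any, loss: float) -> bool:
        """Accept the tree if its (clipped) loss beats the current threshold."""
        loss = max(loss, 0.0)
        if loss < self.threshold:
            self.trees.append(tree)
            return True
        # Remember the loss: any threshold above it would have let the tree pass.
        self.pending.append((loss, tree))
        return False

    def loosen(self) -> None:
        """Raise the threshold by one step and admit trees that now pass."""
        self.threshold += self.step
        still_pending = []
        for loss, tree in self.pending:
            if loss < self.threshold:
                self.trees.append(tree)
            else:
                still_pending.append((loss, tree))
        self.pending = still_pending
```

A typical loop would call `offer` for every new tree and `loosen` once per boosting round (or less often, depending on the schedule).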

ypotdevin commented 2 years ago

Countermeasure 2 may be realized by keeping a history of enhancements (prev_loss - current_loss if current_loss < prev_loss). Then, check whether the current enhancement is significantly better than previous (recent) enhancements. For example, calculate a (moving) average and a (moving) standard deviation [or their robust versions] and check whether the current enhancement exceeds avg + n * std for some small n.
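
A minimal sketch of that check, using a fixed-size window and the plain (non-robust) mean and standard deviation. The class name `EnhancementMonitor` and the parameters `window`, `n` and `min_history` are placeholders, not existing code, and the choice not to record suspicious enhancements in the history is an assumption as well.

```python
import statistics
from collections import deque


class EnhancementMonitor:
    """Hypothetical helper that flags implausibly large loss improvements."""

    def __init__(self, window: int = 10, n: float = 3.0, min_history: int = 3):
        self.history = deque(maxlen=window)  # recent accepted enhancements
        self.n = n
        self.min_history = min_history       # no verdict before this many samples

    def is_outlier(self, prev_loss: float, current_loss: float) -> bool:
        enhancement = prev_loss - current_loss
        if enhancement <= 0:
            return False  # no improvement at all, nothing to flag or record
        if len(self.history) >= self.min_history:
            avg = statistics.fmean(self.history)
            std = statistics.pstdev(self.history)
            if enhancement > avg + self.n * std:
                return True  # suspicious jump, keep it out of the history
        self.history.append(enhancement)
        return False
```

Swapping `fmean`/`pstdev` for the median and a scaled median absolute deviation would give the robust variant mentioned above.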

ypotdevin commented 2 years ago

See https://en.wikipedia.org/wiki/Moving_average, especially https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average and https://en.wikipedia.org/wiki/Moving_average#Exponentially_weighted_moving_variance_and_standard_deviation.
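
For reference, the incremental update rules given in the last linked section are delta = x - mean, mean += alpha * delta and var = (1 - alpha) * (var + alpha * delta^2), starting from mean = x_1 and var = 0. A small sketch of how they could back the outlier check; the class `EwmStats` and the default `alpha` are illustrative assumptions, not part of the repository.

```python
import math
from typing import Optional


class EwmStats:
    """Exponentially weighted moving mean and variance (illustrative sketch)."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha                  # weight of the newest observation
        self.mean: Optional[float] = None   # unset until the first update
        self.var = 0.0

    def update(self, x: float) -> None:
        if self.mean is None:
            self.mean = x  # initialise with the first observation, variance 0
            return
        delta = x - self.mean
        self.mean += self.alpha * delta
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta ** 2)

    @property
    def std(self) -> float:
        return math.sqrt(self.var)

    def exceeds(self, x: float, n: float = 3.0) -> bool:
        """True if x lies above mean + n * std; False before any update."""
        if self.mean is None:
            return False
        return x > self.mean + n * self.std
```

Compared with a fixed window, this needs only two floats of state, at the cost of having to pick alpha.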