Closed fabianp closed 8 years ago
@mblondel it would be good to have this into the release as I believe it does fix a bug
Personally, I get the following formula:
inv_stepsize = 1 / n_samples * lipschitz_constant * L + alpha
stepsize = 1.0 / inv_stepsize
where lipschitz_constant is 1 for squared loss, 0.25 for log loss and 2 for squared hinge. Trying it for specifc values of n_samples, L and alpha, it seems to lead to larger step sizes.
Another remark is that the branch for the log loss is confusing because it doesn't follow the same pattern as for other losses. It would be better to define the lipschitz constant in the conditions and use a common formula at the end. Something like this:
if loss == "log":
lipschitz_constant = 0.25
elif loss == "squared":
lipschitz_constant = 1.0
[...]
step_size = 1.0 / (... lipschitz_constant ...)
@mblondel what is L in your formula? Note that the Lipschitz constant should be the maximum of the Lipschitz constants for all the "f_i" (objective function = (1/n) Sum_i f_i) . It cannot go to zero as the number of samples increases (assuming alpha=0) as in your formula.
L = get_max_squared_sum(X)
OK, anyway I think that your inv_stepsize should be multiplied by n_samples (since as I said the inv_stepsize should be the max Lipschitz constant for the f_i ).
I am a bit confused. Maybe L
in my formula should be the sum of squared norms instead of the max. Anyway, I don't have enough time to carefully check so we can go with you formula. In the future, it might be interesting to add the derivation to the doc.
One thing to remember is that there are several ways to derive Lipschitz constants, only some are tighter than others :).
It would still be interesting to address my second remark.
I forced pushed a version with the reworked logic here ... I agree to take the fall for this if it is again wrong* :-)
This is the same fix as PR #53 . I'm re-opening it because I think that it is in fact correct.
Our formulation of SAG(A) is minimize_w 1 / n_samples * \sum_i loss(w^T x_i, y_i)
to put all into the sum we need to multiply the alpha by n yielding
and so the Lipschitz constant should be L + n * alpha . This is consistent with https://hal.inria.fr/hal-00860051/document page 21 ("Section 4.5 ...")