Closed · KevinM1ao closed this 7 months ago
Hi, I have a question about the implementation of your FedProx algorithm, could you please answer it?
In the paper, the loss function includes an L2 regularization term between the current model and the global model, but it seems that this L2 term is not what is computed in your code: `torch.sum(param_list * delta_list) * -1`.
Hi @KevinM1ao, the local problem is defined as $\min_w h(w) = f(w) + \frac{\mu}{2}\|w - w^t\|^2$. Expanding the square and dropping the constant $\frac{\mu}{2}\|w^t\|^2$ (it does not depend on $w$), this is equivalent to $\min_w f(w) + \frac{\mu}{2}\|w\|^2 - \mu\langle w, w^t \rangle$. After this reconstruction, $\frac{\mu}{2}\|w\|^2$ is the regularization loss and $-\mu\langle w, w^t \rangle$ is the loss in the red box above. The first term is absorbed into the weight decay of the local optimizer, and the second term is constructed explicitly in the code. Alternatively, you could implement a custom optimizer class that performs the update $w \leftarrow w - \eta\left(g + \mu(w - w^t)\right)$ directly. The two formulations are equivalent.
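A quick sanity check of the equivalence (variable names here are my own, not the repo's): the gradient of the reconstructed loss $\frac{\mu}{2}\|w\|^2 - \mu\langle w, w^t\rangle$ matches the gradient of the original proximal term $\frac{\mu}{2}\|w - w^t\|^2$, since the dropped constant has zero gradient.

```python
import torch

mu = 0.1
w = torch.randn(5, requires_grad=True)   # local model parameters
w_t = torch.randn(5)                     # frozen global model parameters

# Original FedProx proximal term: mu/2 * ||w - w^t||^2.
prox = 0.5 * mu * torch.sum((w - w_t) ** 2)
g_prox, = torch.autograd.grad(prox, w)

# Reconstruction: weight-decay part plus inner-product part
# (the constant mu/2 * ||w^t||^2 is dropped, as it has no gradient in w).
recon = 0.5 * mu * torch.sum(w ** 2) - mu * torch.sum(w * w_t)
g_recon, = torch.autograd.grad(recon, w)

# Identical gradients, hence identical SGD updates w <- w - eta * (g + mu*(w - w^t)).
assert torch.allclose(g_prox, g_recon)
```

In practice the $\frac{\mu}{2}\|w\|^2$ part can be delegated to the optimizer's `weight_decay` argument, so only the inner-product term needs to appear in the training loss.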
thanks so much!