about FedSpeed - Githubissues

harrylee999 commented 1 year ago

hi，i have a question about FedSpeed implement. I follow your Algorithm presudo code in the paper，but it doesnt converge or work. Then i read your code here step by step, I noticed that your code doesn't match the one in the paper.

self.h_params_list = torch.zeros((args.total_client, init_par_list.shape[0]))

 def global_update(self, selected_clients, Averaged_update, Averaged_model):
        # FedSpeed (ServerOpt)
        # w(t+1) = average_s[wi(t)] + average_c[h(t)]
        return Averaged_model + torch.mean(self.h_params_list, dim=0)

The _self.h_paramslist store updates of all clients.
For the global model update，you use _torch.mean(self.h_paramslist, dim=0)，to mean the updates of all clients. But in your paper's presudo code, it mean the updates of active clients in this round. Can you explain this?

and i also curious about What's the difference between your dynamic regularization and FedDyn？ are they equal？

woodenchild95 commented 1 year ago

@harrylee999 Thanks for this question. In this version, I merge the FedDyn and the perturbation together rather than use the ADMM-based method directly. FedDyn is also a special ADMM. Their difference is though the local dual variables $\hat{g}_i$ are updated locally, FedDyn uses the average of all local dual variables to globally update (including those clients not participating in the current round). While the general ADMM only uses the average of dual variables of participating clients. Details can be compared between FedPD and FedDyn.

ADMM is adopted to solve the equality constraint $x_i=x$. Both of these two methods are ok. In current experiments, FedDyn+perturbation performs a little better than FedPD+perturbation, (about $0.7\%$ improvements).

harrylee999 commented 1 year ago

@woodenchild95 Thank you for your reply. I learned a lot from you. i notice your another work "Dynamic Regularized Sharpness Aware Minimization in Federated Learning: Approaching Global Consistency and Smooth Landscape" which propose FedSMOO. According to my understanding，FedSMOO = global SAM + FedDyn and FedSpeed = local sam + FedDyn， am i right？ if that right，FedSMOO should be better than FedSpeed. Have you done any experiments comparing the two methods？ and can you share the code of FedSMOO for reference when you have time？ thanks a lot.

woodenchild95 commented 1 year ago

@harrylee999 Global SAM is a dream and FedSMOO is a tentative effort on this idea. As introduced in the paper, the effective range of SAM is limited. So FedSMOO introduces an additional constraint $s=s_i$ and adopts two ADMM to update the local models alternately.

As for the vanilla FedSpeed, actually, you can treat it as FedPD + local SAM. When I tried FedSMOO, I realized that FedDyn is currently the best ADMM-based method. So when I submit this code I adopt the FedDyn + local SAM as FedSpeed. From the optimization perspective, they are equal. But it will be improved experimentally. FedSMOO is the FedDyn + global SAM, it performs better than the vanilla FedSpeed (FedPD + local SAM), but achieves the comparable test accuracy as the current FedSpeed (FedDyn + local SAM). Adopting the global SAM on FedDyn is not as effective as the vanilla ADMM methods.

This phenomenon is very interesting, and I will follow more details about the experimental issues here. Recently I have been visiting other places, I will rebuild and submit the relevant code later. Thank you very much for your attention :)

woodenchild95 commented 7 months ago

@harrylee999 Hi, we have released the replication code for FedSMOO in the latest version, and it is currently undergoing testing. Thank you for your attention!

woodenchild95 / FL-Simulator

about FedSpeed #6