Thanks for your open-source work, @woodenchild95! But I am confused by the aggregation method. To my knowledge, the most common aggregation schemes either aggregate only the updates, e.g.

self.server_model_params_list + self.args.global_learning_rate * Averaged_update

or aggregate only the parameters, i.e., just Averaged_model. I don't understand why this code does both:

Averaged_model + torch.mean(self.h_params_list, dim=0)

Is there a reasonable explanation for this? Or is this the ADMM way of aggregating (sorry, I am not familiar with ADMM)?
@xxdznl Hi, the highlighted code performs the aggregation of the dual variables in ADMM, a classic optimization algorithm. The dual variable is an auxiliary variable used to construct a specific local Lagrangian objective, which makes constrained optimization problems tractable to solve. More details can be found in [1-6]. If you are not familiar with ADMM itself, you can start with primal-dual optimization and the alternating direction method of multipliers. The aggregation rule for the dual variables is derived from the consensus constraints in the optimization problem; a minimal sketch is given after the reference list below.
[1] FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data
[2] FedADMM: A Federated Primal-Dual Algorithm Allowing Partial Participation
[3] FedADMM: A Robust Federated Deep Learning Framework with Adaptivity to System Heterogeneity
[4] Federated Learning Based on Dynamic Regularization
[5] FedADMM-InSa: An Inexact and Self-Adaptive ADMM for Federated Learning
[6] Dynamic Regularized Sharpness Aware Minimization in Federated Learning: Approaching Global Consistency and Smooth Landscape
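To make this concrete, here is a minimal sketch of consensus ADMM in scaled form (an illustrative toy, not this repository's actual implementation; the names `theta`, `h`, `rho`, `grad_fn`, and the exact update order are my assumptions). It shows where the "averaged model + averaged duals" rule comes from: minimizing the augmented Lagrangian over the global model w under the consensus constraint theta_i = w gives w = mean(theta_i + h_i) = mean(theta_i) + mean(h_i), which has the same shape as `Averaged_model + torch.mean(self.h_params_list, dim=0)`.

```python
import torch

def client_update(theta, h, w, grad_fn, lr=0.1, rho=0.01, steps=10):
    """Approximately solve the local augmented-Lagrangian subproblem
        argmin_theta f_i(theta) + (rho / 2) * ||theta - w + h||^2
    by a few gradient steps, where h is the client's scaled dual variable."""
    theta = theta.clone()
    for _ in range(steps):
        g = grad_fn(theta) + rho * (theta - w + h)  # gradient of the local Lagrangian
        theta = theta - lr * g
    return theta

def server_update(thetas, hs):
    """Consensus step: w = mean(theta_i + h_i) = mean(theta_i) + mean(h_i),
    i.e. the averaged primal variables plus the averaged duals."""
    return torch.mean(torch.stack(thetas), dim=0) + torch.mean(torch.stack(hs), dim=0)

def dual_update(theta, h, w):
    """Dual ascent on the consensus residual theta_i - w."""
    return h + (theta - w)

# Toy usage: two clients with quadratic losses 0.5 * ||theta - t_i||^2;
# consensus drives the global model toward the mean of the targets.
targets = [torch.tensor([1.0]), torch.tensor([3.0])]
grads = [lambda th, t=t: th - t for t in targets]
w = torch.zeros(1)
thetas = [torch.zeros(1) for _ in targets]
hs = [torch.zeros(1) for _ in targets]
for _ in range(50):
    thetas = [client_update(th, h, w, g) for th, h, g in zip(thetas, hs, grads)]
    w = server_update(thetas, hs)
    hs = [dual_update(th, h, w) for th, h in zip(thetas, hs)]
```

The methods in [1-6] refine this basic scheme in different ways (partial participation, inexact and self-adaptive local solves, dynamic regularization), but the consensus-constraint origin of the dual-variable aggregation is the same.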
It's an honor to get such a quick and concrete reply. I will read the above papers carefully. Thanks!