PyTorch implementations of several general optimization methods from the federated learning community.
- FedAvg: Communication-Efficient Learning of Deep Networks from Decentralized Data
- FedProx: Federated Optimization in Heterogeneous Networks
- FedAdam: Adaptive Federated Optimization
- SCAFFOLD: SCAFFOLD: Stochastic Controlled Averaging for Federated Learning
- FedDyn: Federated Learning Based on Dynamic Regularization
- FedCM: FedCM: Federated Learning with Client-level Momentum
- FedSAM/MoFedSAM: Generalized Federated Learning via Sharpness Aware Minimization
- FedGamma: FedGamma: Federated Learning with Global Sharpness-Aware Minimization
- FedSpeed: FedSpeed: Larger Local Interval, Less Communication Round, and Higher Generalization Accuracy
- FedSMOO: Dynamic Regularized Sharpness Aware Minimization in Federated Learning: Approaching Global Consistency and Smooth Landscape
FL-Simulator runs on a single CPU/GPU to simulate the federated learning (FL) training process with the PyTorch framework. For example, to train centralized FL with the FedAvg method on ResNet-18 and the CIFAR-10 dataset (10% active clients per round out of 100 total clients, with a heterogeneous Dirichlet-0.6 data split), run:
python train.py --non-iid --dataset CIFAR-10 --model ResNet18 --split-rule Dirichlet --split-coef 0.6 --active-ratio 0.1 --total-client 100
Other hyperparameters are introduced in the train.py file.
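The Dirichlet-based heterogeneous split selected by `--split-rule Dirichlet --split-coef 0.6` follows the common practice of sampling per-client class proportions from a Dirichlet distribution. The sketch below is for illustration only; the function name `dirichlet_split` and its arguments are placeholders, and FL-Simulator's own partitioning code may differ in details.

```python
# Illustrative Dirichlet-based non-IID partitioner (not the repository's exact code).
import numpy as np

def dirichlet_split(labels, n_clients=100, alpha=0.6, seed=0):
    """Return a list of index lists, one per client, split class-by-class with Dir(alpha)."""
    rng = np.random.default_rng(seed)
    n_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(n_clients)]
    for c in range(n_classes):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        # Sample per-client proportions of class c; smaller alpha -> more heterogeneous.
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(proportions) * len(idx)).astype(int)[:-1]
        for client_id, part in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(part.tolist())
    return client_indices

# Example usage (labels would be the CIFAR-10 training targets as a NumPy array):
# parts = dirichlet_split(labels, n_clients=100, alpha=0.6)
```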
FL-Simulator pre-defines the basic Server and Client classes, which execute the vanilla FedAvg algorithm. To define a new method, first create a new server file and implement:

- `process_for_communication()`: how your method pre-processes the variables communicated to each client
- `postprocess()`: how your method post-processes the variables received from each local client
- `global_update()`: how your method updates the global model

Then define a new client file or a new local optimizer for your own method to perform the local training. Similarly, you can directly define a new server class to rebuild the inner operations; a minimal sketch of such a server class is shown below.
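The sketch below shows roughly how the three hooks fit together. The class name `FedNewServer`, the constructor, and the attribute names are illustrative placeholders rather than the exact FL-Simulator API; in the repository, the class would inherit from the pre-defined Server class.

```python
# Minimal sketch (assumed names, not the exact FL-Simulator API) of a custom server
# that overrides the three hooks described above.
import copy
import torch


class FedNewServer:  # in FL-Simulator this would subclass the pre-defined Server class
    def __init__(self, global_model):
        self.global_model = global_model   # shared global model (torch.nn.Module)
        self.received_updates = {}         # client_id -> received state_dict

    def process_for_communication(self, client_id):
        """Pre-process the variables sent to client `client_id`."""
        # e.g. a copy of the current global parameters (plus any method-specific state)
        return copy.deepcopy(self.global_model.state_dict())

    def postprocess(self, client_id, received_state):
        """Post-process the variables received from a local client."""
        self.received_updates[client_id] = received_state

    def global_update(self):
        """Update the global model; here, plain FedAvg-style parameter averaging."""
        states = list(self.received_updates.values())
        avg_state = copy.deepcopy(states[0])
        for key in avg_state:
            avg_state[key] = torch.stack(
                [s[key].float() for s in states], dim=0
            ).mean(dim=0)
        self.global_model.load_state_dict(avg_state)
        self.received_updates.clear()
```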
**CIFAR-10 (ResNet-18-GN), T = 1000.** The first five result columns are for the 10%-100 setting (10% of 100 clients active per round, bs=50, local epoch=5); the last five are for the 5%-200 setting (5% of 200 clients active per round, bs=25, local epoch=5).

| Method | IID | Dir-0.6 | Dir-0.3 | Dir-0.1 | Time / round | IID | Dir-0.6 | Dir-0.3 | Dir-0.1 | Time / round |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **SGD basis** | | | | | | | | | | |
| FedAvg | 82.52 | 80.65 | 79.75 | 77.31 | 15.86s | 81.09 | 79.93 | 78.66 | 75.21 | 17.03s |
| FedProx | 82.54 | 81.05 | 79.52 | 76.86 | 19.78s | 81.56 | 79.49 | 78.76 | 75.84 | 20.97s |
| FedAdam | 84.32 | 82.56 | 82.12 | 77.58 | 15.91s | 83.29 | 81.22 | 80.22 | 75.83 | 17.67s |
| SCAFFOLD | 84.88 | 83.53 | 82.75 | 79.92 | 20.09s | 84.24 | 83.01 | 82.04 | 78.23 | 22.21s |
| FedDyn | 85.46 | 84.22 | 83.22 | 78.96 | 20.82s | 81.11 | 80.25 | 79.43 | 75.43 | 22.68s |
| FedCM | 85.74 | 83.81 | 83.44 | 78.92 | 20.74s | 83.77 | 82.01 | 80.77 | 75.91 | 21.24s |
| **SAM basis** | | | | | | | | | | |
| FedGamma | 85.74 | 84.80 | 83.81 | 80.72 | 30.13s | 84.99 | 84.02 | 83.03 | 80.09 | 33.63s |
| MoFedSAM | 87.24 | 85.74 | 85.14 | 81.58 | 29.06s | 86.27 | 84.71 | 83.44 | 79.02 | 32.45s |
| FedSpeed | 87.31 | 86.33 | 85.39 | 82.26 | 29.48s | 86.87 | 85.07 | 83.94 | 79.66 | 33.69s |
| FedSMOO | 87.70 | 86.87 | 86.04 | 83.30 | 30.43s | 87.40 | 85.97 | 85.14 | 81.35 | 34.80s |
The blank parts are awaiting updates.
Some key hyperparameter selections
| Method | Local lr | Global lr | Lr decay | SAM lr | Proxy coefficient | Client-momentum coefficient |
| --- | --- | --- | --- | --- | --- | --- |
| FedAvg | 0.1 | 1.0 | 0.998 | - | - | - |
| FedProx | 0.1 | 1.0 | 0.998 | - | 0.1 / 0.01 | - |
| FedAdam | 0.1 | 0.1 / 0.05 | 0.998 | - | - | - |
| SCAFFOLD | 0.1 | 1.0 | 0.998 | - | - | - |
| FedDyn | 0.1 | 1.0 | 0.9995 / 1.0 | - | 0.1 | - |
| FedCM | 0.1 | 1.0 | 0.998 | - | - | 0.1 |
| FedGamma | 0.1 | 1.0 | 0.998 | 0.01 | - | - |
| MoFedSAM | 0.1 | 1.0 | 0.998 | 0.1 | - | 0.05 / 0.1 |
| FedSpeed | 0.1 | 1.0 | 0.998 | 0.1 | 0.1 | - |
| FedSMOO | 0.1 | 1.0 | 0.998 | 0.1 | 0.1 | - |
The hyperparameter selections above are for reference only, since each algorithm has unique properties that call for matched hyperparameters. To facilitate a relatively fair comparison, we report a set of selections with which each method performs well in general cases. Please adjust the hyperparameters when changing the model backbone or dataset.
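For orientation, the sketch below shows where two of these hyperparameters typically enter a local training step: the proxy coefficient as the weight `mu` of a FedProx-style proximal term toward the global model, and the SAM lr as the perturbation radius `rho` of a sharpness-aware ascent step. It is an illustrative sketch only (the function name `local_step` and its arguments are placeholders), not the exact local update of any method in this repository.

```python
# Illustrative local step combining a SAM perturbation (radius rho) with a
# proximal term toward the global model (weight mu); not the repository's exact code.
import torch
import torch.nn.functional as F


def local_step(model, global_params, batch, base_optimizer, mu=0.1, rho=0.1):
    x, y = batch
    params = [p for p in model.parameters() if p.requires_grad]

    # SAM-style ascent step: perturb parameters by rho * g / ||g||.
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, params)
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads])).item() + 1e-12
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g, alpha=rho / grad_norm)

    # Loss at the perturbed point, plus a proximal term pulling toward the global model.
    loss = F.cross_entropy(model(x), y)
    prox = sum(((p - gp.detach()) ** 2).sum()
               for p, gp in zip(model.parameters(), global_params))
    (loss + 0.5 * mu * prox).backward()

    # Undo the perturbation, then apply the base optimizer update on the original weights.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(g, alpha=rho / grad_norm)
    base_optimizer.step()
    base_optimizer.zero_grad()
```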
If this codebase helps you, please cite our papers:
FedSpeed (ICLR 2023):
@article{sun2023fedspeed,
title={Fedspeed: Larger local interval, less communication round, and higher generalization accuracy},
author={Sun, Yan and Shen, Li and Huang, Tiansheng and Ding, Liang and Tao, Dacheng},
journal={arXiv preprint arXiv:2302.10429},
year={2023}
}
FedSMOO (ICML 2023 Oral):
@inproceedings{sun2023dynamic,
title={Dynamic regularized sharpness aware minimization in federated learning: Approaching global consistency and smooth landscape},
author={Sun, Yan and Shen, Li and Chen, Shixiang and Ding, Liang and Tao, Dacheng},
booktitle={International Conference on Machine Learning},
pages={32991--33013},
year={2023},
organization={PMLR}
}