owkin / FLamby

Cross-silo Federated Learning playground in Python. Discover 7 real-world federated datasets to test your new FL strategies and try to beat the leaderboard.
https://owkin.github.io/FLamby/
MIT License

Benchmarking Track's WP4: What strategies do we benchmark? #11

Closed jeandut closed 2 years ago

jeandut commented 2 years ago

For now we need, at a minimum:

Given time, we could also add:

philipco commented 2 years ago

Hello,

We agree that FedAvg, FedProx and Scaffold seem to be a good set of simple baselines of interest. To our understanding, since our goal is not yet to achieve optimized performance on the datasets but rather to extract high-level takeaways, these original methods seem best suited.

Regarding potential other baselines, maybe the question can be phrased in the following way:

a. What solutions have been proposed to tackle heterogeneity in FL?
b. Which ones deserve to be considered as baselines in our framework, and why?

Regarding a., beyond the ones you suggested, we think the following methods are intended to tackle heterogeneity: FedNova, MIME, FedCD, and FedAdam/FedAdagrad/FedYogi (Adaptive Federated Optimization). This is a preliminary list that should probably be updated.

Regarding b., we see the following potential arguments:

  1. The method is widely recognized as a reference for heterogeneous FL. [Could be based on # citations, or on your expertise]
  2. The method was used as a reference in a similar paper. [e.g. Federated Learning on Non-IID Data Silos: An Experimental Study (https://arxiv.org/pdf/2102.02079.pdf) uses FedNova]
  3. The method substantially differs in approach from FedProx (proximal term) or Scaffold (control variates), or exhibits a feature that somehow "describes" heterogeneity [such as the control variates in Scaffold]; see the sketch after this list.
  4. Other ideas?
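
To make argument 3 more concrete, here is a minimal sketch of how the two baselines modify a vanilla local SGD step; this is illustrative PyTorch under our own naming (global_params, c_global, c_local), not FLamby code:

```python
import torch

# Illustrative sketch (not FLamby code): how FedProx and Scaffold each
# modify one local SGD step on a client, starting from the global model.

def fedprox_step(model, global_params, loss, lr=0.01, mu=0.1):
    """FedProx: add a proximal term (mu / 2) * ||w - w_global||^2
    to the local loss, pulling the local update towards the global model."""
    prox = sum(((w - w0) ** 2).sum()
               for w, w0 in zip(model.parameters(), global_params))
    (loss + 0.5 * mu * prox).backward()
    with torch.no_grad():
        for w in model.parameters():
            w -= lr * w.grad
            w.grad = None

def scaffold_step(model, loss, c_global, c_local, lr=0.01):
    """Scaffold: correct the local gradient with the control-variate
    difference (c_global - c_local) to compensate for client drift."""
    loss.backward()
    with torch.no_grad():
        for w, cg, cl in zip(model.parameters(), c_global, c_local):
            w -= lr * (w.grad + cg - cl)
            w.grad = None
```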

Cheers, Constantin and Aymeric

philipco commented 2 years ago

To initiate the discussion, we propose below a table (that anyone can edit) listing algorithms meant to tackle heterogeneity in FL.

| Algorithm | # citations | Designed for non-i.i.d. data | Major features |
| --- | --- | --- | --- |
| [FedAvg](https://arxiv.org/pdf/1602.05629.pdf) | 4504 | NO | 1) local updates 2) weighted averaging |
| [MOCHA](https://proceedings.neurips.cc/paper/7029-federated-multi-task-learning) | 855 | YES | alternating optimization of model weights and task relationship matrix |
| [FedProx](https://arxiv.org/pdf/1812.06127.pdf) | 807 | YES | 1) generalization of FedAvg 2) adds a proximal term 3) restricts the local update to stay close to the initial (global) model |
| [Scaffold](https://arxiv.org/pdf/1910.06378.pdf) | 356 | YES | 1) corrects client drift 2) control variates |
| [FedAdam/FedYogi/FedAdagrad](https://arxiv.org/abs/2003.00295) | 248 | YES | federated versions of adaptive optimizers |
| [Cyclical Weight Transfer](https://pubmed.ncbi.nlm.nih.gov/29617797/) | 178 | KIND OF | ensures that each client is sufficiently visited |
| [Clustered Federated Learning](https://ieeexplore.ieee.org/abstract/document/9174890) | 178 | YES | clusters silos after FL has converged |
| [FedNova](https://arxiv.org/pdf/2007.07481.pdf) | 140 | YES | 1) focuses on heterogeneous numbers of local updates 2) FedProx and FedAvg as particular cases 3) flexibility to choose any local solver |
| [Ditto](https://proceedings.mlr.press/v139/li21h.html) | 62 | YES | each node trains a local model while all nodes jointly train the global model; at each round, the distance to the current state of the evolving global model is used as a regularization term in the local training |
| [MIME](https://arxiv.org/pdf/2008.03606.pdf) | 47 | YES | 1) corrects client drift 2) control variates 3) server-level optimizer state (momentum, adaptive step size) 4) targets cross-device settings |
| [FedCD](https://arxiv.org/pdf/2006.09637.pdf) | 10 | YES | clones and deletes models to dynamically group devices with similar data |

FEEL FREE TO UPDATE THE ABOVE TABLE.
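
For concreteness, the FedAvg row above boils down to the following server-side step; an illustrative sketch with names of our choosing, not FLamby's implementation:

```python
import torch

# Illustrative sketch of FedAvg aggregation: after each client has run a few
# local SGD epochs, the server averages the resulting parameters with weights
# proportional to the local dataset sizes.

def fedavg_aggregate(client_params, client_sizes):
    """client_params: one list of tensors per client; client_sizes: ints."""
    total = sum(client_sizes)
    return [
        sum((n / total) * p for n, p in zip(client_sizes, layer))
        for layer in zip(*client_params)
    ]

# toy usage: two clients, one scalar "layer", weights 1/4 and 3/4
a = [torch.tensor([1.0])]
b = [torch.tensor([3.0])]
print(fedavg_aggregate([a, b], client_sizes=[10, 30]))  # [tensor([2.5000])]
```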

Again, since the focus of the paper is the datasets, it may be sufficient to consider FedAvg, Scaffold and FedProx.

Constantin and Aymeric

bellet commented 2 years ago

The provided list sounds good and natural.

In the future, one could also consider personalized FL approaches, for instance based on fine-tuning/MAML (e.g. FedAvg+ https://arxiv.org/pdf/1909.12488.pdf) or on regularization towards the mean (https://arxiv.org/pdf/2010.02372.pdf), which are both closely related to plain FedAvg. There are also popular federated MTL approaches based on pairwise regularization or cluster/mixture assumptions.
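
For the fine-tuning flavour, the recipe is particularly simple; a minimal sketch assuming a classification task, with all names ours rather than from any existing API:

```python
import copy
import torch

# Illustrative sketch of FedAvg+-style personalization: once federated
# training has converged, each silo copies the global model and continues
# training it on its own local data.

def personalize_by_finetuning(global_model, local_loader, epochs=1, lr=1e-3):
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()  # assumed task: classification
    for _ in range(epochs):
        for x, y in local_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model  # each silo keeps its own personalized copy
```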

Grim-bot commented 2 years ago

I thought about adding Ditto https://proceedings.mlr.press/v139/li21h.html (58 citations) to philipco's table above, but their experiments are mainly cross-device (100+ devices). Nevertheless, they also report good results on the Vehicle dataset (23 devices). Is that too many devices to be considered cross-silo?

jeandut commented 2 years ago

> The provided list sounds good and natural.
>
> In the future, one could also consider personalized FL approaches, for instance based on fine-tuning/MAML (e.g. FedAvg+ https://arxiv.org/pdf/1909.12488.pdf) or on regularization towards the mean (https://arxiv.org/pdf/2010.02372.pdf), which are both closely related to plain FedAvg. There are also popular federated MTL approaches based on pairwise regularization or cluster/mixture assumptions.

We will keep that in mind; it shouldn't be very hard to modify the code in that direction, as FLamby was designed to be extensible.

> I thought about adding Ditto https://proceedings.mlr.press/v139/li21h.html (58 citations) to philipco's table above, but their experiments are mainly cross-device (100+ devices). Nevertheless, they also report good results on the Vehicle dataset (23 devices). Is that too many devices to be considered cross-silo?

Personally, I put the threshold at 50, but it was never formalized. Maybe this Vehicle dataset is worth looking at? Do they use natural splits?

bellet commented 2 years ago

For information on Vehicle and some other datasets with natural splits that are more 'cross-device' than 'cross-silo' in spirit, see page 20 of http://researchers.lille.inria.fr/abellet/papers/aistats20_graph_supp.pdf. The School dataset may be of interest but has 140 centers (schools).

Grim-bot commented 2 years ago

> Personally, I put the threshold at 50, but it was never formalized. Maybe this Vehicle dataset is worth looking at? Do they use natural splits?

jeandut commented 2 years ago

@Grim-bot the Vehicle dataset was already mentioned in the related work in the overleaf. We should already have enough datasets, with @pmangold adding this one, @sssilvar and @AyedSamy working on IXI, and @regloeb working on TCGA-survival.

jeandut commented 2 years ago

ProxSkip was accepted at ICML; it might be worth implementing to get some hype, but I am closing this issue in the meantime.
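
For reference, our reading of the ProxSkip (a.k.a. Scaffnew) step in the cross-silo setting, where the prox operator is the averaging/communication step and is skipped with probability 1 - p; an illustrative sketch only, not an existing FLamby strategy:

```python
import random
import torch

# Illustrative sketch of one ProxSkip / Scaffnew step (our reading of the
# ICML'22 paper, not FLamby code). Each client i keeps a control variate h_i;
# the expensive prox step (here: averaging across silos) is only taken with
# probability p. Assumes the h_i are initialized to sum to zero (e.g. all
# zeros), so the prox step reduces to plain averaging.

def proxskip_step(xs, hs, grads, lr, p):
    """xs, hs: per-client parameter / control-variate tensors;
    grads: per-client callables returning the local gradient at a point."""
    x_hat = [x - lr * (g(x) - h) for x, h, g in zip(xs, hs, grads)]
    if random.random() < p:  # shared coin flip: communicate at this step
        x_bar = sum(x_hat) / len(x_hat)  # prox = consensus averaging
        hs = [h + (p / lr) * (x_bar - xh) for h, xh in zip(hs, x_hat)]
        xs = [x_bar.clone() for _ in x_hat]
    else:  # skip communication; control variates stay unchanged
        xs = x_hat
    return xs, hs

# toy usage: two silos with quadratic losses f_i(x) = 0.5 * (x - a_i)^2
xs = [torch.zeros(1), torch.zeros(1)]
hs = [torch.zeros(1), torch.zeros(1)]
grads = [lambda x: x - 1.0, lambda x: x + 1.0]
for _ in range(200):
    xs, hs = proxskip_step(xs, hs, grads, lr=0.5, p=0.2)
# both iterates should now be close to the consensus optimum x* = 0
```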