woodenchild95 / FL-Simulator

PyTorch implementations of some general federated optimization methods.

About FedADMM #8

Closed noobyzy closed 7 months ago

noobyzy commented 10 months ago

Hi, I noticed that you have also provided results from FedADMM in your published work.

Can you provide solver scripts for that algorithm?

woodenchild95 commented 10 months ago

@noobyzy I will update it in the next version of the code base. Thanks :) Actually, you can see one of its implementations in [1].

[1] FedADMM: A Robust Federated Deep Learning Framework with Adaptivity to System Heterogeneity
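
For reference, here is a rough PyTorch sketch of the client step in that general ADMM style: minimize the local loss plus a linear dual term and a quadratic proximal term toward the global model, then do dual ascent. The upload `theta_i + lam_i / rho`, the SGD settings, and all variable names are my own illustrative choices, not the code from [1].

```python
import torch

def fedadmm_local_update(model, loader, loss_fn, global_params, lam_i,
                         rho=0.01, lr=0.1, local_epochs=5):
    """Solve the client's augmented-Lagrangian subproblem, then do dual ascent."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(local_epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            # dual (linear) term + proximal (quadratic) term toward the global model z
            for p, z, l in zip(model.parameters(), global_params, lam_i):
                loss = loss + torch.sum(l * (p - z)) + 0.5 * rho * torch.sum((p - z) ** 2)
            loss.backward()
            opt.step()
    # dual ascent: lam_i <- lam_i + rho * (theta_i - z)
    with torch.no_grad():
        for l, p, z in zip(lam_i, model.parameters(), global_params):
            l.add_(rho * (p - z))
    # the "augmented" upload: theta_i + lam_i / rho
    return [p.detach() + l / rho for p, l in zip(model.parameters(), lam_i)]
```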

noobyzy commented 10 months ago

> @noobyzy I will update it in the next version of the code base. Thanks :) Actually, you can see one of its implementations in [1].
>
> [1] FedADMM: A Robust Federated Deep Learning Framework with Adaptivity to System Heterogeneity

Thanks. In fact, I have implemented it myself, but I found that the accuracy is really bad.

I checked the accuracy: the local model is actually doing well (for example, going from 10% to 40% in one round of local updates), but the client upload (the augmented model) is really bad (hovering at ~10%). Did you run into a similar issue, or is it a problem on my end?
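
To be concrete, this is roughly the comparison I mean, assuming the upload is `theta_i + lam_i / rho` (just a sketch; `model`, `lam_i`, `rho`, and `test_loader` are placeholders):

```python
import copy
import torch

@torch.no_grad()
def compare_local_vs_upload(model, lam_i, rho, test_loader):
    """Accuracy of the raw local model vs. the augmented upload theta_i + lam_i / rho."""
    aug_model = copy.deepcopy(model)
    for p, l in zip(aug_model.parameters(), lam_i):
        p.add_(l / rho)

    def accuracy(m):
        m.eval()
        correct = total = 0
        for x, y in test_loader:
            correct += (m(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
        return correct / total

    return accuracy(model), accuracy(aug_model)
```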

woodenchild95 commented 10 months ago

> @noobyzy I will update it in the next version of the code base. Thanks :) Actually, you can see one of its implementations in [1].
>
> [1] FedADMM: A Robust Federated Deep Learning Framework with Adaptivity to System Heterogeneity
>
> Thanks. In fact, I have implemented it myself, but I found that the accuracy is really bad.
>
> I checked the accuracy: the local model is actually doing well (for example, going from 10% to 40% in one round of local updates), but the client upload (the augmented model) is really bad (hovering at ~10%). Did you run into a similar issue, or is it a problem on my end?

It looks like it does not work on the global server. Could you please provide the implementation details? I did not run into a similar issue. I remember that the global learning rate and the proxy coefficient must be decayed if the global update relies only on the active dual variables instead of all dual variables. But even without decaying them, it could still reach roughly 60%~70% accuracy before breaking down.
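
For concreteness, here is a rough sketch of the kind of server step I mean when only the active clients' uploads are used; the exponential decay and the 0.998 factor are purely illustrative, not tuned values:

```python
import torch

def fedadmm_server_update(global_params, uploads, eta_g, rho, round_idx, decay=0.998):
    """Average the active clients' uploads and move z toward them with a decayed step."""
    eta_t = eta_g * (decay ** round_idx)   # decayed global learning rate
    rho_t = rho * (decay ** round_idx)     # decayed proxy coefficient for the next round
    with torch.no_grad():
        for k, z in enumerate(global_params):
            avg = torch.mean(torch.stack([u[k] for u in uploads]), dim=0)
            z.add_(eta_t * (avg - z))      # z <- z + eta_t * (mean_upload - z)
    return rho_t
```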

noobyzy commented 9 months ago

@woodenchild95 Sorry for the late reply.

Thank you for your feedback. I found it was a problem with my implementation: I was using MobileNetV3 with BatchNorm layers, which requires handling the BN parameters separately from the other layers.
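
In case it helps someone else, here is a minimal sketch of what I mean by handling the BN parameters separately: exclude them from the dual/proximal terms and keep them as ordinary local parameters. The name-based split below is my own heuristic:

```python
import torch.nn as nn

def split_bn_params(model):
    """Return (admm_params, bn_params); apply the dual/proximal terms to admm_params only."""
    bn_prefixes = {name for name, m in model.named_modules()
                   if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d))}
    admm_params, bn_params = [], []
    for name, p in model.named_parameters():
        parent = name.rsplit('.', 1)[0]
        (bn_params if parent in bn_prefixes else admm_params).append(p)
    return admm_params, bn_params
```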

noobyzy commented 8 months ago

Hi,

Another thing I am wondering is which backbones you are using (especially for FedADMM). In fact, I got results similar to yours using ResNet18 + GN, but when I switched to MobileNetV2/V3 (small/large) + GN, they all failed for FedADMM. Can you get similar results on other backbones?

woodenchild95 commented 7 months ago

> Hi,
>
> Another thing I am wondering is which backbones you are using (especially for FedADMM). In fact, I got results similar to yours using ResNet18 + GN, but when I switched to MobileNetV2/V3 (small/large) + GN, they all failed for FedADMM. Can you get similar results on other backbones?

Sorry for the late response. For the ResNet models, you need to decay the proxy coefficient. For the Transformer models, to the best of our knowledge, there is still no solid solution for FedADMM (or even for FedDyn). Partial participation in primal-dual methods leads to very large variance, which makes training very sensitive. Hope this is helpful :)
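
If it helps, one simple way to decay the proxy coefficient over communication rounds (the exponential schedule and the constants here are only illustrative, not a recommended setting):

```python
def decayed_rho(rho0, round_idx, factor=0.998, rho_min=1e-3):
    """Exponentially decay the proxy coefficient rho over communication rounds."""
    return max(rho0 * factor ** round_idx, rho_min)
```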