kiddyboots216 / CommEfficient

PyTorch for benchmarking communication-efficient distributed SGD optimization algorithms

issue of fetchpgd #14

Open wangclin opened 1 year ago

wangclin commented 1 year ago

Hi, sorry to bother you again. I want to implement only fetchpgd on CommEfficient-attacks, to test the accuracy difference between the SparseFed and FetchSGD methods, but I ran into a problem. My hyperparameters are:

```
--dataset_dir data/cifar10 --tensorboard --dataset_name CIFAR10 --model ResNet9 --mode fetchpgd --k 10000 --num_blocks 1 --num_rows 1 --num_cols 325000 --num_clients 200 --num_workers 10 --error_type virtual --local_momentum 0.0 --virtual_momentum 0.9
```

The k, num_rows, and num_cols values are the same ones I use for FetchSGD. But I get:

```
CommEfficient-attacks\CommEfficient-attacks\CommEfficient\fed_worker.py", line 177, in worker_loop
    sum_g += g
RuntimeError: The size of tensor a (500000) must match the size of tensor b (6568640) at non-singleton dimension 1
```

Could you please help me fix it? Thanks a lot!

kiddyboots216 commented 1 year ago

It looks like in fetchpgd we're trying to add together a full gradient and a sketch. The sketching step should come after the PGD step.
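To illustrate the shape problem (a minimal, self-contained sketch; `csketch` and `pgd_project` are shape-only stand-ins, not the repo's actual functions):

```python
import torch

GRAD_LEN = 6568640    # full-model gradient length, from the traceback
SKETCH_LEN = 500000   # flattened sketch length, from the traceback

def csketch(g):
    # Stand-in for the count-sketch compression; only the output
    # shape matters for this illustration.
    return torch.zeros(SKETCH_LEN)

def pgd_project(g):
    # Stand-in for the attacker's projected-gradient step, which
    # operates on the full-length gradient.
    return g

# If one worker ships a sketch (500000,) while another ships the raw
# gradient (6568640,), the `sum_g += g` in worker_loop mixes shapes
# and raises exactly the RuntimeError above.

# Correct ordering: run PGD on the full gradient first, sketch last,
# so every worker contributes a tensor of the same shape.
sum_g = torch.zeros(SKETCH_LEN)
for _ in range(10):  # num_workers
    g = pgd_project(torch.randn(GRAD_LEN))
    sum_g += csketch(g)
```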

wangclin commented 1 year ago

Is fetchpgd the implementation of the SparseFed paper? I studied the source code and found that true_topk is actually in line with the SparseFed paper in terms of communication efficiency. I think fetchpgd may include more robustness than true_topk; is that right?

kiddyboots216 commented 1 year ago

I think fetchpgd is some unfinished code, actually. It's supposed to be evaluating the adaptive attack combination between SparseFed and my other paper Neurotoxin. We actually have results for it, so I might need to check whether the finished implementation is on another server.

But the idea is basically that the attacker does multiple steps of PGD according to Neurotoxin, where they project the update onto the bottom-k gradients at each iteration. Then the server implements SparseFed by doing the overall top-k operation.
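In rough pseudocode, the combination looks something like this (a minimal sketch; the function names and the `attack_grad_fn` hook are illustrative, not the actual code in this repo):

```python
import torch

def bottom_k_mask(benign_grad, k):
    # Neurotoxin-style projection set: the k coordinates with the
    # smallest benign-gradient magnitude, i.e. where the server's
    # top-k filter is least likely to look.
    idx = torch.topk(benign_grad.abs(), k, largest=False).indices
    mask = torch.zeros_like(benign_grad)
    mask[idx] = 1.0
    return mask

def attacker_pgd(attack_grad_fn, benign_grad, k, steps=5, lr=0.1):
    # Multiple PGD steps, projecting the malicious update back onto
    # the bottom-k coordinates after every step.
    mask = bottom_k_mask(benign_grad, k)
    delta = torch.zeros_like(benign_grad)
    for _ in range(steps):
        delta = delta - lr * attack_grad_fn(delta)
        delta = delta * mask
    return delta

def server_topk(summed_grad, k):
    # SparseFed server step: keep only the overall top-k coordinates
    # of the aggregated update.
    out = torch.zeros_like(summed_grad)
    idx = torch.topk(summed_grad.abs(), k).indices
    out[idx] = summed_grad[idx]
    return out
```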

As you noted, the robustness defenses in SparseFed are just top-k and then the sketching.

wangclin commented 1 year ago

Thanks for the great help. In fact, I am planning some work in the communication-efficiency field, so I need to compare against some SOTA methods; hence I want to compare with FetchSGD and SparseFed. Actually, I use your source code's sketch mode (FetchSGD) and true_topk (SparseFed), and I don't know whether that is suitable. I tried fetchpgd but found that its processing does not match SparseFed, so I use true_topk instead (to calculate transmission bytes and accuracy).

kiddyboots216 commented 1 year ago

Oh, I think that for communication efficiency you should be using the main branch and not the attacks branch. In case you mean communication efficiency with robustness: FetchSGD is in the main branch, and SparseFed is just per-user-gradient-clipping + either top-k or sketching.
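Concretely, something like this (a minimal sketch of the clipping + top-k variant; `sparsefed_round` is an illustrative name, not this repo's API, and the sketching variant would replace the final top-k with count-sketch compression):

```python
import torch

def sparsefed_round(user_grads, clip_norm, k):
    # Per-user L2 clipping, then aggregation, then one global top-k.
    total = torch.zeros_like(user_grads[0])
    for g in user_grads:
        scale = min(1.0, clip_norm / (g.norm().item() + 1e-12))
        total += g * scale
    out = torch.zeros_like(total)
    idx = torch.topk(total.abs(), k).indices
    out[idx] = total[idx]
    return out
```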

wangclin commented 1 year ago

Yeah, I know. In fact, I want to cite true_topk because it offers a novel perspective on server-side compression (I find that most papers focus on client-side compression), but I cannot find a paper that properly proposes this method. So I want to cite SparseFed to introduce this idea in the communication-efficiency field (of course, I will explain that SparseFed focuses on robustness rather than efficiency). Or could you please point me to the source papers behind true_topk? That would help me a lot with paper writing, thx~

kiddyboots216 commented 1 year ago

Sure, so SparseFed doesn't introduce top-k. Top-k is introduced by some of the papers that we cite in FetchSGD; in particular, the mechanism that we use with memory comes from:

https://arxiv.org/abs/1809.07599

I would note that the FetchSGD work does compare to true top-k for the server compression.
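For reference, the memory mechanism looks roughly like this (a minimal sketch, not this repo's actual implementation):

```python
import torch

def topk_with_memory(grad, memory, k):
    # Top-k sparsification with error feedback: coordinates that miss
    # the cut accumulate in `memory` and get another chance next round.
    corrected = grad + memory
    out = torch.zeros_like(corrected)
    idx = torch.topk(corrected.abs(), k).indices
    out[idx] = corrected[idx]
    new_memory = corrected - out  # the residual stays local
    return out, new_memory
```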

wangclin commented 1 year ago

ok, thank you a lot~ how kind of you