visitworld123 / FedFed

[NeurIPS 2023] "FedFed: Feature Distillation against Data Heterogeneity in Federated Learning"

Confusion about the shared data #3

Closed lhq12 closed 7 months ago

lhq12 commented 7 months ago

Thanks for the awesome work. I see in the code that you actually share two views of the same data (i.e., share_data1 and share_data2). My questions are: 1) To my understanding, sharing multiple views of the same data can potentially improve model performance by providing complementary information; will the performance drop if only one view is shared? 2) Combining the two views can give a more comprehensive picture of the underlying data distribution; will this lead to more privacy leakage?

visitworld123 commented 7 months ago

Thanks for your attention to our work. Firstly, the performance drops slightly if we share only one view. However, I have only tested a few settings; it would be interesting to run more experiments (sharing only shared_data1 vs. sharing shared_data1 and shared_data2 with the same noise strength). Secondly, we add stronger noise to the second view (see VAE_std2), so a malicious actor would find it easier to recover information from share_data1 than from share_data2. That said, it might be interesting to try recovering the raw data in a diffusion-like manner given a sufficiently long sequence of noised performance-sensitive features.
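
For illustration, a minimal sketch of how two views with different noise strengths can be produced from the same performance-sensitive features; the function and argument names are illustrative, not the repo's exact code:

```python
import torch

def make_shared_views(xs, std1=0.15, std2=0.25):
    """Illustrative sketch: build two shared views of the same
    performance-sensitive features xs by adding Gaussian noise of
    different strengths (std2 > std1, so the second view leaks less)."""
    share_data1 = xs + std1 * torch.randn_like(xs)
    share_data2 = xs + std2 * torch.randn_like(xs)
    return share_data1, share_data2
```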

lhq12 commented 7 months ago

Thanks for the quick reply.

I just found a mistake in the code (FL_VAE.py, lines 169-170) that may affect the performance.

visitworld123 commented 7 months ago

Yep, it's a mistake. That code corresponds to the setting where shared_data1 and shared_data2 use the same noise strength.

visitworld123 commented 7 months ago

I am closing the issue now. If you have any further questions, feel free to reach out :)

lhq12 commented 7 months ago

Could you share the hyper-parameter settings that lead to the results reported in the paper? I tried once with config.yaml and obtained the following results: "server_0_test_acc_epoch: 85.43 & server_0_total_acc_epoch: 85.32000404230781", which is much lower than what you reported.

visitworld123 commented 7 months ago

For CIFAR-10 with $\alpha=0.1, K=10, E=1$, you can set the hyper-parameters as follows: VAE_adaptive=False, VAE_std1=0.15, VAE_std2=0.25.
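
For reference, a small sketch of those values as a Python dict dumped to YAML, which you could merge into config.yaml; the key names follow this thread, and the exact nesting inside the file may differ:

```python
import yaml  # PyYAML

# Hyper-parameters from this thread; the actual nesting in config.yaml may differ.
overrides = {
    "VAE_adaptive": False,  # fixed (non-adaptive) noise
    "VAE_std1": 0.15,       # noise strength for share_data1
    "VAE_std2": 0.25,       # stronger noise for share_data2
}
print(yaml.safe_dump(overrides))
```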

lhq12 commented 7 months ago

Thanks!

Is the setting the same for different datasets, $\alpha$, $K$, and $E$?

visitworld123 commented 7 months ago

VAE_std1 and VAE_std2 are almost the same across settings. That said, it takes a lot of effort to select suitable VAE_re, VAE_ce, VAE_kl, and VAE_x_ce for different datasets, values of $\alpha$, and numbers of clients.
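
Roughly speaking, these coefficients weight the individual terms of the VAE objective; a hedged sketch under that assumption (the actual composition of the loss in the repo may differ):

```python
# Rough sketch only: assuming VAE_re, VAE_kl, VAE_ce and VAE_x_ce are scalar
# weights on the reconstruction, KL and classification-style terms.
def vae_objective(recon_loss, kl_loss, ce_loss, x_ce_loss,
                  VAE_re=1.0, VAE_kl=1.0, VAE_ce=1.0, VAE_x_ce=1.0):
    return (VAE_re * recon_loss
            + VAE_kl * kl_loss
            + VAE_ce * ce_loss
            + VAE_x_ce * x_ce_loss)
```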

lhq12 commented 7 months ago

Hi!

I trained ResNet-18 on CIFAR-10 with $\alpha=0.1, K=10, E=1$ using the hyper-parameters you suggested, but still fail to approach the reported results. I got the following results: FedFed + FedAvg: 88.92, FedProx: 87.82, SCAFFOLD: fails to converge.

Could you help me check if anything important is missing?

visitworld123 commented 7 months ago

Could you share the data distribution of all clients and your hardware configurations?

visitworld123 commented 7 months ago

I rechecked default.py and modified the random seed to ensure the same data distribution across clients as in the paper. You can retry the experiments. Sometimes different hardware also results in different performance gains. If you have any more problems, feel free to reach me directly by email.
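
For reproducibility, a minimal sketch of fixing all the relevant seeds before the data partition is drawn, so every run produces the same client split; this is generic, not necessarily identical to what default.py does:

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 0):
    """Fix all relevant RNGs so the data partition is reproducible."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```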

lhq12 commented 7 months ago

train_cls_local_counts_dict = {0: {2: 704, 3: 1, 4: 1219, 5: 236, 6: 1061, 8: 70, 9: 29}, 1: {0: 13, 1: 774, 3: 2725, 6: 91, 7: 295, 8: 1}, 2: {0: 127, 1: 22, 3: 3, 7: 200, 8: 14, 9: 1752}, 3: {0: 6, 3: 6, 4: 276, 5: 8, 7: 1, 8: 20, 9: 3219}, 4: {0: 826, 1: 291, 2: 102, 3: 10, 4: 3336, 5: 1142}, 5: {0: 2, 1: 2099, 2: 3266}, 6: {1: 16, 2: 543, 3: 154, 4: 123, 5: 74, 7: 4503}, 7: {0: 2722, 2: 159, 3: 2096, 6: 2560}, 8: {0: 1303, 1: 1797, 2: 64, 3: 4, 4: 45, 5: 3539}, 9: {0: 1, 1: 1, 2: 162, 3: 1, 4: 1, 5: 1, 6: 1288, 7: 1, 8: 4895}}

OS: Linux-4.4.0-142-generic-x86_64-with-glibc2.17; GPU: GeForce RTX 3090; python=3.8.16=h7a1cb2a_2; pytorch=1.8.0=py3.8_cuda11.1_cudnn8.0.5_0

I noticed that you log the test_accuracy after every batch in normal_trainer.test_on_server_for_round, which may be inappropriate, so I log test_acc_avg.avg only once all samples are processed, as is generally done. I don't know whether the reported results were obtained in the former way.
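
For reference, a minimal sketch of the "log once at the end" evaluation I mean, assuming a standard PyTorch test loader (not the repo's exact trainer code):

```python
import torch

@torch.no_grad()
def evaluate(model, test_loader, device="cuda"):
    """Accumulate correct predictions over the whole test set and report
    a single accuracy, instead of logging after every batch."""
    model.eval()
    correct, total = 0, 0
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)
        pred = model(x).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.size(0)
    return 100.0 * correct / total
```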

visitworld123 commented 7 months ago

You can try random seed 0 with the following data distribution: {0: {0: 976, 1: 1632, 2: 18, 3: 1120, 5: 132, 7: 636, 8: 3101}, 1: {1: 1, 5: 59, 7: 3, 8: 851, 9: 4228}, 2: {1: 814, 2: 212, 3: 3735, 5: 2737}, 3: {0: 46, 1: 291, 3: 4, 8: 401, 9: 123}, 4: {0: 125, 2: 1, 3: 137, 4: 1476, 6: 303, 8: 642, 9: 47}, 5: {0: 1923, 1: 381, 2: 3826}, 6: {1: 334, 2: 3, 3: 3, 4: 2477, 5: 2055, 7: 1543}, 7: {0: 6, 1: 1546, 2: 881, 7: 591, 8: 4, 9: 6}, 8: {0: 435, 2: 58, 4: 1046, 5: 16, 6: 4536}, 9: {0: 1489, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 161, 7: 2227, 8: 1, 9: 596}}. And we report the total accuracy (total_acc) in our paper.
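
For context, a rough sketch of how a Dirichlet label split like the one above can be generated with a fixed seed ($\alpha=0.1$, 10 clients); this is illustrative and not necessarily the exact partitioning code in default.py:

```python
import numpy as np

def dirichlet_partition(labels, n_clients=10, alpha=0.1, seed=0):
    """Rough sketch: split sample indices across clients with a per-class
    Dirichlet distribution, producing heterogeneous label counts like the
    dicts in this thread. Not the repo's exact partitioning code."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idxs = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx_c = np.where(labels == c)[0]
        rng.shuffle(idx_c)
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props) * len(idx_c)).astype(int)[:-1]
        for client, part in enumerate(np.split(idx_c, cuts)):
            client_idxs[client].extend(part.tolist())
    return client_idxs
```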

lhq12 commented 7 months ago

There may be something wrong with the SCAFFOLD algorithm: the training loss is 'None'.

visitworld123 commented 7 months ago

You can try a smaller learning rate, such as 0.001 or 0.0001.
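
Also, a generic sanity check (not our trainer code) that fails loudly when the loss becomes non-finite, which usually points to the learning rate being too large:

```python
import torch

def training_step(model, optimizer, criterion, x, y):
    """Generic sketch: abort early when the loss blows up so the run fails
    loudly instead of silently reporting a 'None' loss."""
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    if not torch.isfinite(loss):
        raise RuntimeError(f"Non-finite loss {loss.item()}: "
                           "try a smaller learning rate, e.g. 1e-3 or 1e-4.")
    loss.backward()
    optimizer.step()
    return loss.item()
```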

visitworld123 commented 7 months ago

I will check SCAFFOLD.

lhq12 commented 7 months ago

Thanks.