omarfoq / FedEM

Official code for "Federated Multi-Task Learning under a Mixture of Distributions" (NeurIPS'21)
Apache License 2.0

Data generation scripts mismatch the description in the paper? #2

Closed mengcz13 closed 2 years ago

mengcz13 commented 2 years ago

Thanks for your interesting work! I really appreciate it and am trying to reuse some datasets you provide. However, I notice that the scripts provided cannot generate datasets matching the description in your paper...

For example, in FEMNIST, the paper claims that there are 359 clients in total after sampling (Table 1), and "For all tasks, we randomly split each local dataset into training (60%), validation (20%) and test (20%) sets." (Section 4).

However, the README under `data/femnist` says that I should use the command `python generate_data.py --s_frac 0.2 --tr_frac 0.8 --seed 12345`, which samples >700 tasks (clients), and each client gets 80% of its data for training and 20% for testing, with no validation data.

I guess that I should use something like `python generate_data.py --s_frac 0.1 --tr_frac 0.8 --val_frac 0.25 --seed 12345`? (Still, this does not match the footnote on page 8, which says you "subsampled 15% from FEMNIST".)

I haven't checked the other datasets, but it seems EMNIST also uses `--s_frac 0.2` (instead of the 10% claimed in the footnote) and does not include validation data either.

Could you clarify how to generate the data used in the paper? Many thanks!

yxdyc commented 2 years ago

Thanks for the interesting work! I am also trying to reuse some datasets from this paper and have the same question as @mengcz13.

I checked the EMNIST data generated by the README script: it contains 130241/0/32610 samples in the train/val/test sets respectively, i.e. the total is 20% of EMNIST and the split ratio is 0.8:0:0.2.

This does not match the paper, which states "For all tasks, we randomly split each local dataset into training (60%), validation (20%) and test (20%) sets" (Section 4) and reports 81425 total samples for EMNIST in Table 1.

Should I keep using the default scripts to reproduce the results in Table 2 and Table 3? Could you please clarify how to generate the data used in the paper, including the other datasets? Many thanks!

omarfoq commented 2 years ago

Hello,

Thank you for the comments. Indeed, the README files for the data generation process were not updated and do not match the paper.

For FEMNIST the command should be `python generate_data.py --s_frac 0.15 --tr_frac 0.8 --val_frac 0.25 --seed 12345`. This results in 539 tasks instead of the 359 tasks reported in the paper (I made a mistake while copying it). I also noticed that for FEMNIST I reported only the number of training samples, while for the other datasets I reported the total number of samples. I will correct this in the camera-ready version.

For EMNIST the command should be `python generate_data.py --s_frac 0.1 --tr_frac 0.8 --val_frac 0.25 --seed 12345`. It leads to 48824/16274/16327 samples in the train/val/test sets respectively. (Note that 48824 + 16274 + 16327 = 81425, matching the value reported in Table 1.)
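As a rough sanity check of how these flags map onto the paper's 60/20/20 split, here is a minimal calculation assuming `--val_frac` carves the validation set out of the training portion selected by `--tr_frac` (rather than out of the full dataset), which is what the numbers above suggest:

```python
# Sanity check of the split fractions (assumption: val_frac is applied to the
# training portion selected by tr_frac, not to the whole dataset).
tr_frac, val_frac = 0.8, 0.25

train_share = tr_frac * (1 - val_frac)   # 0.8 * 0.75 = 0.60
val_share = tr_frac * val_frac           # 0.8 * 0.25 = 0.20
test_share = 1 - tr_frac                 # 0.20

total = 48824 + 16274 + 16327            # 81425, matching Table 1
print(train_share, val_share, test_share)    # 0.6 0.2 0.2
print(round(total * train_share), 48824)     # 48855 vs. 48824
```

The small gap between `0.6 * 81425` and the reported 48824 would be consistent with the split being applied per client and rounded.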

Note that the results in Table 2 are obtained using the union of the training and validation sets; the validation set was used to tune the hyper-parameters (see the rebuttal).
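For anyone reproducing Table 2, a minimal sketch of what training on the train + validation union could look like in PyTorch; the dataset objects below are dummy stand-ins, not the loaders actually used in this repository:

```python
import torch
from torch.utils.data import TensorDataset, ConcatDataset, DataLoader

# Dummy stand-ins for one client's splits; the repo's real loaders differ.
train_set = TensorDataset(torch.randn(60, 784), torch.randint(0, 62, (60,)))
val_set = TensorDataset(torch.randn(20, 784), torch.randint(0, 62, (20,)))

# Train on the union of the training and validation splits,
# keeping the test split untouched for the reported metrics.
train_plus_val = ConcatDataset([train_set, val_set])
loader = DataLoader(train_plus_val, batch_size=32, shuffle=True)

for x, y in loader:
    pass  # feed each batch into the local update step of the chosen method
```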

I hope this answers your questions. Please let me know if I am missing something or if you notice some other inconsistencies.