omarfoq / FedEM

Official code for "Federated Multi-Task Learning under a Mixture of Distributions" (NeurIPS'21)
Apache License 2.0

Args for synthetic datasets #4

RigCor7 closed 2 years ago

RigCor7 commented 2 years ago

Thanks for your interesting work! I really appreciate it and am trying to reuse some of the datasets you provide. I would like to generate a synthetic dataset of the same size as yours. Would you mind telling me the argument values used to generate it?

By the way, I saw at line 244 of utils/utils.py that val_iterator also uses the prefix "train". I think it should be "val". Is that right?

Many thanks!

omarfoq commented 2 years ago

Hello,

The arguments to generate the synthetic dataset are provided in data/synthetic/README.md. They generate a dataset with roughly 70,507 train samples and 1.5 million test samples (because every client has 5k test samples). Note that the value reported in Table 1 of the paper is the total number of samples (train + test); for example, for CIFAR-10/100, Table 1 reports a total of 60k samples (50k train + 10k test). See also #2 for more details.
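As a quick sanity check on these numbers, the arithmetic can be sketched as follows. This treats "1.5 million" and "5k" as exact values, which is an assumption; the variable names are only illustrative and do not come from the repository.

```python
# Figures quoted in the reply above; treating them as exact is an assumption.
train_samples = 70_507
test_samples = 1_500_000   # "1.5 million test samples"
test_per_client = 5_000    # "every client has 5k test samples"

# Table 1 of the paper reports train + test, so the comparable total is:
total_samples = train_samples + test_samples

# Number of clients implied by the per-client test size.
implied_clients = test_samples // test_per_client

print(total_samples)    # 1570507
print(implied_clients)  # 300
```

Under these assumptions, the total to compare against Table 1 is the sum of both splits, not the train count alone.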

Regarding the second question, the prefix "train" is intended. In this code, val_iterator is not an iterator over the validation set; it is a data loader in which the last batch is not dropped, unlike train_iterator, which may drop the last batch (see line 305 in utils/utils.py). To use the validation set, pass is_validation=True to get_loaders, implemented here. This is controlled via the argument validation (see here).
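To illustrate the difference between the two loaders, here is a minimal pure-Python sketch of batching with and without dropping an incomplete last batch. It is not the repository's code: the helper `batches` and the flag name `drop_last` are hypothetical here (the flag name is borrowed from PyTorch's `DataLoader`), and the split of 10 samples is made up for the example.

```python
def batches(samples, batch_size, drop_last):
    """Yield consecutive batches; optionally skip a trailing incomplete batch."""
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            return  # incomplete final batch is ignored
        yield batch


# Hypothetical "train" split with 10 samples, used by both loaders.
train_split = list(range(10))

# Behaves like a loader that may ignore the last batch.
train_like = list(batches(train_split, batch_size=4, drop_last=True))

# Behaves like a loader over the same split that keeps every sample.
val_like = list(batches(train_split, batch_size=4, drop_last=False))

print(len(train_like), sum(len(b) for b in train_like))  # 2 8
print(len(val_like), sum(len(b) for b in val_like))      # 3 10
```

Both loaders read the same split; only the second one is guaranteed to visit every sample, which matters when computing metrics over the full training data.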

I hope this answers your questions. Please let me know if anything is still unclear.

RigCor7 commented 2 years ago

Thanks for your reply. I had confused the validation set with the test set, but now I get it!