Hello,
The arguments to generate the synthetic dataset are provided in `data/synthetic/README.md`. This will generate a dataset with roughly 70,507 train samples and 1.5 million test samples (because every client has 5k test samples). Note that the value reported in Table 1 of the paper is the total number of samples (train + test); for example, for CIFAR-10/100, Table 1 reports a total of 60k samples (50k train + 10k test). See also #2 for more details.
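For anyone double-checking those counts, here is a back-of-the-envelope sketch; note that the 300-client figure is only inferred from 1.5M test samples at 5k per client, and the actual generation arguments live in `data/synthetic/README.md`:

```python
# Back-of-the-envelope check of the sample counts quoted above.
# NOTE: the client count is only inferred from 1.5M test samples at
# 5k per client; the real generation args are in data/synthetic/README.md.
train_samples = 70_507
test_per_client = 5_000
n_clients = 1_500_000 // test_per_client      # 300 clients (inferred)
test_samples = n_clients * test_per_client    # 1,500,000
total = train_samples + test_samples          # Table-1-style total (train + test)
print(f"train={train_samples:,}  test={test_samples:,}  total={total:,}")
```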
Regarding the second question, the prefix is "train". `val_iterator` in this code does not represent an iterator over the validation set; instead, it represents a data loader that does not drop the last batch, unlike `train_iterator`, which may drop the last batch (see line 305 in `utils/utils.py`). In order to use the validation set, one should pass `is_validation=True` to `get_loaders`, implemented here. This is controlled via the `validation` argument (see here).
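For anyone skimming the thread, here is a minimal sketch of that drop-last distinction in plain PyTorch; this is not the repo's actual code, and the dataset and batch size are made up for illustration:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# toy dataset: 1003 samples, so the last batch of 100 is incomplete
dataset = TensorDataset(torch.randn(1003, 10), torch.randint(0, 2, (1003,)))

# train_iterator-style loader: the incomplete last batch may be dropped
train_loader = DataLoader(dataset, batch_size=100, shuffle=True, drop_last=True)

# val_iterator-style loader: the last batch is kept, so every sample is seen
val_loader = DataLoader(dataset, batch_size=100, shuffle=False, drop_last=False)

print(sum(x.size(0) for x, _ in train_loader))  # 1000 (3 samples dropped)
print(sum(x.size(0) for x, _ in val_loader))    # 1003 (all samples kept)
```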
I hope this answers the questions; please let me know if you have any doubts.
Thanks for your reply! I had misunderstood the validation set and the test set before; now I get it!
Thanks for your interesting work! I really appreciate it and am trying to reuse some of the datasets you provide. I want to generate a synthetic dataset of the same size as yours; would you mind telling me the argument values used to generate it?
By the way, I saw in line 244 of `utils/utils.py` that `val_iterator` also uses the prefix "train". I think it should be "val", is that right?
Many thanks!