SpeeeedLee opened this issue 1 year ago
Hi,
Thanks for the question. The reason we cite [Hsu 2019] is the default Dirichlet alpha value of 0.9. For a given class, the code actually samples one Dirichlet distribution over all the participating clients, so `sampled_probabilities` is a 100-dimensional vector (one entry per participant). Each entry is the estimated number of samples of that class the corresponding participant will hold; this is generally a float, and the real sampled count is obtained with `int(round(...))`. The code then repeats this process for all the classes through the `for n in range(no_classes)` iteration at line 181 of `image_helper.py`. And I guess it is the same as the partition method you mentioned in [Hsu 2019].
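For clarity, here is a minimal sketch of the per-class sampling described above. It is illustrative only, not the exact code in `image_helper.py`: the function name and the `class_indices` dict (mapping each label to the list of its sample indices) are assumptions.

```python
import numpy as np

def sample_dirichlet_per_class(class_indices, no_participants, alpha=0.9):
    # One Dirichlet draw per CLASS over the participants, then split that
    # class's samples among the participants according to the proportions.
    per_participant = {p: [] for p in range(no_participants)}
    for cls in class_indices:
        indices = list(class_indices[cls])
        np.random.shuffle(indices)
        class_size = len(indices)
        # sampled_probabilities: estimated (float) number of samples of this
        # class for each participant; int(round(...)) gives the real count.
        sampled_probabilities = class_size * np.random.dirichlet(
            np.array(no_participants * [alpha]))
        for p in range(no_participants):
            n = int(round(sampled_probabilities[p]))
            per_participant[p].extend(indices[:n])
            indices = indices[n:]
    return per_participant
```

With 100 participants, `sampled_probabilities` is exactly the 100-dimensional vector mentioned above.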
Actually, for the Dirichlet sampling we follow the code of the repository for "How to Backdoor Federated Learning", a well-known paper on backdooring FL. We are sorry that we did not modify some of the comments from the original repository, which may cause misunderstanding.
I hope this addresses your confusion. Thanks.
Hi, I want to ask about the non-IID partition method used. In `image_helper.py`, the Dirichlet non-IID partition is done by the function `sample_dirichlet_train_data`.

It seems like the code samples a Dirichlet distribution for every class label, which determines the number of samples of that class each client holds: `sampled_probabilities = class_size * np.random.dirichlet(np.array(no_participants * [alpha]))`.

However, this is not consistent with the referenced paper [Hsu et al., 2019], where the partition is done by sampling a Dirichlet for each client, giving the probability of each label that client holds, and then sampling from these distributions (a sketch of this reading is included below).
Also, this is not consistent with the docstring of the function `sample_dirichlet_train_data`:

> ... Sample Method: take a uniformly sampled 10-dimension vector as parameters for dirichlet distribution to sample number of images in each class ...
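For concreteness, here is a minimal sketch of the per-client partition as I read [Hsu et al., 2019]. The function name, signature, and the `samples_per_client` parameter are illustrative; this is not code from this repository.

```python
import numpy as np

def sample_dirichlet_per_client(no_classes, no_participants,
                                samples_per_client, alpha=0.9):
    # One Dirichlet draw per CLIENT over the classes, then sample that
    # client's label counts from its own class distribution.
    counts = np.zeros((no_participants, no_classes), dtype=int)
    for p in range(no_participants):
        q = np.random.dirichlet(alpha * np.ones(no_classes))
        counts[p] = np.random.multinomial(samples_per_client, q)
    return counts
```

The difference is the axis of the Dirichlet draw: one vector over participants per class in `sample_dirichlet_train_data`, versus one vector over classes per client here.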