cshjin closed this issue 4 years ago
The standard variance is also consistent with the accuracy distribution figures. We would like to treat each device equally, so that each user's experience is the same (even if they have different numbers of training samples, etc.).
My concern is that a higher accuracy on a device with a smaller sample size won't be properly reflected in the variance (the fairness measure), and vice versa.
As in Eq. (2), the objective involves the per-device weights, so the variance (fairness) metric could also incorporate those weights.
A larger loss potentially means lower accuracy. Why wouldn't it affect the fairness/variance?
Sorry, revised to accuracy. Consider a toy example with three devices, each listed as (acc, # samples):

- (0.5, 10), (0.5, 10), (0.5, 10000) -- low variance
- (0.1, 10), (0.1, 10), (0.9, 10000) -- high variance
Which one is better in terms of "fairness"?
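To make the toy example concrete, here is a quick check (a sketch using NumPy; the accuracies and sample counts are taken from the example above) comparing the standard variance with a sample-weighted variant:

```python
import numpy as np

# Two toy scenarios: per-device accuracies; sample counts are 10, 10, 10000
low_var_accs  = np.array([0.5, 0.5, 0.5])
high_var_accs = np.array([0.1, 0.1, 0.9])
samples = np.array([10, 10, 10000])

# Standard (unweighted) variance treats every device equally
print(np.var(low_var_accs))    # 0.0   -- perfectly uniform accuracies
print(np.var(high_var_accs))   # ~0.142 -- large spread across devices

# A sample-weighted variance would instead be dominated by the large device
mu = np.average(high_var_accs, weights=samples)
weighted_var = np.average((high_var_accs - mu) ** 2, weights=samples)
print(weighted_var)            # ~0.0013 -- the disparity is mostly hidden
```

Note how weighting by sample count makes the second scenario look nearly as "fair" as the first, even though two small devices see only 10% accuracy.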
Sorry, I missed your previous question... By our definition (Definition 1 in https://arxiv.org/pdf/1905.10497.pdf), the first one is more fair. In your second example, the model seems to overfit to the 'dominant' device with 10,000 samples, which is exactly what we are trying to mitigate.
Yes, thanks for your reply. I see the work was published at ICLR. Congrats. I will follow up later.
Hi Hongwei,
do you have any additional questions/comments regarding this issue?
Not at this moment, thanks. Issue can be closed.
I see you define fairness according to the variance of accuracies across the m devices.
However, the variance you compare in https://github.com/litian96/fair_flearn/blob/5f174cba7521df606fc946f40a0b5fef09b546e3/plot_fairness.py#L51 assumes that all accuracies are equally weighted.
My question is: instead of using the standard variance, why not calculate the variance in a weighted fashion?
A revised variance, weighted either by the sample distribution or by the q-federated weights, would make much more sense to me. Is that right?
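For reference, the revised metric I have in mind could look like this (a sketch; `weighted_variance` is a hypothetical helper, and the weights could come from either the sample fractions p_k or the q-federated weights):

```python
import numpy as np

def weighted_variance(accs, weights):
    """Variance of per-device accuracies, weighting each device by `weights`
    (e.g. sample fractions or q-federated weights) instead of uniformly by 1/m."""
    accs = np.asarray(accs, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize weights to a distribution
    mu = np.sum(w * accs)                # weighted mean accuracy
    return np.sum(w * (accs - mu) ** 2)  # weighted second central moment

# With uniform weights this reduces to the standard variance:
print(weighted_variance([0.1, 0.1, 0.9], [1, 1, 1]))        # ~0.142
# With sample counts as weights the large device dominates:
print(weighted_variance([0.1, 0.1, 0.9], [10, 10, 10000]))  # ~0.0013
```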