visitworld123 / FedFed

[NeurIPS 2023] "FedFed: Feature Distillation against Data Heterogeneity in Federated Learning"
MIT License

Question regarding DP-Equation for Performance Sensitive Features #5

Open MaxH1996 opened 7 months ago

MaxH1996 commented 7 months ago

Hi,

thank you for your interesting work and for posting your code. I am currently trying to understand the DP argument in your paper, specifically its similarity to DP-SGD. As far as I understand, for DP-SGD we are given $\sigma = \frac{c \cdot q \cdot \sqrt{T \cdot \ln(\frac{1}{\delta})}}{\epsilon}$, where $c$ is the clipping value of the $l_2$-norm, $q$ the fraction of samples, $(\epsilon, \delta)$ the privacy parameters, and $T$ the number of iterations.

In your paper you provide a very similar form for the performance-sensitive features: $\sigma_s = \frac{\rho \sqrt{R \cdot \ln(\frac{1}{\delta})}}{\epsilon}$, where $q$ has been set to 1. As far as I understand, $\rho$ is defined by the relation $||x_s|| = \rho ||x||$ with $0 < \rho < 1$.

My question is: why does the norm of $x_s$ not appear in the equation for $\sigma_s$? Put differently, why isn't $\sigma_s$ given by $\sigma_s = \frac{\rho \cdot M \sqrt{R \cdot \ln(\frac{1}{\delta})}}{\epsilon}$, where $M$ is the $l_2$-norm clipping value for $||x||$?
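To make the comparison concrete, here is a small sketch of the three noise scales as I read them (plain Python; all names are mine, not from the paper or the repo):

```python
import math

def sigma_dp_sgd(c, q, T, eps, delta):
    # DP-SGD: sigma = c * q * sqrt(T * ln(1/delta)) / eps
    return c * q * math.sqrt(T * math.log(1.0 / delta)) / eps

def sigma_s_paper(rho, R, eps, delta):
    # The paper's form for performance-sensitive features (q = 1, no M)
    return rho * math.sqrt(R * math.log(1.0 / delta)) / eps

def sigma_s_expected(rho, M, R, eps, delta):
    # The form I would have expected by analogy with DP-SGD, with the
    # clipping value M of ||x|| appearing explicitly
    return rho * M * math.sqrt(R * math.log(1.0 / delta)) / eps
```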

visitworld123 commented 7 months ago

Thanks for your attention to our work. For your question, you can refer to pages 21 and 22, the end of the proof of Lemma D.4, where we re-scale the data into $[0, 1]$.
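For illustration, one way such a re-scaling can look is per-sample min-max normalization (a simplified sketch only; the exact form in this repo may differ):

```python
import torch

def rescale_01(x: torch.Tensor) -> torch.Tensor:
    # Per-sample min-max re-scaling into [0, 1]; illustrative only
    flat = x.flatten(start_dim=1)
    lo = flat.min(dim=1, keepdim=True).values
    hi = flat.max(dim=1, keepdim=True).values
    return ((flat - lo) / (hi - lo + 1e-12)).view_as(x)
```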

MaxH1996 commented 7 months ago

Thanks for your quick reply. I have read these pages, and I guess that's exactly where my question comes from. You say you assume that the norm does not exceed some value $M$, i.e. you clip the norm, and that $\sigma_s$ is proportional to $\rho M$. So $M$ would be the equivalent of $c$ in the DP-SGD formulation for all features, and $\rho M$ for the performance-sensitive features. Given the analogy you draw to DP-SGD, it is still unclear to me why $M$ does not appear in the equation for $\sigma_s$.
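For reference, this is the DP-SGD-style clipping step I have in mind (my own PyTorch sketch, not taken from your repo):

```python
import torch

def clip_l2(x: torch.Tensor, max_norm: float) -> torch.Tensor:
    # DP-SGD-style clipping: x <- x * min(1, max_norm / ||x||_2),
    # so that ||x||_2 <= max_norm for every sample in the batch
    flat = x.flatten(start_dim=1)
    norms = flat.norm(p=2, dim=1, keepdim=True)
    scale = torch.clamp(max_norm / (norms + 1e-12), max=1.0)
    return (flat * scale).view_as(x)
```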

Apologies if I am just completely missing something obvious here.

yuzhengcuhk commented 7 months ago

Hi, very glad to know that you are interested in our work. Thank you for taking the time to read it in detail. Sorry for not addressing your questions in the previous reply :)

From my understanding of your question, your overall analysis is correct.

I hope this response helps clear up the confusion. Thanks for sharing your thoughts! If you have further comments on this question, feel free to contact us.

No need to worry or apologize. It is our pleasure and responsibility to resolve any confusion raised by different readers or from various perspectives.

Best wishes to you and your research!

Yu

MaxH1996 commented 7 months ago

Thank you very much for your detailed and kind reply!

I do have a few follow-ups, though, regarding the norm of $x$. My understanding is that you bound the familiar $l_2$-norm of the input image, defined by $||x|| = \sqrt{x_1^2 + x_2^2 + \cdots + x_N^2}$. It was also my initial assumption that, for the equation in Lemma D.2 to apply, the norm would have to be bounded by 1, as you described.

However, when I looked at the code (perhaps I missed it), I do not see where you impose this constraint, i.e. where the norm is clipped. In addition, I tried to inspect the distilled features $x_s$ by saving the shared features obtained from here. When I take the norm of these features (r_x or g_x as returned by the VAE, without noise), they fall in the range of approximately 2-6, which does not align with the norm being clipped to 1. Perhaps I extracted the features from the wrong place in the code, and there is some post-processing I didn't catch.
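For completeness, this is roughly how I computed those norms (a self-contained sketch with a dummy batch; in my run I passed the actual r_x / g_x tensors instead):

```python
import torch

def feature_norm_stats(features: torch.Tensor):
    # Per-sample l2 norms of a batch of shared features
    norms = features.flatten(start_dim=1).norm(p=2, dim=1)
    return norms.min().item(), norms.mean().item(), norms.max().item()

# Dummy batch just to make the snippet runnable; with the real
# r_x / g_x outputs (before noise) I see norms of roughly 2-6
print(feature_norm_stats(torch.rand(8, 3, 32, 32)))
```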

MaxH1996 commented 6 months ago

Any thoughts regarding the previous post? :)