wangyanmeng / FedTAN

PyTorch implementation of FedTAN (a federated learning algorithm tailored for batch normalization) proposed in the paper, Why Batch Normalization Damage Federated Learning on Non-IID Data.
MIT License

Question about $\Delta \mathcal{S}_{\mathcal{D}}$ in the paper. #1

Open ybdai7 opened 1 year ago

ybdai7 commented 1 year ago

Hi. I recently found your paper on arXiv and I am very interested in your solid work. However, there is one question that keeps confusing me, and I would be very grateful if you could explain it.

It is about $\Delta \mathcal{S}_{\mathcal{D}}$ in the paper. In the paper, you say that $\mathcal{S}$ contains the running statistics (batch mean/var) of the BN layer. But HOW can you compute the derivatives with respect to the running mean/var? They are not learnable parameters in a standard BN layer, and I don't think $\nabla_{\mathbf{w}} F$ depends on the running mean/var, because, again, these two quantities are not learnable; they are only updated during the forward propagation stage. Do you modify the standard BN layer to make them learnable? Otherwise, I don't see the reason to include the discussion of $\Delta \mathcal{S}_{\mathcal{D}}$.

I would very much appreciate it if you could reply to my question.

wangyanmeng commented 4 months ago

According to the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," although the gradients of the batch mean and batch variance are not used to update those statistics directly, they are essential for computing the gradient of the loss with respect to the input of the batch normalization layer. This, in turn, influences the gradients of the other network parameters through backpropagation.
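To make this concrete, here is a minimal sketch (not the repository's code) of the BN backward pass over an `(N, C)` input, following the backpropagation equations in Ioffe & Szegedy (2015). The function name `bn_backward` and its signature are illustrative; the point is that `grad_mu` and `grad_var` appear as intermediate quantities that feed into the gradient with respect to the layer input.

```python
import torch

def bn_backward(x, grad_y, gamma, eps=1e-5):
    """Manual BN backward pass over the batch dimension (illustrative sketch).

    x:      (N, C) input of the BN layer
    grad_y: (N, C) gradient of the loss w.r.t. the BN output
    gamma:  (C,) scale parameter
    Returns gradients w.r.t. x, gamma, beta.
    """
    N = x.shape[0]
    mu = x.mean(dim=0)                          # batch mean
    var = x.var(dim=0, unbiased=False)          # batch variance
    x_hat = (x - mu) / torch.sqrt(var + eps)

    grad_xhat = grad_y * gamma
    # Gradients w.r.t. the batch statistics -- not used to update them,
    # but needed as intermediate terms in the chain rule:
    grad_var = (grad_xhat * (x - mu)).sum(dim=0) * -0.5 * (var + eps) ** (-1.5)
    grad_mu = (grad_xhat * (-1.0 / torch.sqrt(var + eps))).sum(dim=0) \
              + grad_var * (-2.0 * (x - mu)).mean(dim=0)
    # They enter the gradient w.r.t. the layer input, which then propagates
    # to the gradients of all upstream parameters:
    grad_x = grad_xhat / torch.sqrt(var + eps) \
             + grad_var * 2.0 * (x - mu) / N \
             + grad_mu / N
    grad_gamma = (grad_y * x_hat).sum(dim=0)
    grad_beta = grad_y.sum(dim=0)
    return grad_x, grad_gamma, grad_beta
```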

The running averages of the batch mean and variance are computed for inference purposes, while the gradients of the batch mean and batch variance are used when computing the model gradients.
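To illustrate this distinction with the standard PyTorch API (a small check, not specific to FedTAN): the learnable parameters of a BN layer are `weight` (gamma) and `bias` (beta), while the running statistics are registered buffers that are updated during the forward pass in training mode and used only at inference time.

```python
import torch.nn as nn

bn = nn.BatchNorm1d(8)
# Learnable parameters updated by the optimizer:
print([name for name, _ in bn.named_parameters()])
# ['weight', 'bias']
# Running statistics are buffers, not parameters:
print([name for name, _ in bn.named_buffers()])
# ['running_mean', 'running_var', 'num_batches_tracked']
print(bn.running_mean.requires_grad)
# False
```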