wenzhu23333 / Differential-Privacy-Based-Federated-Learning

Everything you want about DP-Based Federated Learning, including Papers and Code. (Mechanism: Laplace or Gaussian, Dataset: femnist, shakespeare, mnist, cifar-10 and fashion-mnist. )
GNU General Public License v3.0

Seeking advice #17

Closed Wu199 closed 10 months ago

Wu199 commented 10 months ago

Hello, thank you very much for open-sourcing code that combines differential privacy with federated learning; it has helped me a lot. I do have a few questions I would like to ask you, listed below.

  1. Based on the 2016 paper "Deep Learning with Differential Privacy", if I understand correctly, evaluating epsilon with the moments accountant (MA) should yield a smaller noise multiplier than Simple Composition for the same epsilon. If so, why are the accuracy curves shown in the README the other way around? Is it because the MA experiments also use other hyperparameters that differ from the Simple Composition ones?
  2. The way sensitivity is computed in your blog looks similar to the result in the 2020 paper "Federated Learning With Differential Privacy: Algorithms and Performance Analysis". The algorithm in that paper has each local client, after finishing local training, clip and add noise to the model parameters that will be uploaded to the server for aggregation. Compared with clipping and noising the per-sample gradients of every batch during training (i.e., the algorithm implemented in this project), is the former better in terms of performance? It does not need the opacus library to materialize each sample's individual gradient, which can blow up GPU memory, and each local client adds noise fewer times: once per upload rather than once per batch in a local epoch. Is there any problem with clipping and noising the to-be-aggregated model parameters compared with this project's method, or is my understanding of that part of the 2020 paper off? These are the two questions I have so far; I would be very grateful if you could answer them.

Thank you again for providing the open-source code.

nekopalaa commented 10 months ago

Doesn't this code clip the gradients and then add noise to the local model parameters only after the entire local training is finished? That is also adding noise just once.

Wu199 commented 10 months ago

Doesn't this code clip the gradients and then add noise to the local model parameters only after the entire local training is finished? That is also adding noise just once.

My understanding of the loop at line 54 of the Update.py file under the model directory of this project is:

    for x, y in dataloader:
        ...  # train
        clip(per_grad)
        add_noise(grad)
        ...
    # end: local training complete
    upload the client's local model parameters to the server

whereas my understanding of that paper's pseudocode is:

    for x, y in dataloader:
        ...  # train
    # end: local training complete
    clip(model_params)
    add_noise(model_params)
    upload the client's local model parameters to the server

Looked at this way, the number of times noise is added locally should differ: the former effectively adds noise n/batch_size times over the local dataset.

nekopalaa commented 10 months ago

I looked at the original code and it really is as you said. I made my own change and add noise to the model parameters after gradient clipping. The original code amounts to the same thing, though: one local epoch covers the whole dataset, so noise is still added only once. You can put a counter inside that for loop and check; the whole process actually iterates only once.

nekopalaa commented 10 months ago

The method here is actually more like the one in the UDP paper: clip(per_grad) for every sample and, after all samples, add_noise(model_params) at the end.

Wu199 commented 10 months ago

The method here is actually more like the one in the UDP paper: clip(per_grad) for every sample and, after all samples, add_noise(model_params) at the end.

Indeed. My feeling is that this idea follows the "Deep Learning with Differential Privacy" paper, treating each private training record as the object to protect when adding noise, with a theoretical derivation, whereas the 2020 paper adds noise to the data uploaded in the client-server interaction, treating the client's model parameters as the object to protect. The latter just seems cheaper in GPU memory and computation time, so I wanted to know whether the code author has compared the two, or which one is more mainstream.

wenzhu23333 commented 10 months ago

Doesn't this code clip the gradients and then add noise to the local model parameters only after the entire local training is finished? That is also adding noise just once.

My understanding of the loop at line 54 of the Update.py file under the model directory of this project is:

    for x, y in dataloader:
        ...  # train
        clip(per_grad)
        add_noise(grad)
        ...
    # end: local training complete
    upload the client's local model parameters to the server

whereas my understanding of that paper's pseudocode is:

    for x, y in dataloader:
        ...  # train
    # end: local training complete
    clip(model_params)
    add_noise(model_params)
    upload the client's local model parameters to the server

Looked at this way, the number of times noise is added locally should differ: the former effectively adds noise n/batch_size times over the local dataset.

This is a compromise in how it is written. Note how I initialize the dataloader:

    self.idxs_sample = np.random.choice(list(idxs), int(self.args.dp_sample * len(idxs)), replace=False)
    self.ldr_train = DataLoader(DatasetSplit(dataset, self.idxs_sample), batch_size=len(self.idxs_sample), shuffle=True)

Locally, noise is in fact added only once. It is written this way for compatibility with DP-SGD, because DP-SGD samples a subset of the data at every gradient-descent step, whereas for the classic Gaussian and Laplace mechanisms the parameter self.args.dp_sample = 1, so the batch size is essentially the size of the dataset and that loop runs only once.
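To make the point above concrete, here is a small, self-contained sketch (not the repository's actual Update.py; the dataset and index set are made up) showing that when dp_sample = 1 the DataLoader yields a single full-size batch, so the loop body, and hence the noising, runs exactly once:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, Subset

idxs = list(range(600))            # a client's local indices (hypothetical)
dp_sample = 1.0                    # setting used by the classic Gaussian/Laplace mechanisms
idxs_sample = np.random.choice(idxs, int(dp_sample * len(idxs)), replace=False)

dataset = TensorDataset(torch.randn(600, 10), torch.randint(0, 2, (600,)))
ldr_train = DataLoader(Subset(dataset, idxs_sample.tolist()),
                       batch_size=len(idxs_sample), shuffle=True)

print(sum(1 for _ in ldr_train))   # prints 1: one batch covering all sampled data
```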

wenzhu23333 commented 10 months ago

The method here is actually more like the one in the UDP paper: clip(per_grad) for every sample and, after all samples, add_noise(model_params) at the end.

Indeed. My feeling is that this idea follows the "Deep Learning with Differential Privacy" paper, treating each private training record as the object to protect when adding noise, with a theoretical derivation, whereas the 2020 paper adds noise to the data uploaded in the client-server interaction, treating the client's model parameters as the object to protect. The latter just seems cheaper in GPU memory and computation time, so I wanted to know whether the code author has compared the two, or which one is more mainstream.

The NbAFL paper actually clips and adds noise to the parameters, but the more common practice is to clip and add noise to the gradients. Therefore, the classic Gaussian mechanism in this repository actually clips the gradients and adds noise to the parameters (you could also say to the gradients, since the noise variance is derived from the sensitivity).
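To make the distinction concrete, here is a minimal, hypothetical sketch of the two noising styles discussed in this thread. The clipping bound C, the noise standard deviation sigma (in practice derived from the sensitivity and the privacy budget), and the function names are illustrative only, not this repository's actual code:

```python
import torch

def local_train_grad_clip_param_noise(model, loader, optimizer, loss_fn, C, sigma):
    """Repo-style sketch: clip gradients during local training, then add Gaussian
    noise to the parameters once before uploading."""
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=C)  # clip gradients
        optimizer.step()
    with torch.no_grad():
        for p in model.parameters():
            p.add_(sigma * torch.randn_like(p))                          # one-shot Gaussian noise
    return model.state_dict()

def local_train_param_clip_param_noise(model, loader, optimizer, loss_fn, C, sigma):
    """NbAFL-style sketch: train normally, then clip and noise the parameters before
    uploading (norm clipping here is illustrative; the paper's exact rule differs)."""
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    with torch.no_grad():
        for p in model.parameters():
            p.mul_(min(1.0, C / (p.norm().item() + 1e-12)))              # clip parameter norm
            p.add_(sigma * torch.randn_like(p))                          # add Gaussian noise
    return model.state_dict()
```

Either way each client uploads a noised model once per round; the difference is where the clipping is applied and hence how the sensitivity (and the noise variance) is derived.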

You can refer to the following articles:

[3] Y. Zhou et al., "Optimizing the Numbers of Queries and Replies in Convex Federated Learning with Differential Privacy," in IEEE Transactions on Dependable and Secure Computing, doi: 10.1109/TDSC.2023.3234599.

[4] Y. Zhou et al., "Exploring the Practicality of Differentially Private Federated Learning: A Local Iteration Tuning Approach," in IEEE Transactions on Dependable and Secure Computing, doi: 10.1109/TDSC.2023.3325889.

[5] Y. Yang, M. Hu, Y. Zhou, X. Liu and D. Wu, "CSRA: Robust Incentive Mechanism Design for Differentially Private Federated Learning," in IEEE Transactions on Information Forensics and Security, doi: 10.1109/TIFS.2023.3329441.

wenzhu23333 commented 10 months ago

Hello, thank you very much for open-sourcing code that combines differential privacy with federated learning; it has helped me a lot. I do have a few questions I would like to ask you, listed below.

  1. Based on the 2016 paper "Deep Learning with Differential Privacy", if I understand correctly, evaluating epsilon with the moments accountant (MA) should yield a smaller noise multiplier than Simple Composition for the same epsilon. If so, why are the accuracy curves shown in the README the other way around? Is it because the MA experiments also use other hyperparameters that differ from the Simple Composition ones?
  2. The way sensitivity is computed in your blog looks similar to the result in the 2020 paper "Federated Learning With Differential Privacy: Algorithms and Performance Analysis". The algorithm in that paper has each local client, after finishing local training, clip and add noise to the model parameters that will be uploaded to the server for aggregation. Compared with clipping and noising the per-sample gradients of every batch during training (i.e., the algorithm implemented in this project), is the former better in terms of performance? It does not need the opacus library to materialize each sample's individual gradient, which can blow up GPU memory, and each local client adds noise fewer times: once per upload rather than once per batch in a local epoch. Is there any problem with clipping and noising the to-be-aggregated model parameters compared with this project's method, or is my understanding of that part of the 2020 paper off? These are the two questions I have so far; I would be very grateful if you could answer them.

Thank you again for providing the open-source code.

1. Please refer to the following answer:

Thank you for your attention to my project. You are right: DP-SGD is indeed better than simple composition, but the two are usually difficult to compare directly, because the DP-SGD algorithm involves a sampling rate.

The DP-SGD results shown in the project are the model's accuracy under a sampling rate of 0.01, while the classic composition uses a full-batch strategy with no notion of a sampling rate.
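As a rough illustration of why the sampling rate makes the two accountings hard to compare directly, the sketch below (assuming Opacus ≥ 1.0; the step count, sampling rate, and noise multiplier are made-up values) computes the moments-accountant/RDP epsilon of a subsampled Gaussian mechanism. Simple composition would instead add the per-step epsilons linearly, so for the same target epsilon it forces a much larger noise multiplier:

```python
from opacus.accountants import RDPAccountant

steps = 100          # hypothetical number of noisy gradient steps
sample_rate = 0.01   # Poisson sampling rate, as used by DP-SGD in this project
sigma = 1.0          # noise multiplier (noise std / clipping norm)
delta = 1e-5

accountant = RDPAccountant()
for _ in range(steps):
    accountant.step(noise_multiplier=sigma, sample_rate=sample_rate)

print(f"MA/RDP epsilon after {steps} steps: {accountant.get_epsilon(delta=delta):.2f}")
```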

Therefore, due to the low sampling rate of DP-SGD, each client obtains very little data in each round of training (since there are 100 clients, each client has fewer than 10 training samples on average), and these data are highly non-IID, which causes this result.
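For a concrete sense of scale, assuming the MNIST training set (60,000 images) is split evenly across 100 clients, each client holds about 600 samples, and a sampling rate of 0.01 then leaves roughly 600 × 0.01 = 6 of them per local round.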

If you want higher model accuracy, I believe it will be more effective to increase the sampling rate, reduce the number of clients, increase the number of global training rounds, or reduce the degree of non-IID.

For the implementation details of DP-SGD-based federated learning, I referred to the following article:

Y. Zhou et al., "Optimizing the Numbers of Queries and Replies in Convex Federated Learning with Differential Privacy," in IEEE Transactions on Dependable and Secure Computing, doi: 10.1109/TDSC.2023.3234599.

I hope my answer solves your problem.

2. The algorithm in this repository is essentially the same noising pattern as NbAFL, except that NbAFL clips the parameters and adds noise to them, whereas this repository clips the gradients and adds noise to the parameters. There is just a trick in how it is written that may have caused confusion; I will correct it later. Sorry about that.

Wu199 commented 10 months ago


Thank you very much for the careful answers and the shared references; they resolve my questions above. When reproducing experiments that combine differential privacy with deep learning, I often find the model's task accuracy is not high enough, below the accuracy reported for the methods in some papers, which is why I pay attention to the noising method and to how training accuracy changes.

JingZXian commented 2 weeks ago


Hello, may I ask how NbAFL was reproduced? That paper does not seem to have released code, and I have searched for a long time without finding any. I am just getting started....