The OOM problem occurs because the Opacus library stores per-sample gradients, so when a client's dataset is large the memory usage is huge. It can be solved by specifying the --serial and --serial_bs parameters. These two parameters physically set a virtual batch size: training takes longer, but logically the training and the DP noise addition are unaffected, so the theory behind the DP noise is not violated. Simply put, it trades time for space.
The principle is that when the full dataset is used as one batch, the large batch is divided into small virtual batches that are backpropagated separately; the gradients are accumulated, and a single gradient-descent step is performed at the end.
For example:
python main.py --dataset mnist --model cnn --dp_mechanism Gaussian --serial --serial_bs 128
The larger --serial_bs is, the more memory it uses and the shorter the training time; the smaller it is, the closer the algorithm gets to fully serial execution, the less memory it uses, and the longer training takes.
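For intuition, here is a minimal PyTorch-style sketch of the time-for-space idea (not the repository's actual implementation; model, optimizer, criterion, data, and targets are placeholder names, and the DP-specific per-sample clipping/noising is only indicated in a comment):

import torch

def serial_train_step(model, optimizer, criterion, data, targets, serial_bs=128):
    # Split the full (logical) batch into virtual batches of size serial_bs,
    # backpropagate each one separately, and accumulate the gradients.
    model.train()
    optimizer.zero_grad()
    n = data.size(0)
    for start in range(0, n, serial_bs):
        end = min(start + serial_bs, n)
        output = model(data[start:end])
        # Scale so the accumulated gradient equals the mean gradient over all n samples.
        loss = criterion(output, targets[start:end]) * (end - start) / n
        loss.backward()  # gradients accumulate in param.grad across virtual batches
    # In the DP setting, per-sample clipping and noise addition would be applied
    # to the accumulated gradients before this step.
    optimizer.step()  # one gradient-descent step for the whole logical batch

The result of the optimizer step is the same as for one big batch; only the peak memory (and the wall-clock time) changes with serial_bs.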
Unfortunately, multi-GPU training is not yet supported. Thanks for the suggestion, I'll consider adding this feature soon.
See: [1] https://github.com/wenzhu23333/Differential-Privacy-Based-Federated-Learning#remark
Thank you very much. Could you please explain what dp_delta and dp_epsilon represent? How can I observe the effect of DP on model performance and privacy preservation?
dp_delta and dp_epsilon are the privacy budget for Differential Privacy. The privacy budget determines the amount of noise added. You can refer to papers on differential privacy, for example the ones mentioned in the README.
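As a rough illustration (not necessarily the exact calibration used in this repository): for the Gaussian mechanism with epsilon in (0, 1), a noise standard deviation of sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon suffices for (epsilon, delta)-DP, so a smaller privacy budget means more noise:

import math

def gaussian_noise_sigma(epsilon, delta, sensitivity=1.0):
    # Classic Gaussian-mechanism bound (valid for epsilon <= 1):
    # smaller epsilon or delta -> larger sigma -> more noise.
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon

for eps in (0.1, 0.5, 1.0):  # e.g. values passed via --dp_epsilon
    print(eps, gaussian_noise_sigma(eps, delta=1e-5))  # delta via --dp_delta

In practice you can observe the trade-off by training with different budgets: smaller epsilon/delta gives stronger privacy but noisier gradients and lower accuracy.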
Hi, thank you very much for your great work. When I try to reproduce the results on CIFAR and Shakespeare, I always run into an out-of-memory issue. Could you please tell me how to solve this problem? In addition, does it support multiple GPUs? Thank you very much!