tensorflow / privacy

Library for training machine learning models with privacy for training data
Apache License 2.0

Hide microbatches from tutorials to avoid accidental non-DP models #81

Closed · ahonkela closed this issue 4 years ago

ahonkela commented 5 years ago

I am looking to use your examples for teaching and I was struck by how poorly documented the microbatches feature is in the tutorials.

If I understand correctly, having microbatches < batch_size implies that the result will not satisfy DP but only some different, related privacy criterion. This is not at all obvious from the documentation, which can leave the impression that microbatches are just a technical parameter. It may also cause someone to accidentally create non-DP models by changing batch_size while forgetting to change microbatches correspondingly.

I would suggest eliminating microbatches from all tutorials (just defaulting to None) and making it clear in the documentation what the implications of changing it are for the DP guarantees.
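For concreteness, this is the kind of coupling I mean. A minimal sketch in the style of the tutorial code (the optimizer and its arguments are as in this repo; the import path and the hyperparameter values are illustrative):

```python
import tensorflow as tf
# Import path as in the tutorials at the time of writing; may differ in newer releases.
from privacy.optimizers.dp_optimizer import DPGradientDescentGaussianOptimizer

batch_size = 256    # if a reader changes this...
microbatches = 256  # ...they must remember to change this too

optimizer = DPGradientDescentGaussianOptimizer(
    l2_norm_clip=1.0,
    noise_multiplier=1.1,
    num_microbatches=microbatches,  # the parameter this issue is about
    learning_rate=0.15)

# The optimizer expects a vector of per-example losses (no reduction),
# which it splits into `microbatches` groups before clipping.
```

Nothing in the tutorial flags that the two constants above are related.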

npapernot commented 4 years ago

Thank you for the suggestion. Having a smaller number of microbatches than the size of the batch does not impact the privacy guarantee; it simply makes the computation faster (at the expense of a potential decrease in accuracy, because the gradients of multiple training examples are clipped together rather than each example being clipped individually). We'd like to keep the microbatch parameter exposed so users are aware of the potential tradeoff between computation time and accuracy.
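Roughly, the aggregation looks like this (a simplified numpy sketch of the mechanism, not the actual optimizer code):

```python
import numpy as np

def dp_gradient(per_example_grads, num_microbatches, l2_norm_clip, noise_multiplier):
    """Average within each microbatch, clip each microbatch gradient,
    sum, add noise, then divide by the number of microbatches."""
    batch_size, dim = per_example_grads.shape
    # Split the batch into microbatches and average the gradients within each.
    groups = per_example_grads.reshape(num_microbatches, -1, dim).mean(axis=1)
    # Clip each microbatch gradient to the L2 bound.
    norms = np.linalg.norm(groups, axis=1, keepdims=True)
    clipped = groups / np.maximum(1.0, norms / l2_norm_clip)
    # Add Gaussian noise to the sum, then average.
    noisy_sum = clipped.sum(axis=0) + np.random.normal(
        scale=noise_multiplier * l2_norm_clip, size=dim)
    return noisy_sum / num_microbatches
```

With `num_microbatches == batch_size`, each group is a single example and you recover per-example clipping; with fewer microbatches the clipping is coarser, but the noise is drawn at the same scale, which is why the guarantee is unaffected.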

maw501 commented 4 years ago

Forgive my ignorance, but if the number of microbatches doesn't affect the privacy guarantee, why don't we just clip at the batch level and do away with the notion of a microbatch?

npapernot commented 4 years ago

It would likely have a negative impact on performance: if you clip each example individually, you are less likely to lose signal from it than if you clip the average gradient.
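A toy example of that signal loss (illustrative numbers only):

```python
import numpy as np

C = 1.0
g1 = np.array([100.0, 0.0])  # one example with a huge gradient
g2 = np.array([0.0, 1.0])    # another example with a small, informative gradient

def clip(g, c):
    return g / max(1.0, np.linalg.norm(g) / c)

# Clip each example individually, then average: g2's direction survives.
per_example = (clip(g1, C) + clip(g2, C)) / 2  # -> [0.5, 0.5]

# Average first, then clip: g1 dominates, and g2's signal is crushed.
averaged = clip((g1 + g2) / 2, C)              # -> approx [1.0, 0.01]
```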

maw501 commented 4 years ago

Thanks for the reply.

Clipping at the batch-averaged gradient level is just how normal gradient clipping works, no? For DP, I guess the noise becomes more disruptive to the gradients as we increase the microbatch size, since fewer independent noise draws are averaged over?

That said, I'm still struggling to understand why this has zero impact on epsilon... any further explanation or intuition here would be great, or a pointer to a resource.

Also, I assume TF Privacy follows the original paper (Deep Learning with Differential Privacy) and clips, adds noise, then averages, in that order?
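For reference, my reading of Algorithm 1 there is: clip each per-example gradient, add noise to the clipped sum, then average over the lot size $L$:

$$\bar{g}_t(x_i) = \frac{g_t(x_i)}{\max\left(1, \frac{\lVert g_t(x_i) \rVert_2}{C}\right)}, \qquad \tilde{g}_t = \frac{1}{L}\left(\sum_i \bar{g}_t(x_i) + \mathcal{N}\!\left(0, \sigma^2 C^2 I\right)\right)$$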

NARUTORIO commented 4 years ago

I think the consequence of having FLAGS.microbatches < FLAGS.batch_size is not just "at the expense of a potential decrease in accuracy because the gradients of multiple training examples are clipped together rather than each example being clipped individually" (@npapernot above).

I tested the following parameters (on MNIST):

| Learning rate | Noise multiplier | Clipping threshold | Number of microbatches | Number of epochs | batch_size |
| --- | --- | --- | --- | --- | --- |
| 0.15 | 1.1 | 1.0 | 256 | 60 | 256 |

and I get a test accuracy similar to the one reported in the tutorial.
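That run corresponds to an invocation along these lines (flag names as in the repo's MNIST tutorial; treat the exact script path as approximate):

```
python mnist_dpsgd_tutorial.py --learning_rate=0.15 --noise_multiplier=1.1 \
  --l2_norm_clip=1.0 --batch_size=256 --epochs=60 --microbatches=256
```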

However, if I change the number of microbatches to 1 and keep everything else unchanged, the test accuracy is around 0.1 after training, i.e. performance is completely destroyed (and obviously not only because of the difference between clipping individual examples and clipping the average of a small group).

The true cause of the collapse is the size of the noise relative to the size of the signal. When you use microbatches=256 (batch_size=256), you add noise with std = sigma * C to the sum of 256 clipped gradients, each with norm bounded by the clipping threshold C. But if you use microbatches=1 (batch_size=256), you add noise with the same std = sigma * C to a single gradient (the average over all 256 examples), whose norm is also bounded by C.
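Back-of-the-envelope (my own arithmetic, not the library's code), the worst-case noise-to-signal ratio per step:

```python
sigma, C = 1.1, 1.0

for microbatches in (256, 1):
    max_signal_norm = microbatches * C  # norm bound on the sum of clipped gradients
    noise_scale = sigma * C             # std of the noise added to that sum
    print(microbatches, noise_scale / max_signal_norm)
# 256 -> ~0.004 (noise is a small perturbation)
#   1 -> 1.1    (noise is as large as the entire signal)
```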

So my question is: how should the noise_multiplier be changed when we change the number of microbatches, in order to get an acceptable test accuracy?