pytorch / opacus

Training PyTorch models with differential privacy
https://opacus.ai
Apache License 2.0

Microbatching Support #655

Open shs037 opened 1 month ago

shs037 commented 1 month ago

🚀 Feature

Support microbatch size > 1, i.e., clip the averaged gradient of multiple samples instead of each per-sample gradient individually.

Motivation

We want to experiment with microbatch size > 1 for some training tasks.

(I understand that microbatch size > 1 may not improve memory or computation efficiency. This request is about the algorithm and model utility rather than efficiency.)

Pitch

A num_microbatches parameter in make_private, similar to TF Privacy's num_microbatches option.
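
For illustration, a minimal sketch of how the requested interface might look. The num_microbatches argument is hypothetical and does not exist in Opacus today; the other arguments are the existing make_private keywords.

```python
# Hypothetical interface sketch: num_microbatches is made up here and mirrors
# TF Privacy's option; the remaining arguments are the current make_private API.
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data_loader = DataLoader(
    TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))),
    batch_size=16,
)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
    num_microbatches=4,  # hypothetical: clip the averaged gradient of 4 samples at a time
)
```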

HuanyuZhang commented 1 month ago

Thanks @shs037 for bringing this to the table! We currently do not have any plan to support this feature, given its limited use inside Meta. However, I am happy to give guidance on / discuss the implementation if you want to contribute a PR. One quick idea is to make changes in the optimizer: instead of clipping each per-sample gradient, first average the per-sample gradients within each microbatch and then clip, as in the sketch below.
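
For concreteness, here is a minimal standalone sketch (not Opacus code) of the "average within each microbatch, then clip" idea. It assumes the per-sample gradients are available as one tensor per parameter with a leading batch dimension, and that the batch size is divisible by num_microbatches.

```python
import torch


def clip_microbatch_gradients(per_sample_grads, num_microbatches, max_grad_norm):
    """per_sample_grads: one tensor per parameter, shape (batch_size, *param_shape).
    Returns, per parameter, the sum over microbatches of clipped microbatch-averaged gradients."""
    batch_size = per_sample_grads[0].shape[0]
    mb_size = batch_size // num_microbatches  # assumes divisibility

    # 1. Average per-sample gradients within each microbatch.
    mb_grads = [
        g.reshape(num_microbatches, mb_size, *g.shape[1:]).mean(dim=1)
        for g in per_sample_grads
    ]

    # 2. Each microbatch's total L2 norm across all parameters.
    per_param_norms = [g.reshape(num_microbatches, -1).norm(2, dim=1) for g in mb_grads]
    per_mb_norms = torch.stack(per_param_norms, dim=1).norm(2, dim=1)

    # 3. Clip each microbatch's averaged gradient and sum over microbatches.
    clip_factor = (max_grad_norm / (per_mb_norms + 1e-6)).clamp(max=1.0)
    return [torch.einsum("i,i...->...", clip_factor, g) for g in mb_grads]
```

Noise would then be added to these sums exactly as today, and the result divided by the number of microbatches rather than by the batch size.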

shs037 commented 1 month ago

Thanks a lot! Is it basically a matter of changing a few lines in the function you linked?

HuanyuZhang commented 1 month ago

Yeah, I think a hacky solution (without a very careful interface design) should require minimal changes. "self.grad_samples" holds the per-sample gradients: for each parameter, a tensor whose leading dimension is batch_size. You just need to split it into microbatches along that dimension and average within each microbatch. You will probably also need to change "scale_grad" (https://github.com/pytorch/opacus/blob/main/opacus/optimizers/optimizer.py#L441) to keep the scaling correct. A rough sketch of what that could look like follows.
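
A hacky sketch of that change, assuming the current DPOptimizer internals (grad_samples, params, summed_grad, max_grad_norm); MicrobatchDPOptimizer and num_microbatches are made-up names, and the noise-calibration / privacy-accounting implications of microbatching are not addressed here.

```python
# Hacky sketch (not a careful interface design): subclass DPOptimizer and
# override clip_and_accumulate to average per-sample gradients within each
# microbatch before clipping. The processed-flag bookkeeping of the real
# method is omitted for brevity.
import torch
from opacus.optimizers import DPOptimizer


class MicrobatchDPOptimizer(DPOptimizer):
    def __init__(self, *args, num_microbatches: int = 1, **kwargs):
        super().__init__(*args, **kwargs)
        self.num_microbatches = num_microbatches

    def clip_and_accumulate(self):
        grad_samples = self.grad_samples  # one (batch_size, *shape) tensor per parameter
        batch_size = grad_samples[0].shape[0]
        mb = self.num_microbatches  # assumes batch_size % num_microbatches == 0
        mb_size = batch_size // mb

        # Average per-sample gradients within each microbatch.
        mb_grads = [
            g.reshape(mb, mb_size, *g.shape[1:]).mean(dim=1) for g in grad_samples
        ]

        # Clip each microbatch's averaged gradient to max_grad_norm.
        per_param_norms = [g.reshape(mb, -1).norm(2, dim=1) for g in mb_grads]
        per_mb_norms = torch.stack(per_param_norms, dim=1).norm(2, dim=1)
        clip_factor = (self.max_grad_norm / (per_mb_norms + 1e-6)).clamp(max=1.0)

        for p, g in zip(self.params, mb_grads):
            grad = torch.einsum("i,i...->...", clip_factor, g)
            # scale_grad() later divides by expected_batch_size; multiply by
            # batch_size / num_microbatches here so the end result is an
            # average over microbatches rather than over samples.
            # (Alternatively, change scale_grad itself.)
            grad = grad * (batch_size / mb)
            if getattr(p, "summed_grad", None) is not None:
                p.summed_grad += grad
            else:
                p.summed_grad = grad
```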

HuanyuZhang commented 1 month ago

This approach might be problematic if you have multiple mini-batches between two optimizer steps (e.g., with gradient accumulation). But I believe that is a very rare situation.