pytorch / opacus

Training PyTorch models with differential privacy
https://opacus.ai
Apache License 2.0

GradSampleModuleFastGradientClipping ignores strict and force_functorch params #673

Open · anhnami opened this issue 1 week ago

anhnami commented 1 week ago

Per-sample gradient clipping has recently been reported to be useful for speech processing [1][2][3]. Implementing per-sample gradient clipping from scratch is complicated, so I would like to use Opacus for it. However, since Opacus is privacy-focused, it does not support several layers. Furthermore, it seems "strict" mode cannot be turned off in GradSampleModuleFastGradientClipping: the constructor accepts strict and force_functorch but does not forward them (see the line linked below). It would be nice to support this non-privacy use case.

https://github.com/pytorch/opacus/blob/9eed06a2fc785e94abc05e5eb7ef3ed0a5a5a909/opacus/grad_sample/grad_sample_module_fast_gradient_clipping.py#L113

[1] https://arxiv.org/pdf/2406.02004
[2] https://arxiv.org/pdf/2310.11739
[3] https://arxiv.org/pdf/2408.16204
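
To make the report concrete, here is a minimal sketch (mine, not from the library's tests). The import path comes from the file linked above; the behaviour described in the comments is the one reported in this issue, namely that strict=False is accepted by the constructor but not forwarded to GradSampleModule, so validation still rejects layers with buffers such as BatchNorm.

```python
import torch.nn as nn
from opacus.grad_sample.grad_sample_module_fast_gradient_clipping import (
    GradSampleModuleFastGradientClipping,
)

# Toy non-private model containing BatchNorm, a layer with buffers that
# Opacus' strict validation rejects.
model = nn.Sequential(
    nn.Linear(16, 32), nn.BatchNorm1d(32), nn.ReLU(), nn.Linear(32, 4)
)

try:
    # strict=False is passed here, but the constructor does not forward it to
    # GradSampleModule.__init__, so validation still runs with its strict
    # default and raises on the BatchNorm layer.
    wrapped = GradSampleModuleFastGradientClipping(
        model, strict=False, force_functorch=False
    )
except Exception as e:
    print(type(e).__name__, e)
```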

HuanyuZhang commented 1 week ago

Good catch on the "strict" part; we will put up a patch to fix it.

Do you mind explaining a bit more what you mean by "it does not support several layers"? I believe the current implementation supports all the layers that were previously supported by Opacus's GradSampleModule.

anhnami commented 1 week ago

It's BatchNorm and custom layers with buffers. I'm hoping the strict option will allow me to use them, since my use case is not privacy-related.

HuanyuZhang commented 5 days ago

I see. I think you can unblock your use case by setting strict=False. I have never tested it myself, though, so it would be best to run a quick test to make sure the gradient norms are consistent (https://github.com/pytorch/opacus/blob/main/opacus/tests/gradient_accumulation_test.py).
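
For what it's worth, here is a hedged sketch of such a quick test. It uses the plain hook-based GradSampleModule with strict=False on a small model without buffers, and checks that per-sample gradient norms derived from p.grad_sample match norms computed one example at a time with ordinary autograd; it does not exercise the fast/ghost clipping path itself, which the linked gradient_accumulation_test covers. The model, sizes, and tolerance are illustrative only.

```python
import torch
import torch.nn as nn
from opacus.grad_sample import GradSampleModule

torch.manual_seed(0)
B, D, C = 8, 16, 4
x, y = torch.randn(B, D), torch.randint(0, C, (B,))

def make_model():
    torch.manual_seed(1)  # identical weights for both runs
    return nn.Sequential(nn.Linear(D, 32), nn.ReLU(), nn.Linear(32, C))

criterion = nn.CrossEntropyLoss()

# 1) Per-sample gradient norms via Opacus hooks (strict=False only matters for
#    layers with buffers; this toy model has none).
gs_model = GradSampleModule(
    make_model(), batch_first=True, loss_reduction="mean", strict=False
)
criterion(gs_model(x), y).backward()
per_param_sq = torch.stack(
    [p.grad_sample.reshape(B, -1).norm(dim=1) ** 2 for p in gs_model.parameters()]
)
hook_norms = per_param_sq.sum(dim=0).sqrt()  # shape [B]: one norm per sample

# 2) Reference norms: one backward pass per example with plain autograd.
ref_model = make_model()
ref_norms = []
for i in range(B):
    ref_model.zero_grad()
    criterion(ref_model(x[i : i + 1]), y[i : i + 1]).backward()
    ref_norms.append(
        torch.cat([p.grad.reshape(-1) for p in ref_model.parameters()]).norm()
    )
ref_norms = torch.stack(ref_norms)

# Expect True (up to floating-point tolerance) if the per-sample norms agree.
print(torch.allclose(hook_norms, ref_norms, atol=1e-5))
```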