anirban-nath opened this issue 1 year ago
Thanks for raising this issue. The reason is that Opacus computes grad_samples using "hooks", so it only works for standard layers. You can pass `grad_sample_mode="functorch"` to `make_private()`, which will make Opacus use functorch to automatically compute grad_samples for new layers (it is not guaranteed to work, but most of the time it does the job).
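A minimal sketch of what that call might look like (assuming a recent Opacus version in which `make_private()` accepts `grad_sample_mode`; `model`, `optimizer`, and `train_loader` are placeholders for your own objects):

```python
from opacus import PrivacyEngine

# model, optimizer, and train_loader are assumed to be defined elsewhere
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
    # Use functorch-based per-sample gradients instead of the hook-based ones,
    # which only cover layers Opacus knows how to handle.
    grad_sample_mode="functorch",
)
```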
Hi. I was using the `make_private_with_epsilon` function and I tried "functorch", but it did not work.
It should also work with `make_private_with_epsilon`. Do you still have the same error message?
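For reference, a minimal sketch of the epsilon-targeting call with the same flag (the epsilon/delta/epochs values are placeholders, and passing `grad_sample_mode` to `make_private_with_epsilon()` is assumed to be supported by the installed Opacus version):

```python
# Hedged sketch: the epsilon-targeting variant should accept the same flag.
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    target_epsilon=8.0,      # placeholder privacy budget
    target_delta=1e-5,       # placeholder delta
    epochs=10,               # placeholder number of epochs
    max_grad_norm=1.0,
    grad_sample_mode="functorch",
)
```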
Exact same error message. No difference. I tried with both `make_private` and `make_private_with_epsilon`. I even tried replacing that LayerNorm with a GroupNorm, but none of these have made any difference.
There is one particular LayerNorm in my code that prevents Opacus from running successfully. This LayerNorm is defined just like 3-4 others in my code and is used in 2 places. When I execute `loss.backward()`, the layer's `grad` is populated but its per-sample grad isn't, which leads Opacus to throw the error "Per sample gradient is not initialized. Not updated in backward pass?"

Under what circumstances is this possible?
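One way to narrow this down is to check, right after `loss.backward()`, which trainable parameters never received a `grad_sample` attribute (a minimal debugging sketch; `model` here stands for the module returned by `make_private*`):

```python
# Hedged debugging sketch: list parameters whose per-sample gradient was
# never populated after the backward pass.
loss.backward()
for name, p in model.named_parameters():
    if p.requires_grad and getattr(p, "grad_sample", None) is None:
        print(f"no per-sample gradient for: {name}")
```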
PS: This is how the norm is defined
```python
decoder_norm = nn.LayerNorm(d_model)
self.decoder = TransformerDecoder(decoder_layer, num_decoder_layers, decoder_norm,
                                  return_intermediate=return_intermediate_dec)
```
This is how it is used. The usages are shown with comments beside them:
`class TransformerDecoder(nn.Module):