Closed · shanjin2014 closed 1 month ago
It is not quite common for Opacus to deal with such complex operations. A quick question: what is the reason for using Opacus here, given that you do not need access to per-sample gradients? How about directly adding noise to the aggregated gradient?
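A minimal sketch of that alternative, assuming the noise scale is chosen elsewhere (the helper name and its parameters are hypothetical, not an Opacus API — and note that without per-sample clipping this alone gives no DP guarantee):

```python
import torch

def add_noise_to_aggregated_grads(params, noise_multiplier=1.0, max_grad_norm=1.0):
    # Hypothetical helper: perturb already-aggregated .grad tensors in place.
    # Per-sample clipping (or another sensitivity bound) is assumed to have
    # been handled separately.
    for p in params:
        if p.grad is not None:
            noise = torch.normal(
                mean=0.0,
                std=noise_multiplier * max_grad_norm,
                size=p.grad.shape,
                device=p.grad.device,
            )
            p.grad.add_(noise)

# Usage sketch on a toy layer
layer = torch.nn.Linear(4, 2)
layer(torch.randn(8, 4)).sum().backward()
add_noise_to_aggregated_grads(layer.parameters())
```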
🐛 Bug
I am trying to replicate the idea in a paper: first, set only the last fc layer of the model to `requires_grad = True` and freeze all other layers. Then, after computing the loss, call `loss.backward()` to get the gradients (aggregated, with noise added) for the fc layer only. From those gradients, build a system of linear equations to estimate the gradients of the loss with respect to the logits, and finally use the estimated gradients to update the whole model.

I have tested the same idea in plain PyTorch, solving the linear equations but without adding noise to the fc layer, and it works. However, when I use Opacus to add noise to the fc layer, I get the error. Is this a bug or some other issue? Could it be caused by the multiple backward passes?
The error is raised by the line: `logits.backward(dLdZ)`
To Reproduce
```python
inputs, labels = data
optimizer.zero_grad()

# inoutput is the intermediate output before the last layer of the model
logits, inoutput = model(inputs)
loss = criterion(logits, labels)
loss.backward(retain_graph=True)

fc_params = model.fc.parameters()
grads_fc = []
for param in fc_params:
    if param.grad is not None:
        if len(param.grad.shape) > 1:
            fc_grad = param.grad.view(param.grad.size(0), -1)
        else:
            fc_grad = param.grad.unsqueeze(1)
        grads_fc.append(fc_grad)

dLdW = torch.cat(grads_fc, dim=1)
dZdB = torch.ones((inoutput.size(0), 1)).to(device)
dZdW = torch.cat((inoutput, dZdB), dim=1)

A = dZdW.t()
B = dLdW.t()
dLdZ = torch.linalg.lstsq(A, B).solution

logits.backward(dLdZ)
optimizer.step()
```
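For comparison, here is a self-contained plain-PyTorch sketch of the same steps (the toy model, shapes, and use of `Flatten` as a stand-in feature extractor are my assumptions; no Opacus is attached), which runs both backward passes without error:

```python
import torch

torch.manual_seed(0)
device = "cpu"

# Toy stand-in for the setup: a frozen feature extractor plus a final fc layer.
feat = torch.nn.Flatten()
fc = torch.nn.Linear(6, 3)
criterion = torch.nn.CrossEntropyLoss()

inputs = torch.randn(5, 6)
labels = torch.randint(0, 3, (5,))

inoutput = feat(inputs)           # intermediate output before the fc layer
logits = fc(inoutput)
loss = criterion(logits, labels)
loss.backward(retain_graph=True)  # first backward: fills fc's .grad

grads_fc = []
for param in fc.parameters():
    if param.grad is not None:
        if param.grad.dim() > 1:
            grads_fc.append(param.grad.view(param.grad.size(0), -1))
        else:
            grads_fc.append(param.grad.unsqueeze(1))

dLdW = torch.cat(grads_fc, dim=1)                       # (num_classes, in+1)
dZdB = torch.ones((inoutput.size(0), 1), device=device)
dZdW = torch.cat((inoutput, dZdB), dim=1)               # (batch, in+1)

# Solve dZdW^T @ dLdZ = dLdW^T for the gradient of the loss w.r.t. the logits.
dLdZ = torch.linalg.lstsq(dZdW.t(), dLdW.t()).solution  # (batch, num_classes)

logits.backward(dLdZ)  # second backward through the retained graph
```

With an Opacus `GradSampleModule` wrapped around the model, the backward hooks fire on every backward pass, which is presumably where the second `logits.backward(dLdZ)` call runs into trouble.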
Environment
Please copy and paste the output from our environment collection script (or fill out the checklist below manually).

You can get the script and run it with:

How you installed Opacus (conda, pip, source):

Additional context