Hi, for Opacus to work, the input to each module must consistently have batch_size as either the first or the second dimension (by default it is the first, link). In your code, however, the permute in the forward pass moves the batch to a different dimension, so Opacus can no longer tell which dimension corresponds to batch_size, and that is why the per-sample gradient dimensions come out wrong.
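To make this concrete, here is a minimal sketch (not from this thread; the module names and sizes are made up) showing how the leading dimension of the per-sample gradients depends on whether the batch stays in dim 0 inside the module's forward:

```python
import torch
import torch.nn as nn
from opacus import GradSampleModule

batch_size, seq_len, d_model = 4, 10, 16

class BatchFirstBlock(nn.Module):
    """Keeps batch_size in dim 0, as Opacus expects with batch_first=True."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):            # x: (batch, seq, d_model)
        return self.proj(x)

class PermutedBlock(nn.Module):
    """Permutes to (seq, batch, d_model) before the Linear, like a seq-first forward pass."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):            # x: (batch, seq, d_model)
        x = x.permute(1, 0, 2)       # batch is no longer dim 0
        return self.proj(x).permute(1, 0, 2)

for cls in (BatchFirstBlock, PermutedBlock):
    model = GradSampleModule(cls(), batch_first=True)
    x = torch.randn(batch_size, seq_len, d_model)
    model(x).sum().backward()
    for name, p in model.named_parameters():
        # BatchFirstBlock: leading dim of grad_sample is batch_size (4).
        # PermutedBlock: leading dim is seq_len (10), so per-sample clipping is wrong.
        print(cls.__name__, name, tuple(p.grad_sample.shape))
```

The fix is to keep the batch dimension where Opacus expects it (or tell Opacus where it is via batch_first), rather than permuting it away inside the module.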
Closing the issue due to no response. Feel free to re-open if the question is unresolved.
I am trying to train a model with Opacus. Link to the original model: https://github.com/jdxyw/deepKT/blob/master/deepkt/model/saint.py I replaced MultiheadAttention with DPMultiheadAttention, as sketched below. This issue was brought up before in #505. I printed the dimensions of the per-sample norms (in the last cell), but I could not determine the exact issue. Here is a link to reproduce the error: https://colab.research.google.com/drive/11wf7tEUOOlWHoMcw2jP2Zlf6ooPGoGbx?usp=sharing.
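For reference, the swap looks roughly like this (a sketch with assumed embed_dim/num_heads values, not the actual SAINT configuration):

```python
import torch.nn as nn
from opacus.layers import DPMultiheadAttention

embed_dim, num_heads, dropout = 128, 8, 0.1  # assumed values for illustration

# Original layer (Opacus cannot compute per-sample gradients for nn.MultiheadAttention):
attn = nn.MultiheadAttention(embed_dim, num_heads, dropout=dropout)

# Opacus-provided replacement, intended as a drop-in with the same constructor arguments:
dp_attn = DPMultiheadAttention(embed_dim, num_heads, dropout=dropout)
```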
traceback: