sarakodeiri opened 1 month ago
Hi. Thanks for raising this! We are currently working on fixing these incompatibility issues with Expanded Weights. In the meantime, I would suggest you try the hooks mode instead: identify which parts of the model use buffers and try to replace them with similar buffer-free modules.
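One way to locate the offending submodules is to walk the module tree with PyTorch's `named_buffers(recurse=False)`, which reports only the buffers a module owns directly. A minimal sketch, with a toy model standing in for the CLIP wrapper:

```python
import torch.nn as nn

# Hypothetical stand-in for the wrapped model; any nn.Module works here.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3),
    nn.BatchNorm2d(8),  # registers running_mean / running_var buffers
    nn.LayerNorm(8),    # no buffers
)

# Walk the module tree and record every submodule that owns a buffer itself.
buffered = {}
for name, module in model.named_modules():
    own = [b for b, _ in module.named_buffers(recurse=False)]
    if own:
        buffered[name] = own
        print(f"{name}: {type(module).__name__} -> {own}")
```

Running this on the real CLIP model should print the qualified names of the buffered submodules, which is more actionable than the whole-model repr in the error message.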
Hi, thank you so much for answering!
How do you think I could identify the parts that use buffers? There's a CLIP wrapper around the model, and Opacus's error message returns the entire module as a problem like this:
```
NotImplementedError("Model contains a trainable layer with buffersthat Opacus doesn't currently support(:CLIP(
  (visual): VisionTransformer(
    (conv1): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16), bias=False)
    (patch_dropout): Identity()
    (ln_pre): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (transformer): Transformer(
      (resblocks): ModuleList(
        (0-11): 12 x ResidualAttentionBlock(
          (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (attn): DPMultiheadAttention(
            (qlinear): Linear(in_features=768, out_features=768, bias=True)
            (klinear): Linear(in_features=768, out_features=768, bias=True)
            (vlinear): Linear(in_features=768, out_features=768, bias=True)
            (out_proj): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.0, inplace=False)
          )
          (ls_1): Identity()
          (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (mlp): Sequential(
            (c_fc): Linear(in_features=768, out_features=3072, bias=True)
            (gelu): QuickGELU()
            (c_proj): Linear(in_features=3072, out_features=768, bias=True)
          )
          (ls_2): Identity()
        )
      )
    )
    (ln_post): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (transformer): Transformer(
    (resblocks): ModuleList(
      (0-11):...
```
(I'm no expert, but a more specific error message would be useful.)
Say I correctly identify the problematic parts. What's great about `ModuleValidator.fix` is that it changes the modules and moves the weights accordingly. Is there a best practice or tested solution for manually moving the weights?
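There is no single blessed recipe in the thread, but a common manual pattern mirrors what `ModuleValidator.fix` does for normalization layers: construct the buffer-free replacement, copy the learnable weights over, then swap the module in place. A hedged sketch using a hypothetical BatchNorm-to-GroupNorm swap (note the two are not numerically equivalent, since GroupNorm has no running statistics):

```python
import torch
import torch.nn as nn

# Toy model with a buffered BatchNorm layer (hypothetical stand-in).
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())

bn = model[1]
# GroupNorm with num_groups == num_channels normalizes each channel
# separately, the closest buffer-free analogue to per-channel BatchNorm.
gn = nn.GroupNorm(num_groups=bn.num_features, num_channels=bn.num_features,
                  eps=bn.eps, affine=True)
# Copy the learnable affine weights; the running-stat buffers are dropped.
with torch.no_grad():
    gn.weight.copy_(bn.weight)
    gn.bias.copy_(bn.bias)
model[1] = gn  # in-place swap inside the parent container
```

For modules nested deeper than a top-level container, `setattr(parent, child_name, replacement)` on the parent module achieves the same swap.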
Maybe too broad of a question: is there any documentation/report specifying what's incompatible with Opacus, since `ModuleValidator.validate` doesn't seem to cover everything?
An even broader question, for my curiosity only: Can all non-private models be made private with Opacus? Or have there been cases where models can't be made private?
Thanks again!
For now, we rely on both `ModuleValidator` and `GradSampleModule.validate()` to check compatibility. For the latter, under strict mode, GSM will throw an error when the module includes a buffer (https://github.com/pytorch/opacus/blob/main/opacus/grad_sample/grad_sample_module.py#L108). The error can be muted by setting `strict=False`.
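As a rough plain-PyTorch approximation of that strict-mode check (not Opacus's exact implementation), one can flag any submodule that is both trainable and directly owns buffers:

```python
import torch.nn as nn

def has_trainable_buffered_submodule(model: nn.Module) -> bool:
    """Approximate the condition GSM's strict mode rejects: a submodule
    that both requires gradients and registers its own buffers."""
    for m in model.modules():
        owns_buffer = any(True for _ in m.buffers(recurse=False))
        trainable = any(p.requires_grad for p in m.parameters(recurse=False))
        if owns_buffer and trainable:
            return True
    return False
```

Running this before wrapping the model can tell you ahead of time whether strict-mode validation would fail.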
🐛 Bug
We're trying to privately fine-tune a ViT B/16 model (link) with CIFAR-10 data. The non-private version uses `MultiheadAttention`, which is not compatible with DP. This compatibility issue is fixed by `ModuleValidator.fix`, which changes it to `DPMultiheadAttention`; the `ModuleValidator.validate` function then yields no errors. However, the model fails to train and throws the following error:

```
NotImplementedError("Model contains a trainable layer with buffers that Opacus doesn't currently support
```
To fix this, I referred to a previous issue #454 and changed the hook style to "ew" for Expanded Weights. The model, optimizer, and train_loader are created with no errors, but in the training loop, another error shows up:
```
RuntimeError: Expanded Weights encountered but cannot handle function view
```
I don't know how to proceed from here. Any help is appreciated. Thank you!
To Reproduce
Colab link: Colab
Steps to reproduce the behavior:
Expected behavior
I expect the ViT B/16 model to be ready to train, especially since `ModuleValidator.validate` shows no errors for the architecture and its modules.

Environment
How you installed PyTorch (`conda`, `pip`, source): `conda`