pytorch / opacus

Training PyTorch models with differential privacy
https://opacus.ai
Apache License 2.0

Model doesn't train even when ModuleValidator.validate yields no errors #672

Open sarakodeiri opened 1 month ago

sarakodeiri commented 1 month ago

🐛 Bug

We're trying to privately fine-tune a ViT-B/16 model (link) on CIFAR-10. The non-private version uses MultiheadAttention, which is not compatible with DP. ModuleValidator.fix resolves this by replacing it with DPMultiheadAttention, and ModuleValidator.validate then yields no errors. However, the model fails to train and throws the following error: NotImplementedError("Model contains a trainable layer with buffers that Opacus doesn't currently support")
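For reference, a minimal sketch of the validate/fix flow described above; the tiny module below is a stand-in for the actual CLIP ViT-B/16 from the notebook, since any model containing nn.MultiheadAttention reproduces the same incompatibility:

```python
import torch.nn as nn
from opacus.validators import ModuleValidator

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=16, num_heads=2)
        self.fc = nn.Linear(16, 10)

model = Tiny()
print(ModuleValidator.validate(model, strict=False))  # flags MultiheadAttention
model = ModuleValidator.fix(model)                    # swaps in DPMultiheadAttention
print(ModuleValidator.validate(model, strict=False))  # now []
```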

To work around this, I followed a previous issue (#454) and changed the hook style to "ew" (Expanded Weights). The model, optimizer, and train_loader are created with no errors, but the training loop then raises a different error: RuntimeError: Expanded Weights encountered but cannot handle function view
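A sketch of the failing setup, assuming the Opacus 1.x API, where the hook style is selected via the grad_sample_mode argument of PrivacyEngine.make_private (model, optimizer, and train_loader as in the Colab; the noise and clipping values here are illustrative, not the notebook's):

```python
from opacus import PrivacyEngine

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
    grad_sample_mode="ew",  # "hooks" raises the buffer NotImplementedError;
                            # "ew" instead fails on view() inside the loop
)
```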

I don't know how to proceed from here. Any help is appreciated. Thank you!

To Reproduce

Colab link: Colab

Steps to reproduce the behavior:

  1. Run Colab file
  2. Attempt to make a privacy engine with hook_style as "hooks"
  3. Change hook style to "ew" and run the training loop

Expected behavior

I expect the ViT-B/16 model to train, especially since ModuleValidator.validate reports no errors for the architecture and its modules.

Environment

EnayatUllah commented 1 month ago

Hi. Thanks for raising this! We are currently working on fixing these incompatibility issues with Expanded Weights. In the meantime, I would suggest trying the hooks mode: identify which parts of the model use buffers and replace them with similar buffer-free modules.
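One way to locate the buffers is plain PyTorch introspection, independent of Opacus; named_buffers reports the dotted path of each registered buffer, so the owning submodule is visible. A minimal sketch:

```python
import torch.nn as nn

def report_buffers(model: nn.Module) -> None:
    # Print each submodule that directly owns a registered buffer.
    for mod_name, mod in model.named_modules():
        own = [name for name, _ in mod.named_buffers(recurse=False)]
        if own:
            print(mod_name or "<root>", type(mod).__name__, own)
```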

sarakodeiri commented 1 month ago

Hi, thank you so much for answering!

  1. How do you think I could identify the parts that use buffers? There's a CLIP wrapper around the model, and Opacus's error message returns the entire module as the problem:

NotImplementedError("Model contains a trainable layer with buffers that Opacus doesn't currently support(:CLIP(
  (visual): VisionTransformer(
    (conv1): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16), bias=False)
    (patch_dropout): Identity()
    (ln_pre): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (transformer): Transformer(
      (resblocks): ModuleList(
        (0-11): 12 x ResidualAttentionBlock(
          (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (attn): DPMultiheadAttention(
            (qlinear): Linear(in_features=768, out_features=768, bias=True)
            (klinear): Linear(in_features=768, out_features=768, bias=True)
            (vlinear): Linear(in_features=768, out_features=768, bias=True)
            (out_proj): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.0, inplace=False)
          )
          (ls_1): Identity()
          (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (mlp): Sequential(
            (c_fc): Linear(in_features=768, out_features=3072, bias=True)
            (gelu): QuickGELU()
            (c_proj): Linear(in_features=3072, out_features=768, bias=True)
          )
          (ls_2): Identity()
        )
      )
    )
    (ln_post): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (transformer): Transformer(
    (resblocks): ModuleList(
      (0-11): ...

(I'm no expert, but a more specific error message would be useful.)

  2. Say I correctly identify the problematic parts. What's so great about ModuleValidator.fix is that it replaces the modules and moves the weights over accordingly. Is there a best practice or tested solution for moving the weights manually? (See the sketch after this comment.)

  3. Maybe too broad of a question: Is there any documentation/report specifying what's incompatible with Opacus, since ModuleValidator.validate doesn't seem to cover everything?

  4. An even broader question, for my curiosity only: Can all non-private models be made private with Opacus? Or have there been cases where models can't be made private?

Thanks again!
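Regarding question 2 above, a generic, hedged sketch of the kind of swap-and-copy that manual module replacement usually involves, using the common BatchNorm-to-GroupNorm case as an example; load_state_dict with strict=False carries over the shape-compatible affine weight and bias and ignores the BatchNorm-only running statistics:

```python
import torch.nn as nn

bn = nn.BatchNorm2d(64)                            # buffered, DP-incompatible
gn = nn.GroupNorm(num_groups=32, num_channels=64)  # buffer-free replacement
# weight/bias both have shape (64,) and are copied; running_mean,
# running_var, and num_batches_tracked are skipped as unexpected keys.
gn.load_state_dict(bn.state_dict(), strict=False)
```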

HuanyuZhang commented 1 month ago

For now, we rely on both ModuleValidator and GradSampleModule.validate() to check compatibility. For the latter, under strict mode, GradSampleModule will throw an error when the module includes a buffer (https://github.com/pytorch/opacus/blob/main/opacus/grad_sample/grad_sample_module.py#L108). The error can be muted by setting strict=False.
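A minimal sketch of muting the check as described, wrapping the model directly in GradSampleModule (the rest of the pipeline unchanged):

```python
from opacus.grad_sample import GradSampleModule

# strict=False silences the buffer check; the buffers themselves are
# otherwise left untouched.
gsm = GradSampleModule(model, strict=False)
```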

HuanyuZhang commented 1 month ago
  1. If you do not wrap the whole model in CLIP, the validator should be able to tell you which submodule includes the buffer.
  2. I do not fully understand your question. Do you mind explaining more?
  3. Thanks for your suggestion. Indeed we have a plan to update the documentation to improve clarity.
  4. Let us separate this question into two parts:
    • Can any non-private model be trained with DP-SGD? The answer is no. For modules with buffers that reveal private information (like BatchNorm), or for clip-style losses (non-linear across the samples in a mini-batch), I am not aware of a way to train with DP-SGD.
    • Can any DP-SGD-trainable model be trained with Opacus? The answer is yes, though potentially with minor code tweaks. One common issue is that Opacus requires the batch dimension to be consistent across every submodule. Some custom modules violate this assumption with permutation operations, which leads to gradient-mismatch errors (e.g., https://github.com/pytorch/opacus/issues/666); see the sketch below.
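An illustrative sketch of that batch-dimension issue (hypothetical module, not taken from the linked issue): Opacus assumes dim 0 of every submodule's input is the batch, so a forward pass that permutes the batch axis away breaks the per-sample gradient bookkeeping:

```python
import torch.nn as nn

class SeqFirst(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)

    def forward(self, x):             # x: (batch, seq, dim)
        x = x.permute(1, 0, 2)        # (seq, batch, dim): batch is no longer
        x = self.proj(x)              # dim 0 inside proj, so per-sample grads
        return x.permute(1, 0, 2)     # are accumulated over the wrong axis
```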