pytorch / opacus

Training PyTorch models with differential privacy
https://opacus.ai
Apache License 2.0

ValueError: Per sample gradient is not initialized. Not updated in backward pass? #502

Closed. Simha55 closed this issue 2 years ago.

Simha55 commented 2 years ago

🐛 Bug

I am getting this error at optimizer.step() while training with differential privacy. I am using Opacus v1.2 and PyTorch 1.12.1+cu102.

Please reproduce using our template Colab and post here the link

To Reproduce


Expected behavior

Environment

Please copy and paste the output from our environment collection script (or fill out the checklist below manually). Output after running the script:

Collecting environment information...
PyTorch version: 1.12.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.27

Python version: 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-4.15.0-191-generic-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
GPU 4: Tesla V100-SXM2-32GB
GPU 5: Tesla V100-SXM2-32GB
GPU 6: Tesla V100-SXM2-32GB
GPU 7: Tesla V100-SXM2-32GB

Nvidia driver version: 515.65.01
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] functorch==0.2.1
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.5
[pip3] numpydoc==1.2
[pip3] pytorch-ignite==0.4.9
[pip3] torch==1.12.1
[pip3] torchaudio==0.11.0
[pip3] torchmetrics==0.9.3
[pip3] torchsummary==1.5.1
[pip3] torchtuples==0.2.2
[pip3] torchvision==0.12.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] functorch 0.2.1 pypi_0 pypi
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py39h7f8727e_0
[conda] mkl_fft 1.3.1 py39hd3c417c_0
[conda] mkl_random 1.2.2 py39h51133e4_0
[conda] numpy 1.21.5 py39he7a7128_1
[conda] numpy-base 1.21.5 py39hf524024_1
[conda] numpydoc 1.2 pyhd3eb1b0_0
[conda] pytorch-ignite 0.4.9 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch 1.12.1 pypi_0 pypi
[conda] torchaudio 0.11.0 py39_cu113 pytorch
[conda] torchmetrics 0.9.3 pypi_0 pypi
[conda] torchsummary 1.5.1 pypi_0 pypi
[conda] torchtuples 0.2.2 pypi_0 pypi
[conda] torchvision 0.12.0 py39_cu113 pytorch

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

Additional context

alexandresablayrolles commented 2 years ago

Thanks for reporting the error, can you please copy/paste the error message? I don't see anything on the linked Colab.

Simha55 commented 2 years ago

Thanks for your reply. I have updated the code, so you should see the error now. You can run it again to reproduce it.

alexandresablayrolles commented 2 years ago

I think I found the issue: self.dense_agsx is initialized but never used in the forward pass, so its p.grad_sample is None. I think we should allow that but raise a warning that some parameter is not used; what do you think, @ffuuugor @karthikprasad?

@Simha55 let me know if that solves the issue (and you can close it if it works now)
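
For readers hitting the same error, here is a minimal sketch of the failure pattern described above (the module name dense_agsx comes from the thread; everything else, including shapes and the Net class, is illustrative and not from the original notebook):

```python
import torch
import torch.nn as nn
from opacus import GradSampleModule

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(10, 10)
        # Declared but never called in forward(): Opacus' backward hooks never
        # fire for it, so its per-sample gradient is never populated and the
        # DP optimizer raises the error from this issue at step().
        self.dense_agsx = nn.Linear(2, 10)

    def forward(self, x):
        return self.dense(x)

model = GradSampleModule(Net())
model(torch.randn(4, 10)).sum().backward()

for name, p in model.named_parameters():
    # dense.* parameters get a grad_sample; dense_agsx.* do not.
    print(name, getattr(p, "grad_sample", None) is not None)
```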

Simha55 commented 2 years ago

Thanks a lot for your help. It is working now.

Simha55 commented 2 years ago

Hi @alexandresablayrolles, sorry for reopening the issue. I have a small doubt even after reading the documentation: I don't clearly understand the difference between batch size and maximum physical batch size. Could you please elaborate?

alexandresablayrolles commented 2 years ago

Yes. Sometimes you need large batch sizes to make your model converge, but GPU memory might be too small to fit all the per-sample gradients. This is why we distinguish the two: the physical batch size is what your GPU can fit, while the actual batch size is what you need for your optimization. Typically, if you have a batch size of 512 and a physical batch size of 32, you will do forward/backward passes on physical batches of size 32, but optimizer.step() will take an actual step only once every 16 (= 512/32) forward/backward passes.
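
To make this concrete, Opacus provides BatchMemoryManager to handle the split automatically. A rough sketch, assuming train_loader, model, optimizer, and criterion come out of PrivacyEngine.make_private with a logical batch size of 512 (the variable names are illustrative):

```python
from opacus.utils.batch_memory_manager import BatchMemoryManager

# The logical batch size (e.g. 512) is whatever the DataLoader passed to
# make_private uses; here we only cap what is held in GPU memory at once.
with BatchMemoryManager(
    data_loader=train_loader,       # loader returned by make_private
    max_physical_batch_size=32,     # what fits on the GPU
    optimizer=optimizer,            # DPOptimizer returned by make_private
) as memory_safe_loader:
    for x, y in memory_safe_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        # Noise is added and parameters are actually updated only once a full
        # logical batch (512 / 32 = 16 physical batches) has been accumulated.
        optimizer.step()
```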

Simha55 commented 2 years ago

Got it. Thanks

anirban-nath commented 1 year ago

Hello, I am facing this same error. I have some layers that I know I am actively using, but grad_sample for those layers shows up as None. Can you post a link to the Colab notebook that has the solution implemented? @Simha55 @alexandresablayrolles

Simha55 commented 1 year ago

Hello @anirban-nath, I made a few changes to that notebook, so it is no longer the previous version, but I'll describe the error. I had declared self.dense_agsx = nn.Linear(2, 10) in the __init__ function of my model class but hadn't used it anywhere in the forward function of that class, and that is what caused the error.

anirban-nath commented 1 year ago

Hi @Simha55. I have the same error as yours, except that I am actually using the layer in my forward function, yet I still get the error. I cannot share a reproduction since my project has a very big model and lots of files, so I am at a loss as to why this is happening. I have several Linear layers and norms in my code, but it is just this one LayerNorm and one Linear layer whose grad_sample shows up as None.
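
One way to narrow this down is to list which parameters never received a per-sample gradient. A small diagnostic sketch, assuming model is the module you passed to Opacus and this runs right after loss.backward() (before optimizer.step()):

```python
# Trainable parameters whose per-sample gradient was never populated during
# the backward pass; these are the ones the DP optimizer complains about.
missing = [
    name
    for name, p in model.named_parameters()
    if p.requires_grad and getattr(p, "grad_sample", None) is None
]
print("Parameters without per-sample gradients:", missing)
```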

Hafizamariaiqbal commented 1 year ago

I am getting this error: "Per sample gradient is not initialized. Not updated in backward pass?" Can anyone post a link to the Colab notebook that has the solution implemented?