Closed Simha55 closed 2 years ago
Thanks for reporting the error, can you please copy/paste the error message? I don't see anything on the linked Colab.
Thanks for your reply. I changed the code; you should see the error now. You can run it again to reproduce it.
I think I found the issue: `self.dense_agsx` is initialized but never used in the forward pass, so its `p.grad_sample` is None.
I think we should allow that but raise a warning that some parameter is not used, what do you think @ffuuugor @karthikprasad ?
@Simha55 let me know if that solves the issue (and you can close it if it works now)
Thanks a lot for your help. It is working now.
Hi @alexandresablayrolles, sorry for reopening the issue. I have a small doubt even after reading the documentation: I didn't get a clear understanding of the difference between the batch size and the maximum physical batch size. Could you please elaborate?
Yes. Sometimes you need large batch sizes to make your model converge, but the GPU memory might be too small to fit all the per-sample gradients. This is why we distinguish the two: the physical batch size is what your GPU can fit, while the actual batch size is what you need for your optimization. Typically, if you have a batch size of 512 and a physical batch size of 32, you will do forward/backward on physical batches of size 32, but optimizer.step() will do an actual step only once every 16 (=512/32) forward/backward passes.
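The accumulation logic above can be sketched in plain Python. This is a hypothetical simulation of the bookkeeping only (no real training, and not Opacus's actual implementation; in Opacus this splitting is handled for you by `BatchMemoryManager` via its `max_physical_batch_size` argument):

```python
# Hypothetical sketch of logical vs. physical batches (plain Python,
# no Opacus): forward/backward runs once per physical batch, but the
# optimizer steps only once per full logical batch.
BATCH_SIZE = 512               # logical batch size the optimizer sees
MAX_PHYSICAL_BATCH_SIZE = 32   # what fits in GPU memory at once

def train_one_epoch(num_samples):
    passes, steps, accumulated = 0, 0, 0
    for start in range(0, num_samples, MAX_PHYSICAL_BATCH_SIZE):
        # forward/backward on one physical batch would happen here;
        # per-sample gradients accumulate across physical batches
        passes += 1
        accumulated += min(MAX_PHYSICAL_BATCH_SIZE, num_samples - start)
        if accumulated >= BATCH_SIZE:
            steps += 1         # optimizer.step() fires only here
            accumulated = 0
    return passes, steps

print(train_one_epoch(512))  # (16, 1): 16 forward/backward passes, 1 step
```

So one epoch over 512 samples does 16 forward/backward passes but only a single optimizer step, which is exactly the 512/32 = 16 ratio described above.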
Got it. Thanks
Hello, I am facing this same error, and I have some layers that I know I am actively using, but grad_sample for those layers shows up as None. Can you post a link to the Colab notebook that has the solution implemented? @Simha55 @alexandresablayrolles
Hello @anirban-nath, I made a few changes to that notebook, so it is no longer the previous version. But I'll tell you the error: I had declared self.dense_agsx = nn.Linear(2, 10) in the init function of my model class but never used it anywhere in the forward function of that class, and I got the error because of that.
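A minimal reproduction of that mistake (hypothetical model and layer names; shown in plain PyTorch, where the unused layer's `.grad` stays None after backward; under Opacus the same unused-layer pattern leaves `p.grad_sample` as None and triggers the error at `optimizer.step()`):

```python
import torch
import torch.nn as nn

class BuggyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(2, 10)
        # Declared but never called in forward() -- this is the bug:
        # its parameters never receive gradients in the backward pass.
        self.dense_agsx = nn.Linear(2, 10)

    def forward(self, x):
        return self.dense(x)  # self.dense_agsx is never used

model = BuggyModel()
model(torch.randn(4, 2)).sum().backward()

print(model.dense.weight.grad is None)       # False: used layer has grads
print(model.dense_agsx.weight.grad is None)  # True: unused layer has none
```

The fix is either to use the layer in `forward()` or to delete it from `__init__()`.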
Hi, @Simha55 . I have the same error as yours, except that I am actually using the layer in my forward function. But I am still getting that error. I cannot reproduce it since my project has a very big model and lots of files, so I am at a loss as to why this is happening. I have several Linear layers and norms in my code but it's just this one LayerNorm and a Linear Layer whose grad_samples show up as None.
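One way to narrow this down in a large model is to scan `named_parameters()` after a backward pass and list the parameters whose gradient is missing. This is a hypothetical debugging snippet on a toy model: with Opacus attached you would check `p.grad_sample`; the `getattr` fallback to plain `p.grad` lets the same idea run without Opacus installed:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(2, 2)
        self.unused = nn.LayerNorm(2)  # never called in forward()

    def forward(self, x):
        return self.used(x)

model = Net()
model(torch.randn(3, 2)).sum().backward()

# List parameters with no (per-sample) gradient after backward:
# under Opacus, check p.grad_sample; here p.grad is the fallback.
missing = [name for name, p in model.named_parameters()
           if p.requires_grad and getattr(p, "grad_sample", p.grad) is None]
print(missing)  # ['unused.weight', 'unused.bias']
```

Any name printed by this loop belongs to a layer that the backward pass never reached, which is usually a layer (or a whole branch) that `forward()` skips.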
I am getting the error "Per sample gradient is not initialized. Not updated in backward pass?". Can anyone post a link to the Colab notebook that has the solution implemented?
🐛 Bug
I am getting this error at optimizer.step() while using DP. I am currently using Opacus v1.2, and my PyTorch version is 1.12.1+cu102.
Please reproduce using our template Colab and post here the link
To Reproduce
1.
2.
3.
Expected behavior
Environment
Please copy and paste the output from our environment collection script (or fill out the checklist below manually). Output after running the script:

Collecting environment information...
PyTorch version: 1.12.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.27

Python version: 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-4.15.0-191-generic-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
GPU 4: Tesla V100-SXM2-32GB
GPU 5: Tesla V100-SXM2-32GB
GPU 6: Tesla V100-SXM2-32GB
GPU 7: Tesla V100-SXM2-32GB

Nvidia driver version: 515.65.01
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] functorch==0.2.1
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.5
[pip3] numpydoc==1.2
[pip3] pytorch-ignite==0.4.9
[pip3] torch==1.12.1
[pip3] torchaudio==0.11.0
[pip3] torchmetrics==0.9.3
[pip3] torchsummary==1.5.1
[pip3] torchtuples==0.2.2
[pip3] torchvision==0.12.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] functorch 0.2.1 pypi_0 pypi
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py39h7f8727e_0
[conda] mkl_fft 1.3.1 py39hd3c417c_0
[conda] mkl_random 1.2.2 py39h51133e4_0
[conda] numpy 1.21.5 py39he7a7128_1
[conda] numpy-base 1.21.5 py39hf524024_1
[conda] numpydoc 1.2 pyhd3eb1b0_0
[conda] pytorch-ignite 0.4.9 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch 1.12.1 pypi_0 pypi
[conda] torchaudio 0.11.0 py39_cu113 pytorch
[conda] torchmetrics 0.9.3 pypi_0 pypi
[conda] torchsummary 1.5.1 pypi_0 pypi
[conda] torchtuples 0.2.2 pypi_0 pypi
[conda] torchvision 0.12.0 py39_cu113 pytorch
You can get the script and run it with:
How you installed PyTorch (conda, pip, source): pip
Additional context