Closed long21wt closed 2 years ago
The issue occurs directly at the privacy_engine.make_private_with_epsilon() step.
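For readers unfamiliar with that step: make_private_with_epsilon() wraps the model, optimizer, and data loader so that per-sample gradients are computed and clipped. A minimal sketch of such a call, with the privacy engine injected so it can be stubbed out; the epsilon/delta/epochs values are placeholders of mine, not from this report:

```python
def attach_privacy_engine(privacy_engine, model, optimizer, train_loader):
    """Sketch of the failing step: ask the engine to wrap the model,
    optimizer, and data loader for differentially private training.
    The target_* values below are illustrative placeholders."""
    return privacy_engine.make_private_with_epsilon(
        module=model,
        optimizer=optimizer,
        data_loader=train_loader,
        target_epsilon=1.0,
        target_delta=1e-5,
        epochs=1,
        max_grad_norm=1.0,
    )
```

The returned triple (wrapped model, DP optimizer, DP data loader) must then be used consistently in the training loop; mixing wrapped and unwrapped objects is a common source of the error below.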
Hello,
Thank you for flagging this!
Could you please also provide reproducible code using our template? (I want to see the error stack trace you are getting.)
Thanks, it seems to be a problem with either the dependencies or the Python version. On Colab with Python 3.7.13, the privacy engine works well. I will try a different environment setup on my machine.
Hi, on Colab the privacy engine works well: https://colab.research.google.com/drive/1TOELSJQ7OyOGc55o32JZz57rCwPHGubo But I cannot run it on my cluster machine; I get the same error. Here is the detailed log:
08/14/2022 12:08:00:WARNING:Reusing dataset wmt19 (/storage/ukp/work/vu/.cache/huggingface/datasets/wmt19/cs-en/1.0.0/aeadcbe9f1cbf9969e603239d33d3e43670cf250c1158edf74f5f6e74d4f21d0)
Namespace(diff_priv=True, epsilon=1)
100%|██████████| 7271/7271 [38:29<00:00, 3.15ba/s]
0%| | 0/3 [00:00<?, ?ba/s]
33%|███▎ | 1/3 [00:00<00:00, 2.40ba/s]
67%|██████▋ | 2/3 [00:00<00:00, 2.33ba/s]
100%|██████████| 3/3 [00:01<00:00, 2.38ba/s]
100%|██████████| 3/3 [00:01<00:00, 2.37ba/s]
/storage/ukp/work/vu/miniconda/envs/py37/lib/python3.7/site-packages/opacus/privacy_engine.py:134: UserWarning: Secure RNG turned off. This is perfectly fine for experimentation as it allows for much faster training performance, but remember to turn it on and retrain one last time before production with ``secure_mode`` turned on.
"Secure RNG turned off. This is perfectly fine for experimentation as it allows "
/storage/ukp/work/vu/miniconda/envs/py37/lib/python3.7/site-packages/opacus/accountants/analysis/rdp.py:333: UserWarning: Optimal order is the largest alpha. Please consider expanding the range of alphas to get a tighter privacy bound.
f"Optimal order is the {extreme} alpha. Please consider expanding the range of alphas to get a tighter privacy bound."
With private engine
0%| | 0/908836 [00:00<?, ?it/s]/storage/ukp/work/vu/miniconda/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py:1053: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior.
warnings.warn("Using a non-full backward hook when the forward contains multiple autograd Nodes "
/storage/ukp/work/vu/miniconda/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py:1018: UserWarning: Using non-full backward hooks on a Module that does not return a single Tensor or a tuple of Tensors is deprecated and will be removed in future versions. This hook will be missing some of the grad_output. Please use register_full_backward_hook to get the documented behavior.
warnings.warn("Using non-full backward hooks on a Module that does not return a "
Traceback (most recent call last):
File "private_mt.py", line 133, in <module>
optimizer.step()
File "/storage/ukp/work/vu/miniconda/envs/py37/lib/python3.7/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
return wrapped(*args, **kwargs)
File "/storage/ukp/work/vu/miniconda/envs/py37/lib/python3.7/site-packages/accelerate/optimizer.py", line 140, in step
self.optimizer.step(closure)
File "/storage/ukp/work/vu/miniconda/envs/py37/lib/python3.7/site-packages/opacus/optimizers/optimizer.py", line 509, in step
if self.pre_step():
File "/storage/ukp/work/vu/miniconda/envs/py37/lib/python3.7/site-packages/opacus/optimizers/optimizer.py", line 490, in pre_step
self.clip_and_accumulate()
File "/storage/ukp/work/vu/miniconda/envs/py37/lib/python3.7/site-packages/opacus/optimizers/optimizer.py", line 397, in clip_and_accumulate
g.view(len(g), -1).norm(2, dim=-1) for g in self.grad_samples
File "/storage/ukp/work/vu/miniconda/envs/py37/lib/python3.7/site-packages/opacus/optimizers/optimizer.py", line 344, in grad_samples
ret.append(_get_flat_grad_sample(p))
File "/storage/ukp/work/vu/miniconda/envs/py37/lib/python3.7/site-packages/opacus/optimizers/optimizer.py", line 200, in _get_flat_grad_sample
"Per sample gradient is not initialized. Not updated in backward pass?"
ValueError: Per sample gradient is not initialized. Not updated in backward pass?
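For context on the traceback: clip_and_accumulate computes an L2 norm per training sample across all per-sample gradients, then scales each sample's gradient so its norm does not exceed max_grad_norm. When the backward hooks never populated those per-sample gradients, there is nothing to take a norm of, which is exactly the ValueError above. A pure-Python sketch of that computation (my own simplification, not Opacus's actual code):

```python
import math

def per_sample_clip_factors(grad_samples, max_grad_norm):
    """Sketch of per-sample gradient clipping: grad_samples is a list of
    per-parameter lists, where each inner list holds one flattened
    gradient per training sample. Returns one scaling factor per sample,
    capped at 1.0 so small gradients are left untouched."""
    if not grad_samples:
        # The situation in the traceback: nothing was accumulated.
        raise ValueError(
            "Per sample gradient is not initialized. Not updated in backward pass?"
        )
    num_samples = len(grad_samples[0])
    factors = []
    for i in range(num_samples):
        # L2 norm of sample i's gradient across all parameters.
        sq = sum(g * g for param in grad_samples for g in param[i])
        norm = math.sqrt(sq)
        factors.append(min(1.0, max_grad_norm / (norm + 1e-6)))
    return factors
```

With one parameter and two samples, a gradient of [3.0, 4.0] (norm 5) gets scaled down by roughly 0.2, while [0.5, 0.5] (norm below 1) keeps factor 1.0.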
My environment information:
Collecting environment information...
PyTorch version: 1.12.1+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: CentOS Linux release 7.9.2009 (Core) (x86_64)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.17
Python version: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-3.10.0-1160.71.1.el7.x86_64-x86_64-with-centos-7.9.2009-Core
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Tesla V100-PCIE-32GB
Nvidia driver version: 515.48.07
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.6
[pip3] numpydoc==1.2
[pip3] torch==1.12.1+cu113
[pip3] torchaudio==0.12.1+cu113
[pip3] torchvision==0.13.1+cu113
[conda] blas 1.0 mkl
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py37h7f8727e_0
[conda] mkl_fft 1.3.1 py37hd3c417c_0
[conda] mkl_random 1.2.2 py37h51133e4_0
[conda] numpy 1.21.6 pypi_0 pypi
[conda] numpydoc 1.2 pyhd3eb1b0_0
[conda] torch 1.12.1+cu113 pypi_0 pypi
[conda] torchaudio 0.12.1+cu113 pypi_0 pypi
[conda] torchvision 0.13.1+cu113 pypi_0 pypi
I'll close this issue because it is caused by a conflict between accelerate and opacus.
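When debugging this kind of conflict, it can help to check before optimizer.step() which trainable parameters are missing the per-sample gradient attribute (grad_sample) that Opacus populates during the backward pass; parameters that lack it belong to modules whose hooks never fired (e.g. because another library re-wrapped the model). A minimal diagnostic sketch; the helper name is mine, not an Opacus API:

```python
def find_missing_grad_samples(named_parameters):
    """Return the names of trainable parameters whose per-sample
    gradient (the grad_sample attribute Opacus fills in during the
    backward pass) is absent. An empty result means clipping has
    everything it needs; any names returned point at modules whose
    backward hooks did not run."""
    missing = []
    for name, param in named_parameters:
        trainable = getattr(param, "requires_grad", True)
        if trainable and getattr(param, "grad_sample", None) is None:
            missing.append(name)
    return missing
```

In a real script this would be called as find_missing_grad_samples(model.named_parameters()) right after loss.backward() and before optimizer.step().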
🐛 Bug
Hi, I want to train m2m100 from scratch on the wmt19 dataset using the opacus privacy engine. However, I encounter this bug. The lines of code are just a very quick prototype to train the model with the huggingface ecosystem.
To Reproduce
Here is my code:
Environment
(conda, pip, source): pip