I think the problem is associated with the "prv" accountant, and that part is independent of the model size. Could you try explicitly setting accountant="rdp" when initializing PrivacyEngine?
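For reference, a minimal sketch of that change (only the accountant argument matters here):

```python
from opacus import PrivacyEngine

# Use the RDP accountant instead of the PRV accountant, whose composition step was failing here.
privacy_engine = PrivacyEngine(accountant="rdp")
```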
Thanks for the response @HuanyuZhang. Changing the accountant to "rdp" fixed the previous issue, but now it tells me:

Per sample gradient is not initialized. Not updated in backward pass?

I'm running the validator and fix functions on the model and I get no errors on validation. Isn't fix supposed to add the grad sampler to the model, or does it only do that for certain "standard" types of layers and modules?
import torch
from torch.utils.data import DataLoader
from opacus import PrivacyEngine
from opacus.validators import ModuleValidator

model = NormalizingFlowMNist(num_coupling=2, num_final_coupling=2, planes=2).to(device)
model = ModuleValidator.fix(model)
errors = ModuleValidator.validate(model, strict=True)
print(errors[-5:])

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.2)

test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

DELTA = 0.9 / len(train_loader.dataset)

privacy_engine = PrivacyEngine(accountant="rdp")
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    epochs=epochs,
    target_epsilon=target_epsilon,
    target_delta=DELTA,
    max_grad_norm=MAX_GRAD_NORM,
)
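A note on the question above: to my understanding, ModuleValidator.fix only replaces unsupported layers (for example BatchNorm) with compatible ones; the per-sample gradient hooks are attached later, when make_private_with_epsilon wraps the model in a GradSampleModule. A minimal sketch for checking both, where loss_fn and the forward call are placeholders for the actual flow model's training objective:

```python
from opacus import GradSampleModule

# The wrapped model returned by make_private_with_epsilon should be a GradSampleModule;
# this wrapper (not ModuleValidator.fix) is what computes per-sample gradients.
print(isinstance(model, GradSampleModule))

# After one forward/backward pass, trainable parameters should carry .grad_sample.
x, y = next(iter(train_loader))
loss = loss_fn(model(x.to(device)), y.to(device))  # placeholder loss for illustration
loss.backward()
missing = [
    name
    for name, p in model.named_parameters()
    if p.requires_grad and getattr(p, "grad_sample", None) is None
]
print("parameters without per-sample gradients:", missing)
optimizer.zero_grad()  # reset accumulated gradients before real training
```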
Could you kindly let me know which line triggers this error? I previously thought this error only occurred when the model was being updated by the optimizer.
@HuanyuZhang Previously it happened when I ran make_private_with_epsilon, before the training loop started. After changing the accountant to "rdp" the problem with make_private_with_epsilon was fixed, but now I get the error above during training, on optimizer.step():
<ipython-input-4-d2edbffee1f2> in train_loop(dataloader, model, loss_fn, optimizer, batch_size, report_iters, num_pixels)
    580
    581     #prev = [(name, x, x.grad) for name, x in model.named_parameters(recurse=True)]
--> 582     optimizer.step()
    583
    584     if batch % report_iters == 0:

/content/opacus/opacus/optimizers/optimizer.py in step(self, closure)
    511         closure()
    512
--> 513     if self.pre_step():
    514         return self.original_optimizer.step()
    515     else:

/content/opacus/opacus/optimizers/optimizer.py in pre_step(self, closure)
    492         returns the loss. Optional for most optimizers.
    493     """
--> 494     self.clip_and_accumulate()
    495     if self._check_skip_next_step():
    496         self._is_last_step_skipped = True

/content/opacus/opacus/optimizers/optimizer.py in clip_and_accumulate(self)
    395     """
    396
--> 397     if len(self.grad_samples[0]) == 0:
    398         # Empty batch
    399         per_sample_clip_factor = torch.zeros((0,))

/content/opacus/opacus/optimizers/optimizer.py in grad_samples(self)
    343     ret = []
    344     for p in self.params:
--> 345         ret.append(self._get_flat_grad_sample(p))
    346     return ret
    347

/content/opacus/opacus/optimizers/optimizer.py in _get_flat_grad_sample(self, p)
    280         )
    281     if p.grad_sample is None:
--> 282         raise ValueError(
    283             "Per sample gradient is not initialized. Not updated in backward pass?"
    284         )

ValueError: Per sample gradient is not initialized. Not updated in backward pass?
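For context, this error is raised when a parameter has no per-sample gradient at step time, which usually means the backward pass did not run through the GradSampleModule wrapper returned by make_private_with_epsilon, or that some parameters never receive gradients at all. A minimal sketch of a training step that does populate them; all names here are placeholders rather than the reporter's actual code:

```python
# Minimal DP-SGD step with Opacus; loss_fn, device, etc. are placeholders.
for x, y in train_loader:
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    output = model(x)          # forward through the wrapped (GradSampleModule) model
    loss = loss_fn(output, y)  # loss must depend on the wrapped model's output
    loss.backward()            # hooks populate p.grad_sample for each parameter
    optimizer.step()           # DPOptimizer clips per-sample grads and adds noise
```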
Thanks. Could you also provide more code, especially on how you do the backward propagation?
Did not get a reply, so I am closing the issue. Feel free to re-open it if needed.
I have also encountered this problem. May I ask whether it has been resolved? It occurs when using slightly larger models, and even setting the accountant to "rdp" does not help.
🐛 Bug
The privacy engine's make_private_with_epsilon function leads to a memory error even for a very small network. You can find the full code in this colab. I have tested it on a GPU with 12 GB of RAM and it still gives an out-of-memory error with a few warnings. On an A40 with 48 GB of memory it gave me the following error:
Traceback (most recent call last):
  File "./normalizing-flows/examples/rnvp_fmnist.py", line 791, in <module>
    model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
  File "/miniconda3/envs/nflt/lib/python3.11/site-packages/opacus/privacy_engine.py", line 517, in make_private_with_epsilon
    noise_multiplier=get_noise_multiplier(
  File "/miniconda3/envs/nflt/lib/python3.11/site-packages/opacus/accountants/utils.py", line 70, in get_noise_multiplier
    eps = accountant.get_epsilon(delta=target_delta, **kwargs)
  File "/miniconda3/envs/nflt/lib/python3.11/site-packages/opacus/accountants/prv.py", line 97, in get_epsilon
    dprv = self._get_dprv(eps_error=eps_error, delta_error=delta_error)
  File "/miniconda3/envs/nflt/lib/python3.11/site-packages/opacus/accountants/prv.py", line 126, in _get_dprv
    return compose_heterogeneous(
  File "/miniconda3/envs/nflt/lib/python3.11/site-packages/opacus/accountants/analysis/prv/compose.py", line 58, in compose_heterogeneous
    dprvs = [
  File "/miniconda3/envs/nflt/lib/python3.11/site-packages/opacus/accountants/analysis/prv/compose.py", line 59, in <listcomp>
    _compose_fourier(dprv, num_self_composition)
  File "/miniconda3/envs/nflt/lib/python3.11/site-packages/opacus/accountants/analysis/prv/compose.py", line 14, in _compose_fourier
    composed_pmf = irfft(rfft(dprv.pmf) ** num_self_composition)
  File "/miniconda3/envs/nflt/lib/python3.11/site-packages/scipy/fft/_backend.py", line 25, in __ua_function__
    return fn(*args, **kwargs)
  File "/miniconda3/envs/nflt/lib/python3.11/site-packages/scipy/fft/_pocketfft/basic.py", line 62, in r2c
    return pfft.r2c(tmp, (axis,), forward, norm, None, workers)
MemoryError: std::bad_alloc
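For anyone hitting the same thing: the failure happens inside the PRV accountant's FFT-based composition, so the workaround suggested above is to use the RDP accountant instead. A hedged sketch of computing the noise multiplier that way, reusing the variable names from the snippet earlier in the thread (the sample_rate expression is an assumption about how batches are drawn):

```python
from opacus.accountants.utils import get_noise_multiplier

# Compute the noise multiplier with the RDP accountant instead of the default PRV one.
sample_rate = 1 / len(train_loader)  # assumed expected batch size / dataset size
sigma = get_noise_multiplier(
    target_epsilon=target_epsilon,
    target_delta=DELTA,
    sample_rate=sample_rate,
    epochs=epochs,
    accountant="rdp",
)
print("noise multiplier:", sigma)
```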