pytorch / opacus

Training PyTorch models with differential privacy
https://opacus.ai
Apache License 2.0
1.65k stars 328 forks

`BatchSplittingSampler` returns wrong length #640

Closed lsc64 closed 1 month ago

lsc64 commented 3 months ago

🐛 Bug

BatchSplittingSampler reports the length as

```python
expected_batch_size = self.sampler.sample_rate * self.sampler.num_samples
return int(len(self.sampler) * (expected_batch_size / self.max_batch_size))
```

Simply converting the result to `int` truncates it, so the reported number of batches is one too low. Instead, we need to ceil the result first:

```python
expected_batch_size = self.sampler.sample_rate * self.sampler.num_samples
return int(np.ceil(len(self.sampler) * (expected_batch_size / self.max_batch_size)))
```

Some libraries, such as PyTorch Lightning, skip the last batch when the length is reported incorrectly, which can result in no optimization step occurring at all.
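A minimal sketch of the off-by-one, using hypothetical values (100 logical batches, `sample_rate * num_samples = 0.01 * 1000 = 10` expected samples per logical batch, physical batches capped at 3 samples); these numbers are illustrative, not taken from Opacus:

```python
import math

num_batches = 100        # stands in for len(self.sampler)
sample_rate = 0.01
num_samples = 1000
max_batch_size = 3

expected_batch_size = sample_rate * num_samples           # 10.0
n = num_batches * (expected_batch_size / max_batch_size)  # 333.33...

print(int(n))        # 333 -- truncation drops the final partial physical batch
print(math.ceil(n))  # 334 -- the final partial physical batch is counted
```

Any time the division leaves a remainder, truncation under-reports the length by one, which is exactly the batch a length-trusting consumer then skips.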

HuanyuZhang commented 2 months ago

Thanks for contributing to Opacus. Will take a look.

lsc64 commented 2 months ago

> Thanks for contributing to Opacus. Will take a look.

Have you had the time to look into this? It seems like a straightforward fix.

HuanyuZhang commented 1 month ago

Thanks for contributing to Opacus! The fix makes sense to me. Just a qq: what is the point of using `int(np.ceil())`? How about just using `math.ceil()`?

lsc64 commented 1 month ago

> Thanks for contributing to Opacus! The fix makes sense to me. Just a qq: what is the point of using `int(np.ceil())`? How about just using `math.ceil()`?

Thanks for pointing that out. Bad habit, I guess; `math.ceil` is better. I changed it to `math.ceil`.
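With `math.ceil`, which already returns an `int`, the `__len__` would look roughly like the sketch below. This is a standalone illustration with hypothetical constructor parameters, not the actual Opacus class:

```python
import math

class BatchSplittingSamplerSketch:
    """Sketch of the fixed __len__; attribute names mirror the snippet above."""

    def __init__(self, sampler_len, sample_rate, num_samples, max_batch_size):
        self.sampler_len = sampler_len      # stands in for len(self.sampler)
        self.sample_rate = sample_rate
        self.num_samples = num_samples
        self.max_batch_size = max_batch_size

    def __len__(self):
        expected_batch_size = self.sample_rate * self.num_samples
        # math.ceil returns an int, so no int(...) wrapper (and no numpy) is needed
        return math.ceil(
            self.sampler_len * (expected_batch_size / self.max_batch_size)
        )
```

Using the earlier illustrative numbers, `len(BatchSplittingSamplerSketch(100, 0.01, 1000, 3))` counts the final partial physical batch instead of truncating it away.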

s-zanella commented 1 month ago

This patch still computes the expected number of batches incorrectly. See #516.