To provide real DP guarantees, we need to certify that batches are also shuffled with a CSPRNG. This means the DataLoader must accept a torchcsprng generator. If you try to pass one today, the random sampler ends up calling torch.randperm with that generator and raises a RuntimeError.
To repro:
import torchvision
import torchvision.transforms as tfms
from torch.utils.data import DataLoader
import torchcsprng as prng

train_ds = torchvision.datasets.CIFAR10('.', train=True, download=True, transform=tfms.ToTensor())

# Shuffling with the default generator works:
train_dl = DataLoader(train_ds, batch_size=8, shuffle=True)

# Shuffling with a torchcsprng generator fails on the first batch:
generator = prng.create_random_device_generator("/dev/urandom")
train_dl = DataLoader(train_ds, batch_size=8, shuffle=True, generator=generator)
x, y = next(iter(train_dl))
Error message:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-18-22755f335b09> in <module>()
----> 1 x, y = next(iter(train_dl))
2 x.shape
4 frames
/usr/local/lib/python3.6/dist-packages/torch/utils/data/sampler.py in __iter__(self)
108 rand_tensor = torch.randint(high=n, size=(self.num_samples,), dtype=torch.int64, generator=self.generator)
109 return iter(rand_tensor.tolist())
--> 110 return iter(torch.randperm(n, generator=self.generator).tolist())
111
112 def __len__(self):
RuntimeError: Could not run 'aten::randperm.generator_out' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. 'aten::randperm.generator_out' is only available for these backends: [CPU, CUDA, Autograd, Profiler, Tracer].
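Until randperm is registered for the CSPRNG backend, one possible workaround is to shuffle the indices outside of torch entirely. The sketch below is only an illustration, not part of torchcsprng or Opacus: the CSPRNGShuffleSampler class is hypothetical, and it uses Python's secrets module (a CSPRNG) to drive a Fisher-Yates shuffle instead of routing the permutation through the torchcsprng generator.

import secrets
from torch.utils.data import DataLoader, Sampler

class CSPRNGShuffleSampler(Sampler):
    """Yields a random permutation of dataset indices, shuffled with a
    cryptographically secure source (Python's secrets module)."""

    def __init__(self, data_source):
        self.data_source = data_source

    def __iter__(self):
        n = len(self.data_source)
        indices = list(range(n))
        # Fisher-Yates shuffle; each swap position comes from secrets.randbelow.
        for i in range(n - 1, 0, -1):
            j = secrets.randbelow(i + 1)
            indices[i], indices[j] = indices[j], indices[i]
        return iter(indices)

    def __len__(self):
        return len(self.data_source)

train_dl = DataLoader(train_ds, batch_size=8, sampler=CSPRNGShuffleSampler(train_ds))
x, y = next(iter(train_dl))

This keeps the batch order cryptographically random, but it does not go through the torchcsprng generator itself; the real fix is to make aten::randperm.generator_out available for that backend so the stock RandomSampler works.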