The default `RandomTranslate` operation appears to use uninitialized memory. This makes e.g. CIFAR-10 training runs nondeterministic when the GPU is also in use by another process. This PR fixes that.
The behavior can be reproduced with the following script:

    import os
    from tqdm import tqdm
    import numpy as np
    import matplotlib.pyplot as plt
    import torch
    import torchvision
    import torchvision.transforms as T

    from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
    from ffcv.loader import Loader, OrderOption
    from ffcv.pipeline.operation import Operation
    from ffcv.transforms import RandomTranslate, Convert, ToDevice, ToTensor, ToTorchImage
    from ffcv.transforms.common import Squeeze

    CIFAR_MEAN = [125.307, 122.961, 113.8575]
    CIFAR_STD = [51.5865, 50.847, 51.255]
    # Inverse of T.Normalize(CIFAR_MEAN, CIFAR_STD): maps x to x * std + mean.
    denormalize = T.Normalize(-np.array(CIFAR_MEAN)/np.array(CIFAR_STD), 1/np.array(CIFAR_STD))

    label_pipeline = [IntDecoder(), ToTensor(), ToDevice('cuda:0'), Squeeze()]
    image_pipeline = [
        SimpleRGBImageDecoder(),
        RandomTranslate(padding=4),
        ToTensor(),
        ToDevice('cuda:0', non_blocking=True),
        ToTorchImage(),
        Convert(torch.float16),
        T.Normalize(CIFAR_MEAN, CIFAR_STD),
    ]

    loader = Loader('/tmp/cifar_train.beton',
                    batch_size=512,
                    num_workers=8,
                    order=OrderOption.RANDOM,
                    drop_last=True,
                    pipelines={'image': image_pipeline,
                               'label': label_pipeline})

    # Iterate over the dataset twice and plot the first 8 images of each pass.
    for _ in range(2):
        imgs_t = []
        for inputs, _ in tqdm(loader):
            img_t = inputs.float()
            imgs_t.append(img_t.clone())

        # Undo the normalization so the padded borders are visible in the plot.
        img_t = denormalize(imgs_t[0][:8].cpu())
        img_t1 = torchvision.utils.make_grid(img_t, nrow=4) / 255
        plt.figure(figsize=(20, 20))
        plt.imshow(img_t1.permute(1, 2, 0).cpu().numpy())
        plt.show()
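
For context, the failure mode can be illustrated outside FFCV. The following is a minimal NumPy sketch, assuming a pad-and-crop style translate; `random_translate` and its parameters are hypothetical names used only for illustration. It is not FFCV's implementation and not the patch in this PR. The point is only that the padded border needs a defined value before the random crop is taken.

```python
import numpy as np


def random_translate(images, pad, fill=0, rng=None):
    """Shift each HWC image by up to `pad` pixels via pad-and-crop.

    Hypothetical helper for illustration; not FFCV's RandomTranslate code.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, h, w, c = images.shape
    # np.empty() here would leave the border holding whatever bytes were
    # already in memory; np.full() gives every border pixel a defined value.
    padded = np.full((n, h + 2 * pad, w + 2 * pad, c), fill, dtype=images.dtype)
    padded[:, pad:pad + h, pad:pad + w] = images
    out = np.empty_like(images)  # safe: every element is overwritten below
    for i in range(n):
        # Random crop offset in [0, 2 * pad] along each spatial axis.
        y, x = rng.integers(0, 2 * pad + 1, size=2)
        out[i] = padded[i, y:y + h, x:x + w]
    return out


images = np.full((4, 8, 8, 3), 200, dtype=np.uint8)
a = random_translate(images, pad=2, rng=np.random.default_rng(0))
b = random_translate(images, pad=2, rng=np.random.default_rng(0))
print(np.array_equal(a, b))  # same seed, same output once the border is filled
```

With the border filled, the output depends only on the input and the random offsets, so two runs with the same seed should agree.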