Open wangguanan opened 10 months ago
@pavel-izmailov
One more thing: using multiple worker processes can significantly speed up data loading, i.e. reduce training and inference time. This can be implemented by setting num_workers > 0, correspondingly:
```python
def get_imagenet(datapath, split, batch_size, shuffle, transform=TRANSFORM):
    ds = torchvision.datasets.ImageNet(root=datapath, split=split, transform=transform)
    loader = torch.utils.data.DataLoader(ds, shuffle=shuffle, batch_size=batch_size,
                                         num_workers=min(batch_size // 16, 8))  # <-- add num_workers=min(batch_size//16, 8)
    return ds, loader
```

```python
train_loader = torch.utils.data.DataLoader(train_ds, shuffle=True, batch_size=batch_size,
                                           num_workers=min(batch_size // 16, 8))  # <-- add num_workers=min(batch_size//16, 8)
```
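The snippet above targets ImageNet, which needs the dataset on disk. As a self-contained illustration of the same idea, here is a minimal sketch using a synthetic `TensorDataset` (an assumption, standing in for the real data) with the same `num_workers` heuristic:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Synthetic stand-in for the real dataset (assumption for illustration only).
ds = TensorDataset(torch.randn(64, 3), torch.arange(64))

batch_size = 16
# Same heuristic as above: scale worker count with batch size, capped at 8.
# Here min(16 // 16, 8) == 1, so one background worker process loads batches.
loader = DataLoader(ds, shuffle=True, batch_size=batch_size,
                    num_workers=min(batch_size // 16, 8))

# Iterating the loader yields 64 / 16 = 4 batches.
n_batches = sum(1 for _ in loader)
```

With `num_workers > 0`, batch collation runs in separate processes, so the GPU is not left waiting on the main process to prepare the next batch.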
that's great, feel free to make PRs!
Thanks to OpenAI and the Superalignment Generalization Team for their awesome work.
While reading the code of the vision part, I found a minor bug involving `CosineAnnealingLR`. Since the learning rate schedule is set by n_epochs, not n_iters, `schedule.step()` should be called outside the train_loader loop, correspondingly. After fixing the logic, the final results should look like this:
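A minimal sketch of the fix described above, using a stand-in model and synthetic batches (assumptions for illustration): because `CosineAnnealingLR` is constructed with `T_max = n_epochs`, the scheduler must be stepped once per epoch, after the inner loop over batches, not once per batch.

```python
import torch

model = torch.nn.Linear(4, 2)  # stand-in model (assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
n_epochs = 3
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=n_epochs)

# Synthetic batches standing in for train_loader (assumption).
batches = [(torch.randn(8, 4), torch.randint(0, 2, (8,))) for _ in range(5)]
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(n_epochs):
    for x, y in batches:  # inner loop over train_loader
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # <-- stepped once per epoch, OUTSIDE the batch loop
```

If `scheduler.step()` were instead called inside the batch loop, the cosine schedule would complete after `T_max` batches rather than `T_max` epochs, decaying the learning rate far too quickly.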