No speedup when training ImageNet

giangnguyen2412 commented 1 year ago

Hi ffcv team,

This is awesome work! I tried to use FFCV in my current project but got no speedup at all. I created the .beton files like this:

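from torchvision import datasets
from ffcv.writer import DatasetWriter
from ffcv.fields import RGBImageField, IntField

# ImageFolder yields (image, class index) pairs, matching the two fields below
dataset = datasets.ImageFolder('/home/train/')

writer = DatasetWriter('ffcv_output/imagenet_train.beton', {
    # Note: compress_probability only takes effect with write_mode='proportion';
    # with write_mode='jpg' every image is stored as JPEG
    'image': RGBImageField(write_mode='jpg',
                           max_resolution=400,
                           compress_probability=0.5,
                           jpeg_quality=90),
    'label': IntField(),
},
    num_workers=16)
writer.from_indexed_dataset(dataset)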

and then I built the Loader as suggested in the docs:

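import numpy as np
import torch
from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import IntDecoder, RandomResizedCropRGBImageDecoder
from ffcv.transforms import (NormalizeImage, RandomHorizontalFlip, Squeeze,
                             ToDevice, ToTensor, ToTorchImage)

# phase, RunningParams, IMAGENET_MEAN and IMAGENET_STD are defined elsewhere
device = torch.device('cuda:0')
data_loader = Loader('ffcv_output/imagenet_{}.beton'.format(phase),
                     batch_size=RunningParams.batch_size,
                     num_workers=8,
                     order=OrderOption.RANDOM,
                     os_cache=True,
                     drop_last=True,
                     pipelines={
                         'image': [
                             RandomResizedCropRGBImageDecoder((224, 224)),
                             RandomHorizontalFlip(),
                             ToTensor(),
                             ToDevice(device, non_blocking=True),
                             ToTorchImage(),
                             # Normalizes the uint8 images and converts them to float32
                             NormalizeImage(IMAGENET_MEAN, IMAGENET_STD, np.float32),
                         ],
                         'label': [
                             IntDecoder(),
                             ToTensor(),
                             Squeeze(),
                             ToDevice(device, non_blocking=True),
                         ],
                     })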

However, I got no speedup: training runs at the same speed as with the original torch.utils.data.DataLoader. Do I need to use DDP to see a speedup? I am only using DataParallel. Any suggestions?
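One way to narrow this down is to time the loader on its own and then time the full training step, for both the FFCV Loader and the torch DataLoader. The helper below is a minimal sketch that uses only the standard library; the name `batches_per_second` and its parameters are hypothetical and are not part of FFCV or PyTorch.

```python
import time
from typing import Callable, Iterable, Optional


def batches_per_second(loader: Iterable,
                       max_batches: int = 100,
                       step: Optional[Callable] = None,
                       sync: Optional[Callable[[], None]] = None) -> float:
    """Return batches/second over at most `max_batches` batches of `loader`.

    With step=None only data loading is timed. Otherwise step(batch) is
    called on every batch (for example forward + backward + optimizer step).
    `sync`, if given, is called once before the clock stops; on a GPU this
    could be torch.cuda.synchronize so that queued work is included.
    """
    count = 0
    start = time.perf_counter()
    for batch in loader:
        if step is not None:
            step(batch)
        count += 1
        if count >= max_batches:
            break
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")
```

If the loader-only rate is already much higher than the rate with the training step for both loaders, the time is being spent on the model side (here possibly DataParallel), and a faster data loader alone would not be expected to change the overall speed.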

giangnguyen2412 commented 1 year ago

@lengstrom @GuillaumeLeclerc @NicolasHug any help?

andrewilyas commented 1 year ago

@giangnguyen2412 Hi! Using DataParallel is likely to incur a significant slowdown and be the bottleneck. Check out the ffcv-imagenet example (https://github.com/libffcv/ffcv-imagenet) to see how to use FFCV with distributed training! Let us know if this does not answer your question.
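For orientation, here is a rough sketch of what switching from DataParallel to DistributedDataParallel with an FFCV Loader might look like. It is not taken from the ffcv-imagenet repository. The model (resnet50), the global batch size of 256, the SGD settings, the normalization constants, and the helper `per_process_batch_size` are all assumptions made for illustration, and the script assumes it is launched with torchrun, which sets LOCAL_RANK.

```python
import os


def per_process_batch_size(global_batch_size: int, world_size: int) -> int:
    """Split a global batch evenly across processes (one process per GPU)."""
    if global_batch_size % world_size != 0:
        raise ValueError("global batch size must be divisible by world size")
    return global_batch_size // world_size


def main() -> None:
    # Imported lazily so the helper above works without the GPU stack installed.
    import numpy as np
    import torch
    import torch.distributed as dist
    import torch.nn.functional as F
    import torchvision
    from torch.nn.parallel import DistributedDataParallel as DDP
    from ffcv.fields.decoders import IntDecoder, RandomResizedCropRGBImageDecoder
    from ffcv.loader import Loader, OrderOption
    from ffcv.transforms import (NormalizeImage, RandomHorizontalFlip, Squeeze,
                                 ToDevice, ToTensor, ToTorchImage)

    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    device = torch.device(f"cuda:{local_rank}")

    # Assumed constants: the usual ImageNet statistics scaled to the 0-255 range.
    mean = np.array([0.485, 0.456, 0.406]) * 255
    std = np.array([0.229, 0.224, 0.225]) * 255

    loader = Loader(
        "ffcv_output/imagenet_train.beton",
        batch_size=per_process_batch_size(256, dist.get_world_size()),
        num_workers=8,
        order=OrderOption.RANDOM,
        os_cache=True,
        drop_last=True,
        distributed=True,  # each process reads its own shard of the dataset
        pipelines={
            "image": [
                RandomResizedCropRGBImageDecoder((224, 224)),
                RandomHorizontalFlip(),
                ToTensor(),
                ToDevice(device, non_blocking=True),
                ToTorchImage(),
                NormalizeImage(mean, std, np.float32),
            ],
            "label": [
                IntDecoder(),
                ToTensor(),
                Squeeze(),
                ToDevice(device, non_blocking=True),
            ],
        },
    )

    # One process per GPU; DDP averages gradients across processes.
    model = DDP(torchvision.models.resnet50().to(device), device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

    model.train()
    for images, labels in loader:  # a single epoch, for illustration only
        loss = F.cross_entropy(model(images), labels)
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()


# Only start training when launched by torchrun (which sets LOCAL_RANK).
if __name__ == "__main__" and "LOCAL_RANK" in os.environ:
    main()
```

Assuming the file is saved as train_ddp.py (a hypothetical name), it could be started on a single 8-GPU machine with `torchrun --nproc_per_node=8 train_ddp.py`. The ffcv-imagenet repository linked above remains the reference for a complete, tuned setup.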

libffcv / ffcv

No speedup when training ImageNet #238