libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

RandomCrop? #200

Closed: ThibaultCastells closed this 2 years ago

ThibaultCastells commented 2 years ago

Hi, I tried ffcv with VGG16 on CIFAR10 (using the example from the library), but I got a significant drop in accuracy (-20%). The main difference, I think, is that I was previously using torchvision.transforms.RandomCrop(32, padding=4); this line doesn't seem to work with ffcv, and I couldn't find an equivalent among the ffcv transforms.

Is there a way to fix this (ideally with an ffcv-optimized version of RandomCrop; if that's not possible, being able to use the torchvision version would be nice)?
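
For reference, here is a minimal numpy sketch of the behavior I'm trying to reproduce (the helper name is mine, just for illustration): torchvision's RandomCrop(32, padding=4) pads 4 pixels on each side and then takes a random 32x32 crop of the padded image:

import numpy as np

def random_crop_with_padding(img, size=32, padding=4):
    # img: HxWxC uint8 array. Pad with zeros on each side (torchvision's
    # default fill), then take a random size x size crop.
    padded = np.pad(img, ((padding, padding), (padding, padding), (0, 0)))
    top = np.random.randint(0, 2 * padding + 1)
    left = np.random.randint(0, 2 * padding + 1)
    return padded[top:top + size, left:left + size]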

Here is my previous code, without ffcv (accuracy after 10 epochs: 89%):

import torch
from torchvision import datasets, transforms

def load_data(data_dir, batch_size, num_workers, **kwargs):

    # load training data
    transform_train = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    transform_test = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    trainset = datasets.CIFAR10(root=data_dir, train=True, download=True,
                                transform=transform_train)
    train_loader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                               shuffle=True, num_workers=num_workers)
    testset = datasets.CIFAR10(root=data_dir, train=False, download=True,
                               transform=transform_test)
    val_loader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                             shuffle=False, num_workers=num_workers)

    return train_loader, val_loader

And here is the new code I use with ffcv (accuracy after 10 epochs: 71%). Note that I tried various combinations of transforms, with and without quantization, and I also tried to reproduce the combination from the example, but I never got better than 71% accuracy:

import os
from typing import List

import numpy as np
from torchvision import datasets

from ffcv.fields import IntField, RGBImageField
from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
from ffcv.loader import Loader, OrderOption
from ffcv.pipeline.operation import Operation
from ffcv.transforms import (NormalizeImage, RandomHorizontalFlip,
                             RandomTranslate, ToDevice, ToTensor, ToTorchImage)
from ffcv.transforms.common import Squeeze
from ffcv.writer import DatasetWriter

def load_data(data_dir, batch_size, num_workers, **kwargs):

    # 1. create dataset
    my_datasets = {
        'train': datasets.CIFAR10(data_dir, train=True, download=True),
        'test': datasets.CIFAR10(data_dir, train=False, download=True),
    }

    cache_dir = 'cache/'
    os.makedirs(cache_dir, exist_ok=True)
    paths = {
        'train': os.path.join(cache_dir, 'cifar10_train.beton'),
        'test': os.path.join(cache_dir, 'cifar10_test.beton'),
    }

    for (name, ds) in my_datasets.items():
        writer = DatasetWriter(paths[name], {
            'image': RGBImageField(),
            'label': IntField()
        })
        writer.from_indexed_dataset(ds)

    # 2. create the transformation pipelines
    # (stats scaled to the [0, 255] range expected by NormalizeImage)
    CIFAR_MEAN = np.array([0.4914, 0.4822, 0.4465]) * 255
    CIFAR_STD = np.array([0.2023, 0.1994, 0.2010]) * 255

    loaders = {}
    for name in ['train', 'test']:
        label_pipeline: List[Operation] = [IntDecoder(), ToTensor(), ToDevice('cuda:0'), Squeeze()]
        image_pipeline: List[Operation] = [SimpleRGBImageDecoder()]
        if name == 'train':
            image_pipeline.extend([
                # transforms.RandomCrop(32, padding=4),  # <= this raised an error with ffcv
                RandomHorizontalFlip(),
                RandomTranslate(padding=2),
            ])

        image_pipeline.extend([
            ToTensor(),
            ToDevice('cuda:0', non_blocking=True),
            ToTorchImage(),
            NormalizeImage(CIFAR_MEAN, CIFAR_STD, np.float16)
        ])
        ordering = OrderOption.RANDOM if name == 'train' else OrderOption.SEQUENTIAL
        loaders[name] = Loader(paths[name], batch_size=batch_size, num_workers=num_workers,
                               order=ordering, drop_last=(name == 'train'),
                               pipelines={'image': image_pipeline, 'label': label_pipeline})

    return loaders['train'], loaders['test']

GuillaumeLeclerc commented 2 years ago

Hello, you should take a look at our ImageNet example; it has an example of random cropping (sketched below). Feel free to re-open if you have a specific issue!
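
Roughly, the training pipeline there does the crop at decode time rather than as a transform; the sizes below are ImageNet's, just to illustrate the idea:

from ffcv.fields.decoders import RandomResizedCropRGBImageDecoder
from ffcv.transforms import RandomHorizontalFlip

image_pipeline = [
    RandomResizedCropRGBImageDecoder((224, 224)),  # crop happens while decoding
    RandomHorizontalFlip(),
    # ... ToTensor, ToDevice, ToTorchImage, normalization, etc.
]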

ThibaultCastells commented 2 years ago

Are you talking about the RandomResizedCropRGBImageDecoder class? If so, I already tried it, but it didn't improve the accuracy; I only got 38%. Here is the code I used:

        # ...
        if name == 'train':
            image_pipeline: List[Operation] = [
                RandomResizedCropRGBImageDecoder((32,32)),
                RandomHorizontalFlip()
            ]
        else:
            image_pipeline: List[Operation] = [SimpleRGBImageDecoder()]

        image_pipeline.extend([
            ToTensor(),
            ToDevice('cuda:0', non_blocking=True),
            ToTorchImage(),
            # Convert(torch.float32),
            NormalizeImage(CIFAR_MEAN, CIFAR_STD, np.float16)
        ])
        ordering = OrderOption.RANDOM if name == 'train' else OrderOption.SEQUENTIAL
        loaders[name] = Loader(paths[name], batch_size=batch_size, num_workers=num_workers,
                               order=ordering, drop_last=(name == 'train'),
                               pipelines={'image': image_pipeline, 'label': label_pipeline})

    return loaders['train'], loaders['test']

My assumption is that the drop comes from the resize step, which torchvision.transforms.RandomCrop does not perform, so the behavior is different.
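
If the decoder follows torchvision's RandomResizedCrop convention (scale defaulting to roughly (0.08, 1.0)), it could sample crops covering as little as 8% of an already tiny 32x32 image and upsample them back, which would explain the drop. Assuming it accepts torchvision-style scale/ratio arguments, constraining it would look something like:

from ffcv.fields.decoders import RandomResizedCropRGBImageDecoder

decoder = RandomResizedCropRGBImageDecoder(
    (32, 32),
    scale=(0.7, 1.0),  # only sample crops covering 70-100% of the image area
    ratio=(1.0, 1.0),  # keep the crops square
)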

To be sure the difference is caused by the crop operation, I tried removing everything except RandomHorizontalFlip and normalization, then added the crop back. Without ffcv, both setups gave around 89%. With ffcv, I got ~82% with the crop and ~88% without it. Also, the accuracy is much less stable between runs than without ffcv (sometimes >10% difference between two runs; I reported the best results here).
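
If RandomTranslate is the pad-and-shift analogue of pad-then-crop (which is my understanding, not something I have verified in the source), then matching my old torchvision setup would mean padding=4 rather than the padding=2 I used above:

from ffcv.transforms import RandomHorizontalFlip, RandomTranslate

train_augmentations = [
    RandomHorizontalFlip(),
    RandomTranslate(padding=4),  # pad by 4 and shift, analogous to RandomCrop(32, padding=4)
]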

I also noticed another weird behavior; should I open a separate issue for it? During training, I save the best checkpoint. Then, at the end of training, I load the model from this checkpoint and check the accuracy again. Without ffcv, the validation accuracy of the loaded model is exactly the same as the best accuracy seen during training (which makes sense, since I loaded the model that produced it). But with ffcv, the accuracy of the loaded model is really bad (and doesn't correspond to any accuracy seen during training). The code is exactly the same in both cases, except for the data loading (and the use of GradScaler and autocast when using quantization, but I get this issue even without quantization).
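
One thing I still need to rule out myself: since NormalizeImage(..., np.float16) makes the loader emit float16 batches, the post-load evaluation presumably needs the same autocast context as training. A minimal sketch of what I mean:

import torch

model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in val_loader:
        with torch.cuda.amp.autocast():  # match the training-time autocast
            outputs = model(images)
        correct += (outputs.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
print(f'val accuracy: {correct / total:.4f}')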

GuillaumeLeclerc commented 2 years ago

You need to specify the scale and ratio arguments in your RandomResizedCropRGBImageDecoder. Just take a look at the images generated by your pipeline (e.g. with a quick dump like the one below) and the problem should be obvious.
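
Something like this is enough to dump a few samples to disk (reusing the CIFAR_MEAN / CIFAR_STD arrays and train loader from your snippet above):

import numpy as np
from PIL import Image

# Pull one batch, undo the normalization, and save a few images.
images, labels = next(iter(loaders['train']))
batch = images.float().cpu().numpy()  # (N, C, H, W)
batch = batch * CIFAR_STD.reshape(1, 3, 1, 1) + CIFAR_MEAN.reshape(1, 3, 1, 1)
for i in range(4):
    arr = batch[i].transpose(1, 2, 0).clip(0, 255).astype(np.uint8)
    Image.fromarray(arr).save(f'pipeline_sample_{i}.png')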

CIFAR images are already 32x32. What are you trying to achieve with your cropping?

About the model saving: you should create a new issue. Make sure you have a reproducible example though (something like the sketch below). I can't see how FFCV could break the loading/saving of your model's weights, since it never receives a reference to the model, so it seems impossible for it to change the behavior there.
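
A minimal repro that isolates saving/loading from everything else would look something like this, where evaluate() is a stand-in for whatever accuracy computation you use:

import torch

acc_before = evaluate(model, val_loader)  # evaluate() is your own helper
torch.save(model.state_dict(), 'ckpt.pt')
model.load_state_dict(torch.load('ckpt.pt'))
acc_after = evaluate(model, val_loader)
assert acc_before == acc_after  # should hold if loading is the only variable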