libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

torch.nn.Module classes cannot be used in Pipeline #84

Closed chengxuz closed 2 years ago

chengxuz commented 2 years ago

I tried to add a color jittering augmentation to the ImageNet training by inserting the line torchvision.transforms.ColorJitter(.4,.4,.4) right after RandomHorizontalFlip, but hit this error:

numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'self': Cannot determine Numba type of <class 'ffcv.transforms.module.ModuleWrapper'>

File "../ffcv/ffcv/transforms/module.py", line 25:
        def apply_module(inp, _):
            res = self.module(inp)
            ^

During: resolving callee type: type(CPUDispatcher(<function ModuleWrapper.generate_code.<locals>.apply_module at 0x7f921d4c98b0>))
During: typing of call at  (2)

During: resolving callee type: type(CPUDispatcher(<function ModuleWrapper.generate_code.<locals>.apply_module at 0x7f921d4c98b0>))
During: typing of call at  (2)

File "/home/chengxuz/ffcv-imagenet", line 2:
<source missing, REPL/exec in use?>

Any idea what's happening here and how to fix it?

GuillaumeLeclerc commented 2 years ago

It should be working. I'll take a look!

PS: Torch augmentations are really slow, especially if you run them before copying the data to the GPU. We recommend using them only for experimentation. If an augmentation is useful, you should consider implementing a NumPy version (and optionally sharing it with the community).

andrewilyas commented 2 years ago

@chengxuz can you post your full pipeline? From the output you posted, it looks like you might be trying to apply a torchvision transform to a NumPy array (i.e., before calling ToTensor).

chengxuz commented 2 years ago

I did this to the ImageNet training pipeline (https://github.com/libffcv/ffcv-imagenet/blob/main/train_imagenet.py#L223):

        image_pipeline: List[Operation] = [
            self.decoder,
            RandomHorizontalFlip(),
            torchvision.transforms.ColorJitter(.4,.4,.4),
            ToTensor(),
            ToDevice(ch.device(this_device), non_blocking=True),
            ToTorchImage(),
            NormalizeImage(IMAGENET_MEAN, IMAGENET_STD, np.float16)
        ]

I was following this doc: https://github.com/libffcv/ffcv/blob/main/docs/making_dataloaders.rst#transforms, which I guess is outdated now?

chengxuz commented 2 years ago

After moving torchvision.transforms.ColorJitter(.4,.4,.4) to after the ToTorchImage operation, it works! Is this the best place for this transform?
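
For reference, the reordered pipeline now looks like this (same operations as before, with ColorJitter moved after ToTorchImage):

    image_pipeline: List[Operation] = [
        self.decoder,
        RandomHorizontalFlip(),
        ToTensor(),
        ToDevice(ch.device(this_device), non_blocking=True),
        ToTorchImage(),
        torchvision.transforms.ColorJitter(.4, .4, .4),
        NormalizeImage(IMAGENET_MEAN, IMAGENET_STD, np.float16)
    ]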

Anyway, thanks for your help! It would be great if you could also update the documentation.

andrewilyas commented 2 years ago

Yeah, all of the torchvision transforms operate on PyTorch tensors, so they have to be placed after ToTorchImage. We will add this to the documentation and make the error message a bit more descriptive :) Thanks!

vturrisi commented 2 years ago

To complement this: I'm also experimenting with a mixed FFCV/torchvision pipeline before rewriting the augmentations as FFCV operations, and I noticed a weird behaviour. My pipeline is defined like this:

image_pipeline = [
    RandomResizedCropRGBImageDecoder((crop_size, crop_size), scale=(min_scale, max_scale)),
    RandomHorizontalFlip(flip_prob=horizontal_flip_prob),
    ToTensor(),
    ToDevice(device, non_blocking=True),
    ToTorchImage(),
    transforms.RandomApply(
        [transforms.ColorJitter(brightness, contrast, saturation, hue)],
        p=color_jitter_prob,
    ),
    NormalizeImage(mean=mean, std=std, type=np.float16),
]

GPU memory oscillates between 10 GB and 5-6 GB. If I comment out the RandomApply operation, it stays stable at around 6 GB. Are there any new allocations being made at each iteration?

andrewilyas commented 2 years ago

Memory is only pre-allocated for FFCV transforms, so the torchvision transforms there are probably allocating memory at each iteration. Rewriting the torchvision transform as an FFCV one will fix this!
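
For anyone looking to do the rewrite, here is a rough sketch of what a per-image FFCV transform could look like, using FFCV's Operation API (the RandomGrayscaleOp class and the luma weights are illustrative, not part of FFCV; it assumes HWC uint8 batches, i.e. placement before ToTensor):

    import numpy as np
    from dataclasses import replace
    from ffcv.pipeline.operation import Operation
    from ffcv.pipeline.allocation_query import AllocationQuery
    from ffcv.pipeline.compiler import Compiler

    class RandomGrayscaleOp(Operation):
        # Hypothetical per-image random grayscale; assumes HWC uint8 batches,
        # i.e. it is placed before ToTensor in the pipeline.
        def __init__(self, p=0.2):
            super().__init__()
            self.p = p

        def generate_code(self):
            my_range = Compiler.get_iterator()
            # copy attributes to locals: referencing `self` inside the
            # compiled function triggers exactly the numba error above
            p = self.p

            def random_grayscale(images, dst):
                for i in my_range(images.shape[0]):
                    if np.random.rand() < p:
                        for y in range(images.shape[1]):
                            for x in range(images.shape[2]):
                                # standard luma weights, cast back to uint8
                                g = np.uint8(0.299 * images[i, y, x, 0]
                                             + 0.587 * images[i, y, x, 1]
                                             + 0.114 * images[i, y, x, 2])
                                dst[i, y, x, 0] = g
                                dst[i, y, x, 1] = g
                                dst[i, y, x, 2] = g
                    else:
                        dst[i, :, :, :] = images[i, :, :, :]
                return dst

            random_grayscale.is_parallel = True
            return random_grayscale

        def declare_state_and_memory(self, previous_state):
            # the destination buffer is pre-allocated once, instead of a fresh
            # allocation on every batch like the torchvision transforms
            return (replace(previous_state, jit_mode=True),
                    AllocationQuery(previous_state.shape, previous_state.dtype))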

chengxuz commented 2 years ago

Just another follow-up on this: if the augmentation (like ColorJitter) is applied in this way, all images in the same batch will share the same augmentation parameters, like the randomly determined brightness, contrast, saturation, and hue factors. This would be terrible for typical contrastive learning algorithms, so it seems that rewriting the transform as an FFCV one is indeed needed.

GuillaumeLeclerc commented 2 years ago

Can you clarify what you mean with an example, @chengxuz? This might not be expected behavior.

chengxuz commented 2 years ago

For example, suppose I add the RandomGrayscale augmentation. I would expect images within the same batch to be turned into grayscale independently, with the probability I specified: one image might be grayscale while another might not. That is what happens in typical usage of this module, since it is applied to each PIL image independently, yielding different behavior even for images in the same batch. But when the module processes a batch of images, the whole batch is either turned grayscale or not, because it makes a single random decision for the entire batch. Since FFCV applies the pipeline to batches of images, this leads to the current behavior.
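
A quick way to see this, assuming torchvision's tensor transforms:

    import torch
    from torchvision import transforms

    batch = torch.rand(8, 3, 32, 32)  # one batch of 8 random "images"
    out = transforms.RandomGrayscale(p=0.5)(batch)
    # RandomGrayscale flips a single coin for the whole tensor, so either
    # all 8 images come out grayscale or none of them do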

vturrisi commented 2 years ago

I haven't dived into FFCV's augmentations, but what NVIDIA DALI does (and it also applies the pipeline to the whole batch) is use a multiplexing operation to decide which images to apply augmentations to (https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/general/expressions/expr_conditional_and_masking.html).
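
Roughly, the pattern from that docs page looks like this (the mux helper is taken from the linked example; the commented pipeline usage is a sketch):

    from nvidia.dali import fn, types

    def mux(condition, true_case, false_case):
        # per-sample boolean mask: picks true_case where condition holds
        # and false_case elsewhere (from the DALI docs page above)
        neg_condition = condition ^ True
        return condition * true_case + neg_condition * false_case

    # inside a DALI pipeline definition, something like:
    #   do_aug = fn.random.coin_flip(probability=0.5, dtype=types.DALIDataType.BOOL)
    #   images = mux(do_aug, augmented_images, images)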

GuillaumeLeclerc commented 2 years ago

@chengxuz Do you have a reproduction script? Reading the documentation from torchvision, it seems that's not what would happen. FFCV passes the batch as-is to the augmentation; whether it flips the coin on a per-image basis or for the whole batch is unfortunately beyond our control (but from what I understand, it's not what they are doing).

GuillaumeLeclerc commented 2 years ago

I personally checked and it seems that you are right @chengxuz. This is terribly unintuitive and imo should be reported to torchvision. FFCV handles images one by one, though, so this should never be a problem for native FFCV transforms.

andrewilyas commented 2 years ago

@vturrisi FFCV also has per-image randomness in its augmentations (so I think the only augmentations that don't support this are the torchvision ones).

Since it looks like all the FFCV-related problems here are solved, I'll close this issue for now. Feel free to re-open if there's anything we missed!

realliyifei commented 1 year ago

@andrewilyas thanks for the comment above, but how can we rewrite the torchvision transforms below as FFCV ones, given that FFCV doesn't have counterparts for them (namely RandomApply, ColorJitter, RandomGrayscale, and GaussianBlur)?

And if we mix FFCV with torchvision transforms like below, will that slow FFCV down a lot, per the previous discussion?

image_pipeline_q = [
    img_decoder,
    RandomResizedCrop(),
    RandomHorizontalFlip(),
    ToTensor(),
    ToDevice(device, non_blocking=True),
    ToTorchImage(),
    *custom_img_transforms,
    transforms.RandomApply(
        [transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)],  # not strengthened
        p=0.8,
    ),
    transforms.RandomGrayscale(p=0.2),
    transforms.RandomApply([GaussianBlur([.1, 2.])], p=0.5),
    NormalizeImage(dataset_mean * 255, dataset_std * 255, np.float16),
]
# image_pipeline_k = ... the same as image_pipeline_q
pipelines = {'image_q': image_pipeline_q, 'image_k': image_pipeline_k}

where

import random
import PIL.ImageFilter

class GaussianBlur(object):
    """Gaussian blur augmentation from SimCLR (https://arxiv.org/abs/2002.05709).

    Note: this operates on PIL images, so as written it won't run on the
    GPU tensors produced by ToTorchImage in the pipeline above.
    """

    def __init__(self, sigma=(.1, 2.)):
        self.sigma = sigma

    def __call__(self, x):
        # sample a blur radius uniformly from [sigma_min, sigma_max]
        sigma = random.uniform(self.sigma[0], self.sigma[1])
        return x.filter(PIL.ImageFilter.GaussianBlur(radius=sigma))