libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0
2.8k stars · 180 forks

Can't be in JIT mode and on the GPU #261

Closed samuelstevens closed 1 year ago

samuelstevens commented 1 year ago

I would like to do as much work as possible on the CPU to avoid taking up GPU memory, but I get

AssertionError: Can't be in JIT mode and on the GPU

How can I indicate to the loader that some operations need JIT and some do not?

Dataloader code (I don't think the writer file is relevant here):

return ffcv.loader.Loader(
    ffcv_path,
    batch_size=batch_size,
    num_workers=16,
    order=ffcv.loader.OrderOption.RANDOM,
    os_cache=True,
    distributed=True,
    drop_last=False,
    pipelines={
        "image": [
            ffcv.fields.decoders.SimpleRGBImageDecoder(),
            ffcv.transforms.NormalizeImage(
                self.mean.numpy(), self.std.numpy(), np.float32
            ),
            ffcv.transforms.ToTensor(),
            ffcv.transforms.ToTorchImage(),
            ffcv.transforms.ToDevice(accelerator.device),
        ],
        "label": [
            ffcv.fields.decoders.NDArrayDecoder(),
            ffcv.transforms.ToTensor(),
            ffcv.transforms.Convert(torch.int64),
            ffcv.transforms.ToDevice(accelerator.device),
        ],
    },
)
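For reference, my working assumption (I may be wrong about the internals) is that the Loader treats ToTensor as the boundary between two stages: everything before it is Numba-JIT-compiled and runs on CPU numpy arrays, and everything from ToTensor onward runs as eager PyTorch ops, which is the only place ToDevice can move data to the GPU. Annotated sketch of the image pipeline above, with mean, std, and device standing in for self.mean.numpy(), self.std.numpy(), and accelerator.device:

"image": [
    ffcv.fields.decoders.SimpleRGBImageDecoder(),           # numpy output, Numba-JIT stage, CPU only
    ffcv.transforms.NormalizeImage(mean, std, np.float32),  # still numpy / Numba-JIT
    ffcv.transforms.ToTensor(),                              # boundary: leaves JIT mode, numpy -> torch.Tensor
    ffcv.transforms.ToTorchImage(),                          # eager PyTorch op
    ffcv.transforms.ToDevice(device),                        # eager PyTorch op; GPU only after leaving JIT mode
],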
samuelstevens commented 1 year ago

I tried fixing it by reordering the transforms in "label", specifically to:

"label": [
    ffcv.fields.decoders.NDArrayDecoder(),
    ffcv.transforms.Convert(np.int64),
    ffcv.transforms.ToTensor(),
    ffcv.transforms.ToDevice(accelerator.device),
]

But now I get: Untyped global name 'self': Cannot determine Numba type of <class 'ffcv.transforms.ops.Convert'>
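If I'm reading this right, putting Convert before ToTensor places it in the Numba-compiled stage, and the function Convert generates refers to its own self, an ordinary Python object that Numba cannot type. A minimal standalone reproduction of the same failure mode (just an analogy, not FFCV's actual source):

import numba
import numpy as np

class Caster:
    # Stand-in for a transform whose generated function closes over `self`.
    def __init__(self, dtype):
        self.dtype = dtype

    def generate_code(self):
        def convert(x):
            # Numba cannot infer a type for `self` here, so compiling this
            # function fails with a TypingError much like the one above.
            return x.astype(self.dtype)
        return numba.njit(convert)

fn = Caster(np.int64).generate_code()
fn(np.zeros(4, dtype=np.float32))  # compilation happens here and raises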

samuelstevens commented 1 year ago

Fixed with:

"label": [
    ffcv.fields.decoders.NDArrayDecoder(),    
    ffcv.transforms.ToTensor(),
    ffcv.transforms.Convert(torch.int64),
    ffcv.transforms.ToDevice(accelerator.device),
]

It's unclear to me why this makes a difference, but there are no more errors!
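My best guess at why the order matters (based on poking at the pipeline, not on the docs): everything before ToTensor gets handed to Numba, and everything after it runs as ordinary eager PyTorch code, so where Convert sits decides which treatment it gets:

# Before ToTensor: Convert lands in the Numba-compiled numpy stage -> typing error above.
ffcv.transforms.Convert(np.int64),
ffcv.transforms.ToTensor(),

# After ToTensor: Convert runs as a plain eager PyTorch cast -> works.
ffcv.transforms.ToTensor(),
ffcv.transforms.Convert(torch.int64),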

niniack commented 7 months ago

Hi @samuelstevens, I'm running into the same initial error. Just wanted to ask, since it is not obvious to me: what is the difference between the first and the last label pipeline? It seems you are using the same decoders and transforms in the same order.

samuelstevens commented 7 months ago

I switched the order of ToTensor and Convert(torch.int64).