libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

Is there a get_batch(indices) method + custom collate function? #381

Open AntonioMacaronio opened 4 months ago

AntonioMacaronio commented 4 months ago

Hi everyone, I am wondering if it is possible for a user to create a custom batch of images with ffcv's speed.

Additionally, is there a way for a batch to carry other relevant information alongside the images? Ideally the batch would be a Python dictionary with keys such as ['image', 'index'], where batch['image'] returns a list of tensors or something similar (my images are not all the same size) and batch['index'] returns the dataset index of each image.

andrewilyas commented 3 months ago

Hi! The first feature (get_batch(indices)) is unfortunately not supported, but you can create a loader from a specific set of indices, so if the overhead isn't too big you can create a loader with one batch from those indices and iterate through that loader.
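A minimal sketch of that workaround, assuming the `indices`, `order`, `pipelines` and `drop_last` arguments of `ffcv.loader.Loader`. The helper name `get_batch` and the `loader_cls` parameter are hypothetical (not part of FFCV); `loader_cls` only exists so the sketch can be exercised without a .beton file. Whether the batch preserves the order of `indices` should be verified on your FFCV version.

```python
def get_batch(beton_path, indices, pipelines, loader_cls=None, num_workers=1):
    """Return one batch holding exactly the samples at `indices`.

    Hypothetical helper: builds a throwaway loader restricted to `indices`
    with batch_size == len(indices), then pulls its single batch.
    """
    indices = list(indices)
    if not indices:
        raise ValueError("indices must not be empty")

    kwargs = dict(
        batch_size=len(indices),  # one batch covers every requested sample
        num_workers=num_workers,
        indices=indices,          # restrict the loader to this subset
        pipelines=pipelines,
        drop_last=False,
    )
    if loader_cls is None:
        # Imported lazily so the module loads even where ffcv is absent.
        from ffcv.loader import Loader, OrderOption
        loader_cls = Loader
        kwargs["order"] = OrderOption.SEQUENTIAL  # no shuffling

    loader = loader_cls(beton_path, **kwargs)
    return next(iter(loader))
```

Each call constructs a new loader, so this only makes sense when that setup overhead is acceptable, for example for occasional inspection rather than inside a training loop.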

For the second question, this is very easy! You can create a PyTorch dataset that returns each sample's index alongside the image and write that to a beton. If rewriting the dataset is too expensive, you can design a custom transform to do this instead; for an example, see https://github.com/libffcv/ffcv/blob/3a12966b3afe3a81733a732e633317d747bfaac7/examples/docs_examples/transform_with_inds.py or the docs at https://docs.ffcv.io/ffcv_examples/transform_with_inds.html
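The dataset-side option could look roughly like the sketch below. `WithIndex` and `write_beton` are hypothetical names, and the sketch assumes the wrapped dataset returns a single image per sample; the FFCV calls used are `DatasetWriter`, `RGBImageField`, `IntField` and `from_indexed_dataset`. Any object with `__len__` and `__getitem__` works as the wrapped dataset, so a `torch.utils.data.Dataset` fits as well.

```python
class WithIndex:
    """Wrap a map-style dataset so every sample also carries its own index."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        sample = self.dataset[idx]
        # Normalise to a tuple, then append the index as the last field.
        if not isinstance(sample, tuple):
            sample = (sample,)
        return (*sample, idx)


def write_beton(image_dataset, path):
    """Write (image, index) pairs to a .beton file (assumes image-only samples)."""
    # Imported lazily so WithIndex stays usable where ffcv is not installed.
    from ffcv.fields import IntField, RGBImageField
    from ffcv.writer import DatasetWriter

    writer = DatasetWriter(path, {
        "image": RGBImageField(),
        "index": IntField(),
    })
    writer.from_indexed_dataset(WithIndex(image_dataset))
```

A loader built on the resulting file then yields the index as an extra field next to each image. Note that this stores the index in the file rather than producing a dictionary batch, and images of different sizes would still need a resizing decoder in the image pipeline before they can be batched.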