Open AntonioMacaronio opened 4 months ago
Hi! The first feature (`get_batch(indices)`) is unfortunately not supported, but you can create a loader from a specific set of indices. So if the overhead isn't too big, you can create a loader with one batch built from those indices and iterate through that loader.
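A minimal sketch of that workaround, combined with the infinite with-replacement sampler the question asks for. It assumes FFCV's `Loader` accepts an `indices` argument together with `batch_size`, `order`, `pipelines` and `drop_last`, and that it honours repeated indices. I have not verified the repeated-indices part, so check it before relying on it. `infinite_random_indices`, `ffcv_loader_factory` and `get_batch` are hypothetical helper names, and `pipelines` stands for whatever pipelines dict you already use. FFCV is imported lazily so the sampler part runs on its own.

```python
import numpy as np


def infinite_random_indices(num_samples, batch_size, seed=None):
    """Yield index batches forever, drawn uniformly *with replacement*,
    so the same image can appear several times (even within one batch)."""
    rng = np.random.default_rng(seed)
    while True:
        yield rng.integers(0, num_samples, size=batch_size)


def ffcv_loader_factory(beton_path, pipelines, num_workers=8):
    """Return make_loader(indices), which builds an FFCV Loader covering
    exactly `indices` as one batch.
    Assumption: Loader accepts repeated indices (not verified here)."""
    def make_loader(indices):
        # Imported lazily so the rest of this sketch runs without FFCV.
        from ffcv.loader import Loader, OrderOption
        return Loader(
            beton_path,
            batch_size=len(indices),       # one batch holding all indices
            num_workers=num_workers,
            order=OrderOption.SEQUENTIAL,  # no extra shuffling on top of ours
            indices=indices,
            pipelines=pipelines,
            drop_last=False,               # never drop our only batch
        )
    return make_loader


def get_batch(make_loader, indices):
    """Stand-in for the unsupported get_batch(indices): build a loader that
    contains exactly one batch and return that batch."""
    loader = make_loader(indices)
    return next(iter(loader))
```

In a training loop this would look like `batch = get_batch(make_loader, next(sampler))`, with `sampler = infinite_random_indices(5000, 100)` and `make_loader = ffcv_loader_factory(path, pipelines)`. Because a new `Loader` is built for every batch, that construction cost is exactly the overhead mentioned above, so it is worth measuring before committing to this design.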
The second question is very easy! You can indeed just create a PyTorch dataloader that serves indices and write that to a `.beton` file. If that's too expensive, you can design a custom transform to do this; for an example, see here: https://github.com/libffcv/ffcv/blob/3a12966b3afe3a81733a732e633317d747bfaac7/examples/docs_examples/transform_with_inds.py or the docs here: https://docs.ffcv.io/ffcv_examples/transform_with_inds.html
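One way to store indices alongside images, sketched under a few assumptions: the base dataset returns a single image (a PIL image or an HxWx3 uint8 array) per index, and the `DatasetWriter`, `RGBImageField` and `IntField` usage follows the FFCV docs as I understand them. `WithIndex` and `write_with_index` are hypothetical names. The tuple each sample returns must line up, in order, with the fields dict.

```python
class WithIndex:
    """Wrap an image-only indexed dataset so each sample also carries its
    own dataset index: dataset[i] -> (image, i)."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        return (self.dataset[idx], idx)


def write_with_index(dataset, beton_path, num_workers=16):
    """Write an 'image' field and an 'index' field to a .beton file.
    Requires FFCV; imported lazily so WithIndex works without it."""
    from ffcv.writer import DatasetWriter
    from ffcv.fields import RGBImageField, IntField

    fields = {
        "image": RGBImageField(),  # first element of each sample tuple
        "index": IntField(),       # second element: the dataset index
    }
    writer = DatasetWriter(beton_path, fields, num_workers=num_workers)
    writer.from_indexed_dataset(WithIndex(dataset))
```

One hedged caveat about variable-size images: as far as I know, FFCV's `SimpleRGBImageDecoder` only handles images that all share one size, so the image pipeline would need a decoder that outputs a fixed size (for example `CenterCropRGBImageDecoder` or `RandomResizedCropRGBImageDecoder`). The loader then returns fixed-shape batches rather than a list of differently sized tensors.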
Hi everyone, I am wondering whether it is possible to create a custom batch of images at ffcv's speed.
First, is there a method of `ffcv.loader.Loader`, something like `get_batch(indices)`, that creates a batch from the given indices?
The reason I would like this is that I need an infinite random sampler: if I have a dataset of 5000 images, I need to create batches of 100 images by random selection, where an image can be drawn multiple times.

Additionally, is there a way for a batch of images to also contain other relevant information? Ideally, the batch would be a Python dictionary with keys such as ['image', 'index'], where batch['image'] returns a list of tensors or something similar (my images are not all the same size) and batch['index'] returns the dataset index of each image. Is this possible with `ffcv.writer.DatasetWriter`?
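FFCV's `Loader` yields each batch as a tuple with one entry per field, in the order the fields were written, rather than as a dictionary (at least in the versions I am familiar with). If dict-style access such as `batch['index']` is wanted, a thin wrapper can re-key the tuple. `as_dict`, `dict_batches` and `FIELD_NAMES` are hypothetical names, and the names are assumed to match the order of the fields dict given to `DatasetWriter`.

```python
# Assumed: same order as the fields dict used when writing the .beton file.
FIELD_NAMES = ("image", "index")


def as_dict(batch, field_names=FIELD_NAMES):
    """Turn one loader batch (a tuple with one entry per field) into a dict."""
    if len(batch) != len(field_names):
        raise ValueError(f"expected {len(field_names)} fields, got {len(batch)}")
    return dict(zip(field_names, batch))


def dict_batches(loader, field_names=FIELD_NAMES):
    """Wrap any iterable of tuple batches (e.g. an FFCV Loader) to yield dicts."""
    for batch in loader:
        yield as_dict(batch, field_names)
```

With this, `for batch in dict_batches(loader): ...` gives `batch["image"]` and `batch["index"]` without changing how the data is stored.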