minhtriet opened this issue 3 years ago
Most datasets currently available in Torchmeta rely on a hierarchy of three objects:

- `Dataset`, which is simply a PyTorch dataset, responsible for getting the individual examples for a given label. For example, it can be a dataset containing all the (20) examples of the letter A in Omniglot.
- `ClassDataset`, which produces the datasets for the different classes. Each index of this class corresponds to a single label. For example, in Omniglot this contains 1028 elements, and `class_dataset[0]` returns an instance of `Dataset` (above) containing all the examples of `images_background/Alphabet_of_the_Magi/character01`.
- `CombinationMetaDataset`, which combines multiple indices (for example `(0, 1, 2, 3, 4)`) to create a task over the corresponding labels, the individual indices corresponding to the ones in `ClassDataset` above.
Something you could do in your case is to tokenize all the elements of `Dataset` at once, because this is essentially a batch of data (from which the sampler is going to sample to create the actual datasets for the task).
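The suggestion above could be sketched as follows. This is a plain-Python illustration, not Torchmeta's API: `TokenizedSentenceDataset` and `batch_tokenize` are hypothetical names, and in practice the class would subclass `torch.utils.data.Dataset` and use a real tokenizer.

```python
class TokenizedSentenceDataset:
    """Hypothetical per-label dataset that tokenizes its whole batch of
    sentences once, at construction time, instead of per __getitem__ call."""

    def __init__(self, sentences, tokenizer):
        # One batched tokenizer call for every sentence of this label.
        self.encodings = tokenizer(sentences)

    def __len__(self):
        return len(self.encodings)

    def __getitem__(self, index):
        # The sampler only pays the cost of an indexed lookup here.
        return self.encodings[index]


def batch_tokenize(sentences):
    # Toy whitespace tokenizer, standing in for a real batched tokenizer.
    return [s.lower().split() for s in sentences]


dataset = TokenizedSentenceDataset(["The letter A", "Another example"],
                                   batch_tokenize)
```

The tokenization cost is then paid once per label rather than once per sampled example.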
Another option could be to look into allowing `__getitem__(index)` to take a batch (list) of indices for `index`. This is already possible in standard PyTorch datasets, and since Torchmeta datasets are essentially instances of PyTorch datasets, this could work here too. I tried to include this in Torchmeta at some point to improve sampling, but there was no particular improvement for image datasets, mainly because Torchvision transforms (e.g. image loading, `Resize`, etc.) only accept single images, so I ended up not pursuing it further.
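The batched-`__getitem__` idea could look like the following minimal sketch. This is a hypothetical pattern, not something Torchmeta currently does:

```python
class BatchIndexableDataset:
    """Sketch of a dataset whose __getitem__ accepts either a single index
    or a list/tuple of indices (hypothetical; not Torchmeta's behaviour)."""

    def __init__(self, examples):
        self.examples = examples

    def __getitem__(self, index):
        if isinstance(index, (list, tuple)):
            # Batched access: fetch (and potentially process, e.g.
            # tokenize) all requested examples in a single call.
            return [self.examples[i] for i in index]
        return self.examples[index]

    def __len__(self):
        return len(self.examples)


ds = BatchIndexableDataset(["a", "b", "c", "d"])
```

A sampler aware of this could then request `ds[[0, 2]]` in one call instead of `ds[0]` and `ds[2]` separately, which is where batched tokenization would pay off for text.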
Currently I am relying on `__getitem__(self, index)` to tokenize the sentence at `index`. However, there should be a more efficient way to tokenize the whole batch, instead of one sentence at a time. Can this be done in `pytorch-meta`? I have yet to find an example of it.
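To illustrate the difference being asked about: with a tokenizer that supports batched input, tokenizing per `__getitem__` call means one tokenizer invocation per example, while tokenizing the whole label up front means a single invocation. The toy tokenizers below are stand-ins (real ones, e.g. from the `tokenizers` library, batch much more efficiently internally):

```python
calls = {"n": 0}  # count tokenizer invocations


def tokenize_one(sentence):
    # Stand-in per-sentence tokenizer: one call per example.
    calls["n"] += 1
    return sentence.split()


def tokenize_batch(sentences):
    # Stand-in batched tokenizer: one call for the whole batch.
    calls["n"] += 1
    return [s.split() for s in sentences]


sentences = ["one two", "three four", "five six"]

# Per-example tokenization, as done inside __getitem__: 3 calls.
per_item = [tokenize_one(s) for s in sentences]

# Batched tokenization up front, as suggested earlier in the thread: 1 call.
batched = tokenize_batch(sentences)
```

Both produce identical tokens; the batched path just amortizes the per-call overhead across the whole label.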