For how long does the meta batch data loader iterate?

tristandeleu / pytorch-meta

A collection of extensions and data-loaders for few-shot learning & meta-learning in PyTorch

https://tristandeleu.github.io/pytorch-meta/

MIT License


For how long does the meta batch data loader iterate? #69

Open renesax14 opened 4 years ago

renesax14 commented 4 years ago

Usually in Python, iterators stop when the StopIteration exception is raised.

But I saw that the length of the data loader is a strange number (where I expected infinity since it's usually up to the user how many episodes they want to do, which usually corresponds to a batch of tasks sampled).

So when does the data loader stop?

Code I am referencing is from the example: https://github.com/tristandeleu/pytorch-meta/blob/master/examples/maml-higher/train.py

    # Training loop
    with tqdm(dataloader, total=args.num_batches) as pbar:
        for batch_idx, batch in enumerate(pbar):
            model.zero_grad()

Is args.num_batches the same as the number of episodes?

The weird size I mentioned:

        print(len(dataloader)) # prints 446400

tristandeleu commented 4 years ago

Like any vanilla PyTorch dataloader, the dataloader has size len(dataset) // batch_size, where len(dataset) is the total number of tasks (C(4112, 5) for 5-way Omniglot). That 446400 is indeed surprising, because when I tried print(len(dataloader)) in the maml-higher example, I get 610069224856650 which looks reasonable.

However, since the dataloader is combinatorially large, it is not recommended to loop over the whole dataloader (and reach the StopIteration exception as you mention). That's why you have the args.num_batches argument in the example, which loops over args.num_batches batches only (it breaks here).
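
The capped loop can be sketched without Torchmeta. This is a minimal illustration, assuming only that the loader is an ordinary Python iterable; `capped` and `huge_loader` are hypothetical names, standing in for the example's break after args.num_batches batches and for a combinatorially large dataloader.

```python
from itertools import islice


def capped(loader, num_batches):
    """Yield at most num_batches items from loader, then stop."""
    return islice(loader, num_batches)


# Hypothetical stand-in for a combinatorially large dataloader.
huge_loader = range(610069224856650)

# Only the first 100 "batches" are ever produced; the rest are never touched.
seen = [batch_idx for batch_idx, _ in enumerate(capped(huge_loader, 100))]
print(len(seen))  # 100
```

Because islice is lazy, the loop ends after num_batches items, so the StopIteration of the underlying loader is never reached.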

I am closing this issue, because the example is working as intended. Feel free to re-open it if you still get len(dataloader) == 446400.

renesax14 commented 4 years ago

Hi Tristan, thanks for your reply. It was very helpful.

But I am still confused. When I do:

print(f'len(dataloader)= {len(dataloader)}')

With meta-batch=1, I get:

7142400

Where is that number coming from?

What confuses me is that it is NOT infinity. In standard training (when meta-learning is not involved) we usually have the data loader go through the entire dataset (one epoch). However, in N-way, K-shot classification, to form one (meta) batch we sample N classes and K images for each class (plus K_eval for the query set) for every task. This means that, at least in principle, the number of tasks we can generate is essentially infinite, since tasks can be formed from an unbounded number of combinations.

So what I am confused about is where 7142400 is coming from. Is it from creating all possible episodes (batches of size 1, i.e. tasks) when we have a fixed set of classes and images to form the tasks (in mini-imagenet we have 64 classes with 600 images each)?

Can you clarify that?

tristandeleu commented 4 years ago

But I am still confused. When I do:
print(f'len(dataloader)= {len(dataloader)}')
With meta-batch=1, I get:
7142400
Where is that number coming from?

I don't know where this number comes from. When I tried it on my end I'm getting 9761107597706400, with batch_size=1 on Omniglot 5-way 5-shots. Here is the diff of the changes I am making and what I get

diff --git a/examples/maml-higher/train.py b/examples/maml-higher/train.py
index 71634d8..1d5abd3 100644
--- a/examples/maml-higher/train.py
+++ b/examples/maml-higher/train.py
@@ -82,7 +82,7 @@ def train(args):
                        meta_train=True,
                        download=args.download)
     dataloader = BatchMetaDataLoader(dataset,
-                                     batch_size=args.batch_size,
+                                     batch_size=1,
                                      shuffle=True,
                                      num_workers=args.num_workers)

@@ -94,6 +94,8 @@ def train(args):
     inner_optimiser = torch.optim.SGD(model.parameters(), lr=args.step_size)
     meta_optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

+    print(f'len(dataloader) = {len(dataloader)}')
+    import pdb; pdb.set_trace()
     # Training loop
     with tqdm(dataloader, total=args.num_batches) as pbar:
         for batch_idx, batch in enumerate(pbar):

$ python examples/maml-higher/train.py data

This script is an example to showcase the data-loading features of Torchmeta in conjunction with using higher to make models "unrollable" and optimizers differentiable, and as such has been  very lightly tested.
len(dataloader) = 9761107597706400
> pytorch-meta/examples/maml-higher/train.py(100)train()
-> with tqdm(dataloader, total=args.num_batches) as pbar:
(Pdb)

Can you provide a minimal example where len(dataloader) == 7142400?

What confuses me is that it is NOT infinity. In standard training (when meta-learning is not involved) we usually have the data loader go through the data-set entirely (one epoch). However, in the N-way, K-shot classification to form 1 (meta) batch we sample N classes and K images for each (plus K_eval for the query set) task we sample. This means that in principle the number of tasks we can generate are essentially infinite since the task we can do are an infinite combination (at least in principle).

The number of tasks is not infinite, but it is combinatorially large. For example, in Omniglot 5-way classification, the number of possible tasks is C(4112, 5) = 9761107597706400; this is a very large number, but it is not infinity. See also my previous comment https://github.com/tristandeleu/pytorch-meta/issues/69#issuecomment-654742810 on how the tasks are created, and why this number. The length of the dataset returns the number of possible tasks, when this number fits in machine precision. From a more practical point of view, the __len__ (that implements len(dataset)) needs to be an integer, so it cannot be infinite. If you want to have more details about the way the tasks are sampled, check this comment https://github.com/tristandeleu/pytorch-meta/issues/67#issuecomment-654774351.

renesax14 commented 4 years ago

Hi Tristan, thanks for your help!

That number comes from miniImageNet, using the code you provide.

I will provide code as soon as I'm on my computer.

On Jul 8, 2020, at 6:15 AM, Tristan Deleu notifications@github.com wrote:

Reopened #69.

brando90 commented 4 years ago

This was a really useful discussion, thanks Tristan.

I am curious, what is the formula for the size of the dataloader (i.e. number of tasks)?

I was thinking of something using the choose function, (C * N_Ci) choose (N * (K + K_eval)), where C is the total number of labels (e.g. 64 for mini-imagenet) and N_Ci is the number of examples per label (e.g. 600 images for each label). But that formula seems to give me a number that is much larger than I expected.

Do you have an actual formula for calculating this length?

tristandeleu commented 4 years ago

There was indeed a bug in the way the length of the dataset was computed, thank you! I have fixed it in Torchmeta 1.5.2. For all the datasets like Omniglot or MiniImagenet (i.e. inheriting from CombinationMetaDataset), the general formula for the number of tasks is

C(num_classes_in_split * (1 + num_class_augmentations), num_classes_per_task)

Where

num_classes_in_split is the number of possible classes to pick from in the meta-split, in order to create your tasks. For example, Omniglot has 1028 classes in the meta-train split.
num_class_augmentations is the number of class augmentations you add to the dataset. For example, the standard setting for Omniglot is to augment the pool of classes in the split (e.g. the 1028 in the meta-train split) with "new classes", corresponding to rotations by [90, 180, 270], meaning 3 class augmentations.
num_classes_per_task is the number of classes in each task, which is the N in "N-way" classification.

That is, for 5-way classification tasks on Omniglot from the meta-train split, there are in total

C(1028 * 4, 5) = 9772996309770512

And you can verify it with len(dataset)

from torchmeta.datasets.helpers import omniglot
from scipy.special import comb

dataset = omniglot('data', ways=5, shots=5, meta_train=True)

print(len(dataset))  # 9772996309770512
print(comb(1028 * 4, 5, exact=True))  # 9772996309770512
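
The same formula can be checked with the standard library alone. This is a small sketch, not Torchmeta code: `num_tasks` is a hypothetical helper, `math.comb` needs Python 3.8+, and the miniImageNet line assumes 64 classes in the meta-train split with no class augmentations.

```python
from math import comb


def num_tasks(num_classes_in_split, num_classes_per_task, num_class_augmentations=0):
    """Number of distinct class tuples, following the formula above."""
    pool = num_classes_in_split * (1 + num_class_augmentations)
    return comb(pool, num_classes_per_task)


# Omniglot meta-train: 1028 classes, 3 rotations as class augmentations, 5-way.
omniglot_tasks = num_tasks(1028, 5, num_class_augmentations=3)
print(omniglot_tasks)  # 9772996309770512

# miniImageNet meta-train (assumed 64 classes, no class augmentations), 5-way.
mini_tasks = num_tasks(64, 5)
print(mini_tasks)  # 7624512

# With len(dataloader) == len(dataset) // batch_size and batch_size=16:
print(mini_tasks // 16)  # 476532
```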

brando90 commented 4 years ago

@tristandeleu thanks for checking that out! Will read through the details to help you double check it later.

For now though, I was wondering if the meta-loaders work as expected for regression (my belief is that they do not, but I might have yet another misunderstanding of how looping through tasks and dataloaders happens).

I noticed that if I make 500 tasks (functions) with 700 examples, then the torchmeta dataloader makes 500 loops/iterations. I would have expected many more iterations, e.g. there are 5+15=20 potential examples in 1 task and with 700 examples there seem to be many missing. Perhaps something like ~ C(500*700, 20) is what I more or less expected (or at least that's a very rough estimate). But whatever it is, it's definitely more than 500.

Can you take a look at that for me, please?

Thanks for sharing your great library btw!

tristandeleu commented 4 years ago

The way it works in Torchmeta (for both regression and classification tasks), tasks and datasets have a one-to-one correspondence, to ensure reproducibility. This means that a specific task will be associated with one and only one dataset: if you sample the same task twice, you'll get the same train/test datasets.

Concretely for regression tasks, this means that even though you ask for 700 examples for a specific task/function, only the first* 5+15 (if shots=5 and test_shots=15) will ever be used, and the remaining 680 will never be seen. So if you ask for 500 tasks (functions), you'll get a dataloader with 500 elements. That one-to-one correspondence is also the reason why you get the formula in https://github.com/tristandeleu/pytorch-meta/issues/69#issuecomment-705702297 for the number of tasks in a classification problem (and not more than that, e.g. if you could sample different images for the same tuple of labels).

If you want to have different samples for the same function, one way to do it would be to sample the same function multiple times when the tasks are created (e.g. here for Sinusoid), but with different data sampled (e.g. here for Sinusoid).

*Not necessarily the first 20 as in dataset[:20], because the dataset might be shuffled if shuffle=True in ClassSplitter, but it will always be the same 20 samples; the random permutation applied when shuffle=True is task dependent (function of hash(task)).
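
The task-dependent permutation can be illustrated with a toy sketch. This is not Torchmeta's implementation: `examples_for_task` is a hypothetical helper that only assumes the selection is seeded by the task's hash, as described above.

```python
import random


def examples_for_task(task_key, num_available=700, num_used=20):
    """Pick num_used example indices, determined entirely by hash(task_key)."""
    # A private RNG seeded from the task's hash: same task, same permutation.
    rng = random.Random(hash(task_key))
    permutation = list(range(num_available))
    rng.shuffle(permutation)
    return permutation[:num_used]


first = examples_for_task(task_key=(3,))
again = examples_for_task(task_key=(3,))
print(first == again)  # True
```

Sampling the same task twice returns the same 20 indices, and the other 680 are never selected for that task.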

brando90 commented 4 years ago

Using this scheme, did you try reproducing MAML and getting its ~63% accuracy on mini-imagenet? I am just worried that this isn't what most standard meta-learning algorithms mean by "episodic training". The use of the same 20 images during training seems to me a strong underutilization of the meta-train set.

brando90 commented 4 years ago

why is my augmented dataloader the same length as the normal one?

dataset = miniimagenet(data_path,
                       ways=5, shots=5, test_shots=15, meta_split=meta_split, download=True)
dataloader = BatchMetaDataLoader(dataset, batch_size=16, num_workers=4)

# model = torch.hub.load('pytorch/vision:v0.6.0', 'resnet18', pretrained=False)

print(len(dataloader))  # prints 476532

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
data_augmentation_transforms = transforms.Compose([
    transforms.RandomResizedCrop(84),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(
        brightness=0.4,
        contrast=0.4,
        saturation=0.4,
        hue=0.2),
    transforms.ToTensor(),
    normalize])
dataset = miniimagenet(data_path,
                       transform=data_augmentation_transforms,
                       ways=5, shots=5, test_shots=15, meta_split=meta_split, download=True)
dataloader = BatchMetaDataLoader(dataset, batch_size=16, num_workers=4)
print(f'len augmented = {len(dataloader)}')  # prints len augmented = 476532

dataset = miniimagenet(data_path, ways=5, shots=5, test_shots=15, meta_split=meta_split, download=True)
dataloader = BatchMetaDataLoader(dataset, batch_size=16, num_workers=4)
print(f'len normal = {len(dataloader)}')  # prints len normal = 476532

you can reproduce with this:

import torch

import torchvision.transforms as transforms

# import torchmeta
# from torchmeta.datasets.helpers import omniglot
from torchmeta.datasets.helpers import miniimagenet
from torchmeta.utils.data import BatchMetaDataLoader

from pathlib import Path

meta_split = 'train'
data_path = Path('~/data/').expanduser()

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
data_augmentation_transforms = transforms.Compose([
    transforms.RandomResizedCrop(84),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(
        brightness=0.4,
        contrast=0.4,
        saturation=0.4,
        hue=0.2),
    transforms.ToTensor(),
    normalize])
dataset = miniimagenet(data_path,
                       transform=data_augmentation_transforms,
                       ways=5, shots=5, test_shots=15, meta_split=meta_split, download=True)
dataloader = BatchMetaDataLoader(dataset, batch_size=16, num_workers=4)
print(f'len augmented = {len(dataloader)}')

dataset = miniimagenet(data_path, ways=5, shots=5, test_shots=15, meta_split=meta_split, download=True)
dataloader = BatchMetaDataLoader(dataset, batch_size=16, num_workers=4)
print(f'len normal = {len(dataloader)}')

print('success\a')

I made sure to have torchmeta 1.5.3 when I did pip install torchmeta --upgrade

tristandeleu commented 4 years ago

using this scheme, did you try reproducing MAML and get it's ~63% accuracy on mini-imagenet. I am just worried that this isn't the way most standard meta-learning algorithms mean by "episodic training". The use of the same 20 images during training seems to me a strong underutilization of the meta-train set during training.

Yes I have been able to reproduce the 63% accuracy with MAML on miniImageNet (5-shot, 5-way) using this data-loading scheme. Keep in mind that the number of possible tasks is combinatorially large, so the probability of sampling the same task twice is very small (e.g. for the meta-train split of miniImageNet, and 5-way classification tasks, you have over 7.6M tasks), meaning that this is very similar to the behavior you'd see if you were to sample the tasks on the fly (which is probably what you mean by "standard") because this sampling scheme also has very little chance to sample the same task twice.

Also note that it doesn't mean that the same 20 images are always used for all tasks involving a specific class. See the second part of https://github.com/tristandeleu/pytorch-meta/issues/67#issuecomment-654774351 for an example. In that sense, when shuffle=True, all the images from the dataset get a chance to be sampled for some task. The one-to-one correspondence only means that if you sample the same task (i.e. the same tuple of classes for the classification task), you'll get the same images.

Although I wouldn't recommend it, you can bypass this behavior if you want by setting the hash of the task to a random number (see https://github.com/tristandeleu/pytorch-meta/issues/95#issuecomment-700540866)

why is my augmented dataloader the same length as the normal one?

You don't have any class_augmentations here, you only have a transform. The two are different:

transform applies the same transformation to all images (this is identical to transform in Torchvision). For example you might want to normalize/resize/crop all images.
class_augmentations creates new "pseudo-classes" that are added to the pool of possible classes to sample from in order to create a task. For example this would be rotations in Omniglot.

Here is an example of what class_augmentations does on Omniglot (there is a bug in Torchmeta 1.5.3 where it was using the same image for different class augmentation, which has been fixed in master)

from torchmeta.datasets.helpers import omniglot
from PIL import Image

dataset = omniglot('data', shots=1, ways=2, meta_train=True,
                   transform=None, target_transform=None)

# There are 1028 classes in Omniglot's meta-train split. We offset
# by this much to get the same label with a different rotation.
task = dataset[(0, 1028 + 0)]
train = task['train']

print(f'Number of examples in training set: {len(train)}')

img_0, target_0 = train[0]
img_1, target_1 = train[1]

print(f'Target of sample 0: {target_0}')
print(f'Target of sample 1: {target_1}')

# For display
img = Image.new(img_0.mode, (2 * img_0.width, img_0.height))
img.paste(img_0, (0, 0))
img.paste(img_1, (img_0.width, 0))

img.show()

brando90 commented 4 years ago

@tristandeleu in my regression tasks I don't have access to many tasks/functions and they are not being created on the fly like yours where they are basically unbounded. It's a small size of tasks (well 500 seems pretty large to me already) and it's easy for me to go through all the tasks with the meta data loader (it doesn't combinatorially explode since for regression there is no choose function going on). I can't create them from scratch as you are suggesting. Thus, when the next epoch starts, does my meta-learner see different x values (I do have many other x values which I'd like it to see if possible)?

In summary:

What I need is that when the same task (function) is sampled in a different epoch that the examples are also different.

How is that possible in torchmeta? Is there a simple flag, or does it just do it automagically?

tristandeleu commented 4 years ago

There is no flag to enable that unfortunately. But in your case, since this is your own dataset, you can write the Task class so that you get this behavior. Taking Sinusoid as an example, this would require changing the __hash__ function of SinusoidTask to have something like

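import random

class SinusoidTask(Task):
    # Other functions __init__, __getitem__ and __len__

    def __hash__(self):
        # A fresh random hash on every call, so the hash-dependent
        # shuffling differs each time the task is sampled.
        return random.randrange(1 << 32)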

Let me know if this works!
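
One way to see the effect of a random __hash__ outside Torchmeta is a toy comparison. Everything here is hypothetical (`ToyTask`, `RandomHashToyTask`, `pick_examples`); it only assumes that the examples selected for a task are derived from hash(task), and it does not exercise the library itself.

```python
import random


class ToyTask:
    """Toy task whose hash is fixed by its index."""

    def __init__(self, index):
        self.index = index

    def __hash__(self):
        return hash(self.index)


class RandomHashToyTask(ToyTask):
    """Same toy task, but with a fresh random hash on every call."""

    def __hash__(self):
        return random.randrange(1 << 32)


def pick_examples(task, num_available=700, num_used=20):
    # The selection depends only on hash(task).
    rng = random.Random(hash(task))
    return tuple(rng.sample(range(num_available), num_used))


fixed = {pick_examples(ToyTask(0)) for _ in range(5)}
varied = {pick_examples(RandomHashToyTask(0)) for _ in range(5)}
print(len(fixed))  # 1
print(len(varied) > 1)  # different selections, barring an astronomically unlikely collision
```

With a fixed hash the five draws collapse to a single selection; with a random hash they differ, at the cost of the reproducibility that the one-to-one scheme is meant to provide.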

brando90 commented 4 years ago

@tristandeleu will let you know!

I also wanted to do this: have an N-way K-shot task of size 64-way, 5-shot. But if the dataloader always gives me the same set of examples (since we only have 1 task), that is no good.

How do I get different examples in this case?

tristandeleu commented 4 years ago

You can probably do the same thing as in https://github.com/tristandeleu/pytorch-meta/issues/69#issuecomment-708989991. But really if you have a 64-way problem you are likely to get a very large number of possible tasks, so this wouldn't be necessary (for the reasons explained in https://github.com/tristandeleu/pytorch-meta/issues/69#issuecomment-707610673). Of course this depends on the number of classes you have in your meta-split, but for example in the meta-train split of Omniglot, if you plan on having 64-way tasks this means C(4112, 64) possible tasks, which is over 10^141, so you have no chance of sampling the same task twice, and the fact of having a one-to-one correspondence between tasks and datasets is not limiting.

brando90 commented 4 years ago

What I want is to use all the labels at once. e.g. as the authors do here: https://arxiv.org/abs/1801.05401.

I want my dataloader to do this:

1) get me all the labels (64 for mini-imagenet, or 1028 for Omniglot); for mini-imagenet, C(64, 64) = 1.
2) give me k different examples (shots) for each of the 64 labels (same "task").

I also want to always have an output of size 64 but get 5 classes (the different episodic manner that Y.Wang uses); for this one I think plain PyTorch might solve it for me. The first one probably means changing the hash function (to guarantee different examples), since I always do "1 epoch" according to your definition (by the way, your definitions are well motivated, I'm just trying some different things).

tristandeleu commented 4 years ago

Oh sorry I was confused about the setting. In case you want to use all the labels at once, this would probably require a fair amount of work. I think one big issue you might run into is that the meta-dataset would have a single task (even though you are doing this random __hash__ trick, the length of the dataset would still be 1 because you really have only a single task, though multiple datasets), so you wouldn't be able to benefit from getting batches of tasks (the same way if you'd had a PyTorch dataset with a single example).

I don't think this would be easy with the current tools in Torchmeta. One way I can think about doing that is to have explicitly an index per possible dataset, making the length of the meta-dataset something like

C(number of examples for a specific label, number of shots) ** 64

With an appropriate indexing scheme this could be possible: indexes could be tuples of length 64, each element being of length k (for k-shot). You can take inspiration from the way indexes work for CombinationMetaDataset (which are already tuples of label indices). Your indices for the meta-dataset would then look something like:

((1, 2, 3, 4, 5), (2, 4, 6, 8, 10), ...)  # of length 64

meaning that you'd select images with indices (1, 2, 3, 4, 5) for label 1, (2, 4, 6, 8, 10) for label 2, etc...
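
A toy version of this indexing idea, with small numbers so it can be checked by hand. It is only a sketch: `num_datasets` and `select` are hypothetical helpers, not part of Torchmeta, and the count takes, per label, the number of ways to choose k shots out of the available examples.

```python
from math import comb


def num_datasets(examples_per_label, shots, num_labels):
    """Number of distinct index tuples: one k-subset per label."""
    return comb(examples_per_label, shots) ** num_labels


def select(images_by_label, index):
    """index holds one tuple of example positions per label."""
    return [[images_by_label[label][i] for i in positions]
            for label, positions in enumerate(index)]


# 3 labels with 4 examples each, 2 shots per label.
toy_images = [[f"label{label}_img{i}" for i in range(4)] for label in range(3)]
toy_index = ((0, 1), (1, 3), (0, 2))

chosen = select(toy_images, toy_index)
print(chosen[1])  # ['label1_img1', 'label1_img3']
print(num_datasets(4, 2, 3))  # 216
```

For 64 labels the same count grows as a 64th power, which is why an explicit index per dataset, rather than enumeration, would be needed.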

You would also need to create custom components for a number of things for compatibility though (most importantly the Sampler and the ClassSplitter).

brando90 commented 4 years ago

Do you think it's just simpler for me to implement my own dataloader? I was hoping not to do that.

E.g. if I have a meta-batch size of B, N-way, C total labels and K-shot, then I'd have a final batch of data like the following:

B tasks with N classes and K (different) examples each, so the total tensor size is:

[B*N*K, C*H*W]

I was hoping to use this using your library so that I don't have to re-implement this for each data set I use (since you already have a really nice set of datasets available).

If you have an idea how to implement this so all your data sets work let me know! Even just a rough outline would be great! :)

By the way, thanks for all the discussion you've already had with me, and for the great library! It's impressive work.

brando90 commented 4 years ago

To make the problem simpler first, I think the above should be easy when C=N, i.e. when the total number of classes equals the N in N-way. For that we sample a single task and make sure the hash trick is turned on (so that distinct examples are given each epoch, since in this case we'd go through the whole data loader at once according to the data loader's definition). That at least gives us a 64-way 64-label task, though only a single one of course. Not sure if it's possible to get more at once or extract the 5 we'd need from that... hmm... will think about it.

tristandeleu commented 4 years ago

If C=N, that's precisely the case which is not obvious unfortunately, because of this issue you mention at the end on sampling batches of the same task (what I explained in https://github.com/tristandeleu/pytorch-meta/issues/69#issuecomment-710785040), with a possible solution.

If C > N, you can probably get away with the random hash trick. For future reference, I am not recommending it in general though.

brando90 commented 4 years ago

There is no flag to enable that unfortunately. But in your case, since this is your own dataset, you can write the Task class so that you get this behavior. Taking Sinusoid as an example, this would require changing the __hash__ function of SinusoidTask to have something like
import random

from torchmeta.utils.data import Task

class SinusoidTask(Task):
    # Other functions __init__, __getitem__ and __len__

    def __hash__(self):
        # Return a fresh random hash on every call instead of a fixed one
        return random.randrange(1 << 32)
Let me know if this works!

How would I test if this works?

brando90 commented 4 years ago

@tristandeleu I don't think the __hash__ is needed. I recreated the data loader iterator (which is my attempt to reset the data loader) and, no matter what, the first sample from the dataloader gave me different data.

Do you have thoughts?

## sinusioid function
print('Starting Sinusioid cell')

import torchmeta
# from torchmeta.toy import Sinusoid
from torchmeta.utils.data import BatchMetaDataLoader
# from torchmeta.transforms import ClassSplitter

# from tqdm import tqdm

batch_size = 16
shots = 5
test_shots = 15
dataset = torchmeta.toy.helpers.sinusoid(shots=shots, test_shots=test_shots)
dataloader = BatchMetaDataLoader(dataset, batch_size=batch_size, num_workers=4)

# print(f'batch_size = {batch_size}')
# print(f'len(dataloader) = {len(dataloader)}\n')
# for batch_idx, batch in enumerate(dataloader):
#     print(f'batch_idx = {batch_idx}')
#     train_inputs, train_targets = batch['train']
#     test_inputs, test_targets = batch['test']
#     print(f'train_inputs.shape = {train_inputs.shape}')
#     print(f'train_targets.shape = {train_targets.shape}')
#     print(f'test_inputs.shape = {test_inputs.shape}')
#     print(f'test_targets.shape = {test_targets.shape}')
#     if batch_idx >= 1:  # halt after 2 iterations
#         break

# two tasks are different
dl = enumerate(dataloader)

_,x1 = next(dl)
x1,_ = x1['train']
print(f'x1 = {x1.sum()}')
_,x2 = next(dl)
x2,_ = x2['train']
print(f'x2 = {x2.sum()}')

assert(x1.sum() != x2.sum())
print('assert pass, tasks have different data')

# same task twice
dl = enumerate(dataloader)

_,x1 = next(dl)
x1,_ = x1['train']
print(f'x1 = {x1.sum()}')
dl = enumerate(dataloader)
_,x2 = next(dl)
x2,_ = x2['train']
print(f'x2 = {x2.sum()}')

assert(x1.sum() == x2.sum())

print('DONE\a')

output:

Starting Sinusioid cell
x1 = 2.651324701332971
x2 = -19.512318753130284
assert pass, tasks have different data
x1 = -47.07060164537937
x2 = 8.27945078325602
Traceback (most recent call last):
  File "/Users/brando/anaconda3/envs/automl-meta-learning/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-17-676708ed0b75>", line 54, in <module>
    assert(x1.sum() == x2.sum())
AssertionError

tristandeleu commented 4 years ago

This is due to two things (possibly three):

1. When you call enumerate(dataloader) multiple times, I don't think this resets the iterable. So this effectively continues going through the tasks, and you are sampling new tasks.
2. The np_random for the dataset is used to sample data, so if you create multiple dataloaders without resetting the dataset, this will only move the np_random forward and you'll sample different data (even though the tasks might be the same).
3. (Maybe) You are using multiple processes, which might not guarantee determinism.
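A minimal sketch of the second point, using a toy dataset rather than torchmeta itself: when one random generator is shared by the dataset, iterating a second time visits the same tasks but draws different data, because the generator state has moved forward. The class `ToyTaskDataset` is hypothetical and only mimics the behaviour described here.

```python
import random

class ToyTaskDataset:
    """Toy stand-in for a dataset that owns a single shared random generator."""

    def __init__(self, num_tasks, seed=None):
        self.num_tasks = num_tasks
        self.rng = random.Random(seed)

    def __iter__(self):
        # The task ids are fixed, but the data for each task is drawn
        # from the shared generator, which advances on every draw.
        for task_id in range(self.num_tasks):
            yield task_id, self.rng.random()

dataset = ToyTaskDataset(num_tasks=3, seed=0)
first_pass = list(dataset)
second_pass = list(dataset)    # same dataset object, generator has advanced

# Rebuilding the dataset with the same seed resets the generator.
fresh_pass = list(ToyTaskDataset(num_tasks=3, seed=0))

print([task_id for task_id, _ in first_pass] == [task_id for task_id, _ in second_pass])
print(first_pass == second_pass)
print(first_pass == fresh_pass)
```

In this toy setting, only re-creating the dataset with the same seed reproduces the first pass, which matches the suggestion to run the whole script twice rather than re-iterate inside one session.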

The best test you can do is to run your code twice (ensuring that shuffle=False in the dataloader). Something like:

from torchmeta.toy.helpers import sinusoid
from torchmeta.utils.data import BatchMetaDataLoader

batch_size = 16
shots = 5
test_shots = 15

# Seed the dataset with `seed = 0`
dataset = sinusoid(shots=shots, test_shots=test_shots, seed=0)
# `num_workers = 0` to avoid stochasticity of multiple processes
dataloader = BatchMetaDataLoader(dataset, batch_size=batch_size,
                                 shuffle=False, num_workers=0)

batch = next(iter(dataloader))

inputs, _ = batch['train']
print(f'Sum of inputs: {inputs.sum()}')

If you run the code twice, you should get the same result (for me it's Sum of inputs: 1.211366437302428). And then you can play with this code and remove parts of it (e.g. seeding the dataset, shuffle=True in dataloader, num_workers > 0) to see that running the code twice gives you different outputs.

That random __hash__ trick I mentioned above makes it so that it uses a different random seed every time you call ClassSplitter, which means that this will give you different outputs if you call the code twice (even the code above).

brando90 commented 4 years ago


I don't think I see the random hash trick working for me. Is it working for you?


import random

from torchmeta.toy.helpers import sinusoid
from torchmeta.utils.data import BatchMetaDataLoader

def random_hash():
    return random.randrange(1 << 32)

batch_size = 16
shots = 5
test_shots = 15

# Seed the dataset with `seed = 0`
dataset = sinusoid(shots=shots, test_shots=test_shots, seed=0)
dataset.__hash__ = random_hash
# `num_workers = 0` to avoid stochasticity of multiple processes
dataloader = BatchMetaDataLoader(dataset, batch_size=batch_size,
                                 shuffle=False, num_workers=0)

batch = next(iter(dataloader))

inputs, _ = batch['train']
print(f'Sum of inputs: {inputs.sum()}')

brando90 commented 4 years ago

Anyway, it seems I don't actually need your random hash trick (@tristandeleu, please confirm whether this is right). What I want is randomness, not determinism (of the data, not of the tasks/classes). Just removing the seed argument is enough (I also set num_workers > 0, since that's the setting closest to my real code).

Output to confirm this (the same snippet run twice):

from torchmeta.toy.helpers import sinusoid
from torchmeta.utils.data import BatchMetaDataLoader

batch_size = 16
shots = 5
test_shots = 15

# No seed is passed to the dataset here
dataset = sinusoid(shots=shots, test_shots=test_shots)
# `num_workers = 4` to match my real code
dataloader = BatchMetaDataLoader(dataset, batch_size=batch_size,
                                 shuffle=False, num_workers=4)

batch = next(iter(dataloader))

inputs, _ = batch['train']
print(f'Sum of inputs: {inputs.sum()}')

First run:
Sum of inputs: -34.956605198013364

Second run:
Sum of inputs: 38.43287504101916

tristandeleu commented 4 years ago

The random hash trick I was talking about would be at the level of the Task (see also here for the parent class), so this would require rewriting the Sinusoid/SinusoidTask. I don't think there is an easy way to monkeypatch the existing dataset to add the random hash.

If having seed=None does the trick for you that's good! I originally thought what you wanted to do was orthogonal to not having a fixed seed, which would have required the random hash trick.
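As a side note on why assigning `__hash__` to an existing object tends to have no effect: in Python, `hash(obj)` looks the method up on the type of the object, not on the instance. The sketch below shows this with a plain class; `PlainTask` and `RandomHashTask` are hypothetical stand-ins and not torchmeta classes, so whether a subclass like this plugs into the library's Sinusoid dataset is not something this example confirms.

```python
import random

class PlainTask:
    """Hypothetical stand-in for a task class; not the torchmeta Task."""

task = PlainTask()
default_hash = hash(task)

# Assigning __hash__ on the instance does not change what hash() returns,
# because hash() looks the method up on type(task), not on the instance.
task.__hash__ = lambda: 12345
instance_patch_ignored = hash(task) == default_hash

class RandomHashTask(PlainTask):
    # Defining __hash__ on the class is what hash() actually uses.
    def __hash__(self):
        return random.randrange(1 << 32)

random_task = RandomHashTask()
hashes = {hash(random_task) for _ in range(10)}

print(instance_patch_ignored)
print(len(hashes) > 1)
```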

brando90 commented 4 years ago


Thanks for all the help trist! I really appreciate it.

Hopefully a last question (just to make sure it is all good): what did you have in mind that the hash trick would solve (or what did you think I was trying to do)?

Again, thanks for your willingness to discuss and for your great dataloader!

tristandeleu commented 4 years ago

The random hash trick would still give you task reproducibility: if you run the same code twice with the random seed fixed, you'll get the same tasks (e.g. same amplitudes/phases for sinusoid). The key difference is this:

With the current implementation, the data chosen for the training and test datasets of a task will be the same if you run the code twice (regardless of whether shuffle=True in the dataset), to ensure that you have complete reproducibility of your results (if you run the algorithm twice using the same seed, you'll get the same results).
With the random hash, the data (at least the split used for the training/test datasets) for the same task will be different. In that case, there would be an advantage to using a larger num_samples_per_task than num_shots + num_test_shots.
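The difference between the two cases can be sketched with toy classes. This assumes that the train/test split is shuffled with a seed derived from `hash(task)`, which is how the trick is described in this thread; the names `ToyTask`, `RandomHashToyTask` and `split_indices` are hypothetical and are not the torchmeta implementation.

```python
import random

class ToyTask:
    """Toy task whose hash depends only on its identifier."""

    def __init__(self, task_id, num_samples):
        self.task_id = task_id
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __hash__(self):
        return hash(self.task_id)

class RandomHashToyTask(ToyTask):
    """Same task, but with a fresh random hash on every call."""

    def __hash__(self):
        return random.randrange(1 << 32)

def split_indices(task, num_train, num_test, base_seed=0):
    # Assumption: the split is shuffled with a seed derived from hash(task).
    seed = (hash(task) + base_seed) % (1 << 32)
    indices = list(range(len(task)))
    random.Random(seed).shuffle(indices)
    return indices[:num_train], indices[num_train:num_train + num_test]

# Fixed hash: the same task always yields the same train/test split.
fixed_task = ToyTask(task_id=7, num_samples=100)
fixed_split_a = split_indices(fixed_task, num_train=5, num_test=15)
fixed_split_b = split_indices(fixed_task, num_train=5, num_test=15)

# Random hash: the same task yields different splits across calls.
random_task = RandomHashToyTask(task_id=7, num_samples=100)
random_splits = {tuple(split_indices(random_task, 5, 15)[0]) for _ in range(20)}

print(fixed_split_a == fixed_split_b)
print(len(random_splits) > 1)
```

With 100 samples per task and only 5 + 15 used per split, the random-hash variant draws a different subset each time, which is where a larger num_samples_per_task would pay off.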

Note that this applies when you run the same code twice, not when you call the same task twice within your code. Something I overlooked, and corrected in https://github.com/tristandeleu/pytorch-meta/issues/69#issuecomment-728950940, is that if you call the same task twice in your code, you do get different data because np_random is moving forward.