learnables / learn2learn

A PyTorch Library for Meta-learning Research
http://learn2learn.net
MIT License

custom dataset shuffle #338

Closed ambekarsameer96 closed 2 years ago

ambekarsameer96 commented 2 years ago

Hi, if I use the following for a custom dataset, does it ensure that the training samples across all classes are shuffled at every iteration of the loop shown below? (There is no explicit shuffling transform.)

train_dataset = l2l.data.MetaDataset(trainset)
transforms = [
    l2l.data.transforms.NWays(train_dataset, ways),
    l2l.data.transforms.KShots(train_dataset, 5 * shots),
    l2l.data.transforms.LoadData(train_dataset),
    l2l.data.transforms.RemapLabels(train_dataset),
]
taskset = l2l.data.TaskDataset(train_dataset, transforms, num_tasks=1000)

After this, I directly use a loop like the one below (since task.sampler() is not available for a custom dataset):


for iteration in range(max_iterations):
    for counter, (data, labels) in enumerate(taskset):
        learner = maml.clone()  # clone the meta-learner for this task
        eval_error, eval_accuracy = fast_adapt(data, labels, learner, loss,
                                               adaptation_steps, shots, ways, device)
        if counter == max_tasks:
            break
ambekarsameer96 commented 2 years ago

@seba-1511, can you please take a look?

seba-1511 commented 2 years ago

Yes, that's correct: NWays and KShots sample their data randomly (akin to shuffling, though not quite the same).
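
For example, a minimal sketch (assuming the taskset built above, with ways and shots defined as in your snippet): two consecutive draws will almost always contain different samples, even without an explicit shuffle transform.

    import torch

    # Sketch only: each call to TaskDataset.sample() builds a new task, with
    # NWays picking `ways` classes at random and KShots picking `5*shots`
    # samples per class at random.
    X1, y1 = taskset.sample()
    X2, y2 = taskset.sample()
    print(torch.equal(X1, X2))  # almost always False: a different random draw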

ambekarsameer96 commented 2 years ago

Hello @seba-1511, thanks for the repo and the examples. I have a question about the maml-miniimagenet example:

# Separate data into adaptation/evaluation sets
adaptation_indices = np.zeros(data.size(0), dtype=bool)
adaptation_indices[np.arange(shots * ways) * 2] = True
evaluation_indices = torch.from_numpy(~adaptation_indices)
adaptation_indices = torch.from_numpy(adaptation_indices)
adaptation_data, adaptation_labels = data[adaptation_indices], labels[adaptation_indices]
evaluation_data, evaluation_labels = data[evaluation_indices], labels[evaluation_indices]

From: https://github.com/learnables/learn2learn/blob/f099ddc9ce0c10cff901ecb1acee2838d171272e/examples/vision/maml_miniimagenet.py#L99

Can we use sklearn.model_selection.train_test_split with stratify instead? It splits the samples based on their classes, so it ensures that every class is represented in both the adaptation and evaluation sets.
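
Something like this sketch of what I mean (assuming data and labels are the CPU tensors returned by one task, and shots and ways are defined as in the example):

    from sklearn.model_selection import train_test_split
    import torch

    # Sketch only: in the example each class has 2*shots samples, so with
    # train_size=shots*ways and stratify, every class contributes `shots`
    # samples to the adaptation set and `shots` to the evaluation set.
    # train_test_split also shuffles by default.
    labels_np = labels.numpy()
    (adaptation_data, evaluation_data,
     adaptation_labels, evaluation_labels) = train_test_split(
        data.numpy(), labels_np,
        train_size=shots * ways,
        stratify=labels_np,
    )
    adaptation_data = torch.from_numpy(adaptation_data)
    evaluation_data = torch.from_numpy(evaluation_data)
    adaptation_labels = torch.from_numpy(adaptation_labels)
    evaluation_labels = torch.from_numpy(evaluation_labels)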