Hello, I thought I clarified the question in another issue you posted earlier. Could you specify which part you are still confused about?
The only reason we employed RandomSampler is to make the dataloaders for the unlabelled and labelled data the same length.
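As a minimal sketch of the idea (the toy datasets and sizes below are illustrative, not the repo's actual values): sampling with replacement and a fixed `num_samples` makes both loaders yield the same number of batches per epoch, no matter how many images each dataset contains.

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

batch_size = 8
num_samples = batch_size * 200  # every epoch is exactly 200 batches

# Toy stand-ins for the labelled and unlabelled datasets (different sizes).
train_l_dataset = TensorDataset(torch.randn(500, 3))
train_u_dataset = TensorDataset(torch.randn(20000, 3))

def make_loader(dataset):
    # Sampling with replacement draws exactly num_samples indices per epoch,
    # decoupling the loader length from the underlying dataset size.
    s = RandomSampler(dataset, replacement=True, num_samples=num_samples)
    return DataLoader(dataset, batch_size=batch_size, sampler=s, drop_last=True)

train_l_loader = make_loader(train_l_dataset)
train_u_loader = make_loader(train_u_dataset)
assert len(train_l_loader) == len(train_u_loader) == 200
```

With equal lengths, the two loaders can be zipped together during training without one being exhausted before the other.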
I'm referring to the supervised setting, where we have only labeled data and no unlabeled data. Why do we also use a random sampler there?
I see. In this case, we use the same dataloader purely for a fair comparison (to isolate the effect of the unlabelled data given the exact same labelled data).
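To make that concrete, here is a hypothetical skeleton (reusing `train_l_loader` and `train_u_loader` from the sketch above; this is not the repo's actual training code): the supervised baseline consumes exactly the same labelled batches per epoch, so any performance gap versus the semi-supervised run reflects the unlabelled data alone.

```python
# Semi-supervised run: labelled and unlabelled batches arrive in lockstep.
for (x_l,), (x_u,) in zip(train_l_loader, train_u_loader):
    pass  # supervised loss on x_l + unsupervised loss on x_u

# Supervised baseline: the identical labelled loader, hence the identical
# number of gradient steps, just without the unlabelled term.
for (x_l,) in train_l_loader:
    pass  # supervised loss on x_l only
```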
Hello,
When creating the dataloader for the labeled data in the supervised setting, you use a random sampler that draws a fixed number of labeled images from the total available labeled data:
```python
import torch
from torch.utils.data import sampler

num_samples = self.batch_size * 200  # 200 iterations per epoch x 200 epochs = 40k iterations total

train_l_loader = torch.utils.data.DataLoader(
    train_l_dataset,
    batch_size=self.batch_size,
    sampler=sampler.RandomSampler(
        data_source=train_l_dataset,
        replacement=True,
        num_samples=num_samples,
    ),
    drop_last=True,
)
```
Why are you using this sampler and forward-passing a subset of the labeled images in each epoch instead of iterating over the whole labeled set? Is this an effective training approach?
Thanks