ufoym / imbalanced-dataset-sampler

A (PyTorch) imbalanced dataset sampler for oversampling low-frequency classes and undersampling high-frequency ones.
MIT License
2.27k stars, 264 forks

Subset sampling entire dataset #36

Open dajopr opened 3 years ago

dajopr commented 3 years ago

Hi everyone,

I have a question concerning the use of subsets with this sampler. According to the code, it draws samples from all entries in the parent dataset: https://github.com/ufoym/imbalanced-dataset-sampler/blob/e9dd2deca6e058771533678b29b38a60843b0a85/torchsampler/imbalanced.py#L49-L50

Shouldn't it sample only from the entries of the chosen subset, i.e. those listed in dataset.indices? When I run _get_labels as is, I get a length mismatch. Is my implementation of the subset unusual, or should this be changed? Returning only the labels corresponding to dataset.indices solved the problem for me:

        elif isinstance(dataset, torch.utils.data.Subset):
            return [dataset.dataset.imgs[ind][1] for ind in dataset.indices]
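
To illustrate the mismatch, here is a minimal, torch-free sketch. `FakeImageFolder` and `FakeSubset` are hypothetical stand-ins that only mirror the `imgs`, `dataset`, and `indices` attributes used above, and `subset_labels` is an illustrative helper name, not part of the library:

```python
class FakeImageFolder:
    """Hypothetical stand-in for an ImageFolder-style dataset exposing `imgs`."""

    def __init__(self, imgs):
        self.imgs = imgs  # list of (path, class_index) pairs


class FakeSubset:
    """Stand-in mirroring the `dataset` and `indices` attributes of a Subset."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = indices

    def __len__(self):
        return len(self.indices)


def subset_labels(subset):
    # Look up labels only for the entries that belong to the subset.
    return [subset.dataset.imgs[i][1] for i in subset.indices]


parent = FakeImageFolder([("a.png", 0), ("b.png", 0), ("c.png", 1), ("d.png", 0)])
subset = FakeSubset(parent, [0, 2, 3])

# Labels of the whole parent dataset: one more entry than the subset has.
all_parent_labels = [label for _, label in parent.imgs]

# Labels restricted to the subset: same length as the subset itself.
labels = subset_labels(subset)

print(len(all_parent_labels), len(labels), len(subset))
```

With the restricted lookup, the label list lines up one-to-one with the subset's items, so per-sample weights computed from it should have the expected length.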

SilenceMonk commented 2 years ago

@tisner Thx!