How to use WeightedRandomSampler from PyTorch or write custom sampler

open-mmlab / mmsegmentation

OpenMMLab Semantic Segmentation Toolbox and Benchmark.

Apache License 2.0

cfg.train_dataloader.sampler = dict(
    type='WeightedRandomSampler',
    shuffle=True,
    weights=sample_weights,  # 1 weight for each sample in the dataset
    num_samples=len(sample_weights),
    replacement=True  # True for oversampling
)

I think I may have found a solution using torch.multinomial.

from mmengine.registry import DATA_SAMPLERS
from mmengine.dataset.sampler import InfiniteSampler
import torch
from collections.abc import Sized
from typing import Iterator, Optional

@DATA_SAMPLERS.register_module()
class WeightedInfiniteSampler(InfiniteSampler):
    def __init__(self,
                 dataset: Sized,
                 weights: torch.Tensor,
                 shuffle: bool = True,
                 seed: Optional[int] = None) -> None:
        super().__init__(dataset=dataset, shuffle=shuffle, seed=seed)
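        # Accept lists or arrays as well as tensors; torch.multinomial needs a float tensor.
        self.weights = torch.as_tensor(weights, dtype=torch.double)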

    # We override this method and yield with torch.multinomial:
    def _infinite_indices(self) -> Iterator[int]:
        g = torch.Generator()
        g.manual_seed(self.seed)
        while True:
            if self.shuffle:
                # Weighted sampling
                yield from torch.multinomial(self.weights, self.size, replacement=True, generator=g).tolist()
            else:
                yield from torch.arange(self.size).tolist()
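To make the weighted draw concrete, here is a minimal standalone sketch of the torch.multinomial call used in _infinite_indices above. The weights, seed, and number of draws are made-up illustration values, not taken from the thread. The weights do not need to sum to 1; torch.multinomial treats them as relative weights.

```python
import torch

# Hypothetical per-sample weights: index 0 is never drawn,
# index 2 is drawn about three times as often as index 1.
weights = torch.tensor([0.0, 1.0, 3.0])

g = torch.Generator()
g.manual_seed(0)  # arbitrary seed, for reproducibility only

# Draw 10,000 indices with replacement, proportional to the weights.
draws = torch.multinomial(weights, 10_000, replacement=True, generator=g)

# Count how often each index was drawn and convert to frequencies.
counts = torch.bincount(draws, minlength=len(weights))
freqs = counts.double() / counts.sum()

print(counts.tolist())
print(freqs.tolist())  # should be close to [0.0, 0.25, 0.75]
```

Because the draw is made with replacement, the same index can appear several times within one pass, which is what produces oversampling of highly weighted samples.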

And then simply in the config:

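sampler = dict(
    type='WeightedInfiniteSampler',
    shuffle=True,
    weights=sample_weights,
    seed=seed,
    # dataset is omitted on purpose: the runner passes the built dataset
    # to the sampler, and a dataset config dict here would take its place.
)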

cfg.train_dataloader.sampler = sampler

With this sampler, training runs and the loss is more stable. I have not yet been able to verify that the sampler actually draws samples as intended; if anyone knows a good way to do this, please comment.
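One possible way to check the draws is to pull a large number of indices from the sampler, count how often each index appears, and compare those frequencies with the normalized weights. The sketch below is only an illustration under assumptions: empirical_frequencies, expected_frequencies, and stand_in_sampler are hypothetical helper names, and stand_in_sampler is a pure-Python substitute so the example runs without MMEngine or a dataset.

```python
import random
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def empirical_frequencies(indices: Iterable[int], num_draws: int,
                          size: int) -> List[float]:
    """Fraction of the first num_draws indices that hit each dataset index."""
    counts = Counter(islice(indices, num_draws))
    return [counts.get(i, 0) / num_draws for i in range(size)]


def expected_frequencies(weights: Sequence[float]) -> List[float]:
    """Normalize the weights so that they sum to 1."""
    total = float(sum(weights))
    return [w / total for w in weights]


def stand_in_sampler(weights: Sequence[float], seed: int) -> Iterator[int]:
    """Hypothetical stand-in for iter(sampler): an endless weighted index stream."""
    rng = random.Random(seed)
    population = range(len(weights))
    while True:
        yield rng.choices(population, weights=weights, k=1)[0]


# Made-up weights for a four-sample dataset.
sample_weights = [1.0, 1.0, 2.0, 6.0]

observed = empirical_frequencies(
    stand_in_sampler(sample_weights, seed=0), 20_000, len(sample_weights))
expected = expected_frequencies(sample_weights)

# Largest absolute gap between observed and expected frequency.
max_gap = max(abs(o - e) for o, e in zip(observed, expected))
print(observed)
print(expected)
print(max_gap)
```

With the real sampler, iter(sampler) would take the place of stand_in_sampler, assuming it yields dataset indices the way InfiniteSampler does. The frequencies only match approximately, and in a distributed run each rank sees only its own share of the index stream, so the check is easiest to interpret on a single process.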

Also, I am new to MMSegmentation so if this can be done better/more efficiently, please let me know!

How to use WeightedRandomSampler from PyTorch or write custom sampler #3104