pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Sparse masks for torchvision maskrcnn - useful for training on big images with small objects #4266

Open rutgerfick opened 3 years ago

rutgerfick commented 3 years ago

🚀 Feature

Hello everyone,

To make GPU memory usage more efficient, I propose allowing sparse masks in the torchvision Mask R-CNN implementation.

Motivation

For tasks involving the prediction of many small objects in large images, it becomes increasingly painful to allocate a large, extremely sparse mask for each object. For example, I may have images of size 8k x 8k pixels with between 1 and 100 objects of interest, each around 40 x 40 pixels. In this case, the current dataloader tutorial creates by default a dense mask of size 8k x 8k x N_objects, in which far fewer than 1% of the entries are nonzero but which still takes a lot of memory.
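To put rough numbers on this, here is a back-of-the-envelope sketch in plain Python. It assumes 8k means 8000 pixels, masks stored as one byte per pixel (uint8), square 40 x 40 objects, and a COO-style sparse layout that stores three 64-bit indices plus one byte of value per nonzero pixel. The helper names are hypothetical and only illustrate the arithmetic.

```python
def dense_mask_bytes(height, width, n_objects, bytes_per_pixel=1):
    """Bytes needed for a dense (n_objects, height, width) mask stack."""
    return height * width * n_objects * bytes_per_pixel


def sparse_coo_mask_bytes(n_objects, pixels_per_object,
                          n_dims=3, index_bytes=8, value_bytes=1):
    """Bytes for a COO layout: n_dims indices plus one value per nonzero."""
    nnz = n_objects * pixels_per_object
    return nnz * (n_dims * index_bytes + value_bytes)


# Assumed scenario from the description above: 8000 x 8000 image, 100 objects.
height = width = 8000
n_objects = 100
pixels_per_object = 40 * 40

dense = dense_mask_bytes(height, width, n_objects)
sparse = sparse_coo_mask_bytes(n_objects, pixels_per_object)
fill_fraction = pixels_per_object / (height * width)

print(f"dense:  {dense / 1e9:.1f} GB")
print(f"sparse: {sparse / 1e6:.1f} MB")
print(f"nonzero fraction per mask: {fill_fraction:.6%}")
```

Under these assumptions the dense stack needs about 6.4 GB for a single image, while the sparse representation needs about 4 MB, even though each nonzero entry costs 25 bytes instead of 1.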

Having this feature would facilitate training mask rcnns on much larger images in this scenario.

Pitch

Allow masks to be defined as torch sparse tensors, in addition to the usual dense tensors.

I think the only thing to be done is to adjust the way the Mask R-CNN loss is defined so that it accepts either dense or sparse masks, potentially with an if/else depending on the mask type. https://github.com/pytorch/vision/blob/7d52be76c8eaf02b12338afe0822396ab3547fe2/torchvision/models/detection/roi_heads.py#L101
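A minimal sketch of what such a branch could look like is below. It is not torchvision code: `crop_mask` is a hypothetical helper, it assumes a single 2D (H, W) mask in COO layout and integer box coordinates in (x1, y1, x2, y2) order, and it only covers cropping, not the resampling the loss would also need. The idea is to densify only the region inside the box, so the full-size dense mask is never materialized.

```python
import torch


def crop_mask(mask: torch.Tensor, box: tuple) -> torch.Tensor:
    """Return a dense crop of a 2D mask, for either dense or sparse COO input.

    Hypothetical helper; `box` is (x1, y1, x2, y2) in integer pixel coordinates.
    """
    x1, y1, x2, y2 = box
    if mask.is_sparse:
        m = mask.coalesce()
        rows, cols = m.indices()  # each has shape (nnz,)
        # Keep only the nonzero entries that fall inside the box.
        keep = (rows >= y1) & (rows < y2) & (cols >= x1) & (cols < x2)
        crop = torch.zeros((y2 - y1, x2 - x1), dtype=m.dtype, device=m.device)
        crop[rows[keep] - y1, cols[keep] - x1] = m.values()[keep]
        return crop
    # Dense input: plain slicing is enough.
    return mask[y1:y2, x1:x2]


# Small usage example: both layouts should give the same crop.
dense = torch.zeros(64, 64)
dense[10:14, 20:25] = 1.0
sparse = dense.to_sparse()
box = (18, 8, 28, 16)

assert torch.equal(crop_mask(dense, box), crop_mask(sparse, box))
```

The crop is always dense, so any code that consumes it downstream would not need to know which layout the original mask used.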

Alternatives

The current alternative is to cut a big image into many smaller ones that fit in GPU memory, but this is suboptimal when the objects of interest are rare and we want to include as many hard negatives as possible in addition to positives.

Additional context

N/A

cc @datumbox

datumbox commented 3 years ago

Thanks for the proposal. Sparse tensors are currently in beta and not all operators are supported. Definitely worth leaving this issue open and revisiting it once their API becomes more stable.