pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

3D NMS and RoiAlign for volumetric data #2402

Open mibaumgartner opened 4 years ago

mibaumgartner commented 4 years ago

🚀 Feature

3D data is becoming increasingly popular in the deep learning community. It would therefore be great to have a unified 3D NMS and 3D RoiAlign for current and future projects such as MONAI.

Motivation

Information added by @mjorgecardoso: Medical imaging is a huge field of research, with conferences such as ISMRM (5k+ attendees), MICCAI (2.5k+) and ISBI (1.5k+). Volumetric neural network operations (convolutions, pooling, etc.) are common and already supported in PyTorch (see https://pytorch.org/docs/master/generated/torch.nn.Conv3d.html).

Dimension names used below: N = batch size, C = channels, H = height, W = width, D = depth, T = time

Typically found in 2D: [N, C, H, W]

Typically found in 2D + time (video): [N, C, T, H, W]

Expected behaviour: operations are applied only along the spatial dimensions (H, W) and NOT along T

Typically found in 3D (volumetric): [N, C, D, H, W] (sometimes also [N, C, H, W, D], as in medicaldetectiontoolkit)

Expected behaviour: operations are applied along all spatial dimensions (D, H, W)
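The layout conventions above can be sketched as a small helper that reports which axes a box operation should act on. This is only an illustration: `spatial_axes` is a hypothetical name, not a torchvision function, and it assumes layouts are written as strings of the axis letters defined above.

```python
from typing import Tuple


def spatial_axes(layout: str) -> Tuple[int, ...]:
    """Return the indices of the axes that NMS / RoiAlign should act on.

    Hypothetical helper: D, H and W count as spatial axes, while
    N (batch), C (channels) and T (time) are left untouched.
    """
    return tuple(i for i, name in enumerate(layout) if name in "DHW")


# 2D images: operate on H and W.
print(spatial_axes("NCHW"))
# Video: T is skipped, so only H and W are used.
print(spatial_axes("NCTHW"))
# Volumetric data: D, H and W are all used, in either channel ordering.
print(spatial_axes("NCDHW"))
print(spatial_axes("NCHWD"))
```

Under this reading, the video case behaves like a batch of 2D frames, whereas the volumetric case needs genuinely 3D operations.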

Pitch

Add support for NMS and RoiAlign for volumetric data and define the right conventions and proper documentation to make clear which function needs to be used in which case.

For backward compatibility, nms and roi_align should be kept as aliases for their plain 2D counterparts. Moving forward, there could be two functions, nms2d and nms3d (mirroring PyTorch's Conv2d and Conv3d). I'm not quite sure what the best way of handling or naming the video case is (maybe a flag inside the 3D versions?).
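To make the request concrete, here is a minimal reference sketch of what a 3D NMS could compute. It is written in plain Python for readability and is not a torchvision API or an optimized kernel. The name `nms3d` follows the proposal above; the box layout (x1, y1, z1, x2, y2, z2), the helper names `volume` and `iou_3d`, and the rule of suppressing boxes whose IoU is strictly greater than the threshold are assumptions.

```python
from typing import List, Sequence

# Assumed box layout: (x1, y1, z1, x2, y2, z2), with x1 <= x2 and so on.
Box3D = Sequence[float]


def volume(box: Box3D) -> float:
    """Volume of an axis-aligned 3D box (zero for degenerate boxes)."""
    x1, y1, z1, x2, y2, z2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1) * max(0.0, z2 - z1)


def iou_3d(a: Box3D, b: Box3D) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for lo in range(3):
        hi = lo + 3
        # Overlap extent along this axis, clamped at zero.
        inter *= max(0.0, min(a[hi], b[hi]) - max(a[lo], b[lo]))
    union = volume(a) + volume(b) - inter
    return inter / union if union > 0 else 0.0


def nms3d(boxes: Sequence[Box3D], scores: Sequence[float],
          iou_threshold: float) -> List[int]:
    """Greedy NMS: return kept box indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any kept box too much.
        if all(iou_3d(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 0, 2, 2, 2), (0, 0, 1, 2, 2, 3), (5, 5, 5, 6, 6, 6)]
scores = [0.9, 0.8, 0.7]
print(nms3d(boxes, scores, iou_threshold=0.3))
```

In this example the first two boxes overlap with an IoU of 1/3, so a threshold of 0.3 keeps indices 0 and 2, while a threshold of 0.5 keeps all three. A production version would need batched tensor inputs and CPU/CUDA kernels, which is the part this issue asks for.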

Alternatives

Additional context

https://github.com/pytorch/vision/pull/2337 https://github.com/pytorch/vision/issues/1678 @pfjaeger

naga-karthik commented 2 years ago

Hello, I am wondering what the status of this issue is. Are 3D NMS and 3D ROI Align going to be implemented in a future version of torchvision anytime soon? As the OP mentioned, having access to 3D versions of these ops would make it convenient to train models on volumetric (medical) data. Thanks!

datumbox commented 2 years ago

@naga-karthik Thanks for the interest. Right now we don't have the bandwidth to investigate and implement the proposed features. We are a small team and we are currently tackling other, higher-priority issues (new Datasets API, new Transforms API, etc.). Rest assured we will definitely review this in the next planning session.

etasnadi commented 5 months ago

Dear All,

If anyone is considering implementing this in torchvision, I have a working 3D RoiAlign kernel implemented in TensorFlow that could be ported directly to PyTorch. You can pull the 3D kernels from here: https://github.com/etasnadi/roi_align_3D.

etasnadi commented 5 months ago

It might also be worth considering https://github.com/TimothyZero/MedVision/tree/main for a torch version.