Open mibaumgartner opened 4 years ago
Hello, I am wondering what the status of this issue is. Are 3D NMS and 3D ROI Align going to be implemented in a future version of torchvision anytime soon? As the OP mentioned, having access to 3D versions of the above ops would make it convenient to train models on volumetric (medical) data. Thanks!
@naga-karthik Thanks for the interest. Right now we don't have the bandwidth to investigate and implement the proposed features. We are a small team and we are currently tackling other, higher-priority issues (new Datasets API, new Transforms API, etc.). Rest assured we will definitely review this at the next planning session.
Dear All,
If anyone is considering implementing this in torchvision, I have a working 3D RoiAlign kernel implemented in TensorFlow that could be ported directly to PyTorch. You can pull the 3D kernels from here: https://github.com/etasnadi/roi_align_3D.
It might also be worth considering https://github.com/TimothyZero/MedVision/tree/main for a PyTorch version.
🚀 Feature
3D data is becoming more and more popular in the deep learning community. As a consequence, it would be great to have unified 3D NMS and 3D ROI Align ops for current and future projects like MONAI.
Motivation
Information added from @mjorgecardoso: Medical imaging is a huge field of research, with conferences such as ISMRM (5k+ attendees), MICCAI (2.5k+), ISBI (1.5k+). Volumetric neural network operations (convolutions, pooling, etc.) are common and supported in PyTorch (see here https://pytorch.org/docs/master/generated/torch.nn.Conv3d.html).
Spatial dimensions summarised: N = batch size, C = channels, H = height, W = width, D = depth / T = time
Typically found in 2D: [N, C, H, W]
Typically found in 2D + time (video): [N, C, T, H, W]. Expected behaviour: operations are only applied along the spatial dimensions (H, W) and NOT along T.
Typically found in 3D (volumetric): [N, C, D, H, W] (sometimes also [N, C, H, W, D] as in medicaldetectiontoolkit). Expected behaviour: operations are applied along all spatial dimensions (D, H, W).
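To make the difference between the 2D and volumetric cases concrete, here is a minimal, framework-free sketch of box overlap (IoU) that works for any number of spatial dimensions. It is not torchvision code: the function name `box_iou_nd` is hypothetical, and the 3D box layout (x1, y1, z1, x2, y2, z2) is an assumption that simply extends the (x1, y1, x2, y2) layout used for 2D boxes.

```python
from typing import Sequence


def box_iou_nd(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two axis-aligned boxes given as (min_1..min_k, max_1..max_k).

    Hypothetical helper for illustration; the coordinate layout is an assumption.
    """
    if len(a) != len(b) or len(a) % 2 != 0:
        raise ValueError("boxes must have the same, even number of coordinates")
    k = len(a) // 2  # number of spatial dimensions: 2 for (H, W), 3 for (D, H, W)
    inter = vol_a = vol_b = 1.0
    for i in range(k):
        lo = max(a[i], b[i])          # start of the overlap along axis i
        hi = min(a[i + k], b[i + k])  # end of the overlap along axis i
        inter *= max(0.0, hi - lo)    # no overlap on one axis means no overlap at all
        vol_a *= a[i + k] - a[i]
        vol_b *= b[i + k] - b[i]
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0


# 2D boxes (x1, y1, x2, y2): overlap is measured in the (H, W) plane only.
iou_2d = box_iou_nd((0, 0, 2, 2), (1, 1, 3, 3))
# 3D boxes (x1, y1, z1, x2, y2, z2): depth contributes to the overlap as well.
iou_3d = box_iou_nd((0, 0, 0, 2, 2, 2), (1, 1, 1, 3, 3, 3))
```

For the video case, the same 2D computation would presumably be applied per frame, so T never enters the overlap; for the volumetric case, all three axes do.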
Pitch
Add support for NMS and RoiAlign for volumetric data and define the right conventions and proper documentation to make clear which function needs to be used in which case.
For backward compatibility, nms and roialign should be kept as aliases for their plain 2D counterparts. Moving forward, there could be two functions, nms2d and nms3d (following the naming typically found in PyTorch, e.g. Conv2d and Conv3d). I'm not quite sure what the optimal way of handling/naming the video case is (maybe a flag inside the 3D versions?).
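As a rough illustration of the proposed naming, the sketch below implements greedy NMS in plain Python and exposes it as nms2d and nms3d, with nms kept as an alias for the 2D version. None of this is existing torchvision API: the helper names, the pure-Python implementation, and the 3D box layout (x1, y1, z1, x2, y2, z2) are assumptions made for the example. The suppression rule (drop a box whose IoU with an already kept, higher-scoring box exceeds the threshold) mirrors the usual 2D behaviour.

```python
from typing import List, Sequence

Box = Sequence[float]


def _iou(a: Box, b: Box) -> float:
    """IoU of two boxes laid out as (min_1..min_k, max_1..max_k)."""
    k = len(a) // 2
    inter = vol_a = vol_b = 1.0
    for i in range(k):
        inter *= max(0.0, min(a[i + k], b[i + k]) - max(a[i], b[i]))
        vol_a *= a[i + k] - a[i]
        vol_b *= b[i + k] - b[i]
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0


def _greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
                iou_threshold: float, ndim: int) -> List[int]:
    """Return indices of kept boxes, sorted by decreasing score."""
    if len(boxes) != len(scores):
        raise ValueError("boxes and scores must have the same length")
    if any(len(b) != 2 * ndim for b in boxes):
        raise ValueError(f"expected {2 * ndim} coordinates per box")
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if it does not overlap too much with any kept box.
        if all(_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def nms2d(boxes: Sequence[Box], scores: Sequence[float],
          iou_threshold: float) -> List[int]:
    """NMS for 2D boxes (x1, y1, x2, y2)."""
    return _greedy_nms(boxes, scores, iou_threshold, ndim=2)


def nms3d(boxes: Sequence[Box], scores: Sequence[float],
          iou_threshold: float) -> List[int]:
    """NMS for 3D boxes (x1, y1, z1, x2, y2, z2); layout is an assumption."""
    return _greedy_nms(boxes, scores, iou_threshold, ndim=3)


# Backward-compatible alias: existing callers of nms keep the 2D behaviour.
nms = nms2d

boxes_3d = [(0, 0, 0, 2, 2, 2), (0, 0, 0, 2, 2, 1.8), (5, 5, 5, 6, 6, 6)]
kept_3d = nms3d(boxes_3d, scores=[0.9, 0.8, 0.7], iou_threshold=0.5)
```

In this example the second box overlaps the first almost completely and is suppressed, while the third box is disjoint from both and is kept. A video flag, if one were added, could route each frame through the 2D path rather than treating T as a spatial axis.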
Alternatives
Additional context
https://github.com/pytorch/vision/pull/2337 https://github.com/pytorch/vision/issues/1678 @pfjaeger