Add NMS version which can do confidence based thresholding #42370

pytorch / vision

Datasets, Transforms and Models specific to Computer Vision

https://pytorch.org/vision

BSD 3-Clause "New" or "Revised" License

16.22k stars 6.95k forks source link

Add NMS version which can do confidence based thresholding #42370 #2552

Open anujgupt-github opened 4 years ago

anujgupt-github commented 4 years ago

🚀 Feature

Add NMS version which takes confidence threshold and prunes out low confidence scores from the entire prediction.

Motivation

Motivation is that if Pytorch NMS requires only boxes which cross confidence threshold, it means that the filtering should be done apriori. This filtering usually results in an ONNX NONZERO operator, and if using GLOW, NONZERO can't be supported, owing to the dynamic shape of the output tensor. ONNX supports NonMaxSuppression node, which does exactly the same thing. If pytorch also supports this, conversion of Pytorch NMS -> ONNX NMS becomes much easier.

Pitch

Add an additional input to NMS operator (confidence_threshold). New NMS operator takes entire model prediction as an input, and based on the confidence_threshold, filters out boxes lower than the threshold before moving to IOU.

fmassa commented 4 years ago

Hi,

What is the advantage of doing it inside the operator itself, only for easier ONNX conversion so that we avoid potentially having dynamic shapes?

In my opinion, we should try to make the API as self-contained and simple as possible, and there are many ways that one can perform filtering (either before or after NMS).

Additionally, the current formulation for the NMS already has a dynamic shape (similar to nonzero), as the number of elements in the index is not known beforehand. It would require modifying the implementation to always return the same number of boxes as input, probably padded with zeros, plus an indicator vector that indicates if the box is activated or not.

But maybe I'm missing something?

anujgupt-github commented 4 years ago

Hi @fmassa - Indeed the ask is for including filtering based on confidence threshold inside NMS API itself to avoid NonZero like operators. You are right, the output will still need to be of a fixed shape, possibly padded by zeros in case of less bounding boxes in an inference. This naturally fits very well with the ONNX NonMaxSuppression which contains conf_threshold also, and does this entire filtering as part of the API. (filtering + IOU) If everything is consumed inside the NMS API, it eliminates the dynamic tensors, which are not supported in GLOW. Please let me know if you differ on this. Thanks

fmassa commented 4 years ago

@anujgupt-github but this NMS implementation wouldn't be used by Faster R-CNN as is, and thus there wouldn't be any benefits to users of Faster / Mask / Keypoint R-CNN, correct? Or is your request to also change the detection models to use this new version of NMS?

IMO, one of the big early advantages of PyTorch was it seamless support for dynamic shapes, which would allow for a very seamless API similar to numpy. I think an API as you mention is less intuitive for users, and the downstream uses of NMS will need to be modified (in a non-trivial way) for it to work.

I'm in favor of keeping things simple and not duplicate the API of NMS for this particular use-case, but I'm happy to discuss more

anujgupt-github commented 4 years ago

Hi @fmassa , At the moment, I wasn't alluding to changing the detection models also, but only to add a new version of NMS which can handle the conf threshold also. It is going to be an added functionality to be exercised in specific cases. Short of this, the viable options available are only to partition the models such that either the subgraph containing non-zero is separated out or modify the code to circumvent dynamic shapes. As is, running an inference model containing NMS and compiling with GLOW is not possible, and defeats the purpose of high speed H/W accelerators. The conf_threshold can be defaulted to 1, in which case legacy networks using current NMS are not impacted at all. Thoughts?

vadimkantorov commented 3 years ago

My response to motivation would just be to simplify user code, since confidence thresholding is semi-ubiquitous, but keeping it outside isn't too bad either. Another frequent post-processing is total maximum number of detections per image.

One useful thing would be to have a true batch mode: e.g. batch across images, not only across categories. Yes, there is problem on NMS returning index, but maybe it could return a mask instead or a single keep index tensor with offsets or padded index tensor (especially if a given number of detections per image is set)