pytorch / glow

Compiler for Neural Network hardware accelerators
Apache License 2.0
3.2k stars 688 forks source link

Queries regarding Non Max Suppression implementation #4749

Open dpanicka opened 4 years ago

dpanicka commented 4 years ago

Looking through the NMS implementation, I have a couple of queries based on the documentation from Tensorflow V4 and ONNX NMS.

It would be good to discuss these with the original author, @ayermolo . Thanks!

ayermolo commented 4 years ago

1) Well NMS outputs indices (batch, box, class). Usually then it goes through post processing (pp) subgraph with bunch of select/gathers. With ONNX nms PP doesn't know how many are valid so gather will access all indices. If we don't fill it it's either going to be un-initialized or 0. With un-initialed it can lead to unintended behavior and access some random piece of memory in gather. If 0th box happens to be best Score then it will dominate after topK. With min box all the entries after maxOutputPerBatch will usually be disregarded or show up at the end.

2) It was to avoid broadcast. In networks I have seen it was usually replicated for layers that do selection. In ONX NMS it's not used anyway. I guess it can be gated by V4 flag.

3) I think you are referring to ones that use gather to extract values right? The graph it self is constructed with batch dim 1. There are other tests that instantiate testNonMaxSuppression, that use batch > 1.

dpanicka commented 4 years ago

Thank you for your comments!

  1. That's a good point regarding gather and the 0th box. If we could avoid unnecessary computation, that would be best since we only care about the valid boxes anyway. Will have a think about it.

  2. OK.

  3. I meant the testNonMaxSuppression without gather, here. The loop goes up to maxOutputPerClass, and that would only check the first maxOutputPerClass number of results without considering batches.

I'll try to put up a patch and it would be great if you could review it.

ayermolo commented 4 years ago

1) Yeah it's a problem. There is a fundamental issue that in GLOW dimensions are static, and NMS output is dynamic. So if you construct PP you have to assume that max is detected. 3) Ah I see. Yeah should be extended to number of batch dimensions. I think all the tests use only one class.

Sure, although I don't have permissions to approve. Someone from FB team will need to do that.