pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision

Detection references are needlessly transforming masks #7489

Open · NicolasHug opened this issue 1 year ago

NicolasHug commented 1 year ago

Something we realized today with @pmeier: even for pure detection tasks where masks aren't needed, the detection training references still use the masks from COCO, which means that:

- the segmentation polygons in the COCO annotations are decoded into dense mask tensors for every sample;
- those masks are then carried through (and transformed by) the entire transforms pipeline.

Both of these steps are completely wasteful since masks aren't needed for detection tasks, and a simple benchmark shows that they significantly hurt training performance.

(Not sure whether the same applies to keypoints too; that would need to be checked.)
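
For reference, a minimal sketch of the kind of per-sample decoding this implies (assuming pycocotools is installed and the annotations use polygon segmentations; the helper name is hypothetical and not the exact code in the references):

```python
import torch
from pycocotools import mask as coco_mask


def polygons_to_masks(segmentations, height, width):
    """Decode COCO polygon segmentations into dense H x W binary masks.

    For box-only detection training, this decoding (and everything downstream
    that transforms the resulting tensors) is pure overhead.
    """
    masks = []
    for polygons in segmentations:
        # Convert the polygon(s) of one object into RLE, then into a dense mask.
        rles = coco_mask.frPyObjects(polygons, height, width)
        decoded = coco_mask.decode(rles)
        if decoded.ndim < 3:
            decoded = decoded[..., None]
        # Merge the per-polygon channels into a single mask for the object.
        masks.append(torch.as_tensor(decoded, dtype=torch.uint8).any(dim=2))
    if not masks:
        return torch.zeros((0, height, width), dtype=torch.bool)
    return torch.stack(masks, dim=0)
```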

NicolasHug commented 1 year ago

We can probably address this by simply relying on the v2 COCO wrapper in the references, regardless of whether the v2 transforms are being used. https://github.com/pytorch/vision/issues/7494 and https://github.com/pytorch/vision/pull/7488 show that the v2 COCO wrapper is ~20% faster than the one currently in the references, and it natively supports removing the masks, which should lead to further improvements. A sketch of the usage I have in mind is below.
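
Roughly (a sketch, assuming a torchvision version where wrap_dataset_for_transforms_v2 accepts a target_keys argument; the dataset paths are placeholders):

```python
from torchvision.datasets import CocoDetection, wrap_dataset_for_transforms_v2

# Plain CocoDetection; paths are placeholders.
dataset = CocoDetection(
    root="path/to/coco/train2017",
    annFile="path/to/annotations/instances_train2017.json",
)

# Only ask the wrapper for what a pure detection task needs: since "masks" is
# not in target_keys, the segmentation polygons are never decoded into mask
# tensors and nothing downstream has to transform them.
dataset = wrap_dataset_for_transforms_v2(dataset, target_keys=("boxes", "labels"))

img, target = dataset[0]
print(sorted(target.keys()))  # e.g. ['boxes', 'labels']
```

The references would then just request the keys they actually train on, instead of stripping the masks out after the fact.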

pmeier commented 1 year ago

I can send a PR.