pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision

Detection references are needlessly transforming masks #7489

Open · NicolasHug opened this issue 1 year ago

NicolasHug commented 1 year ago

Something we realized today with @pmeier: even for pure detection tasks where masks aren't needed, the detection training references still use the masks from COCO, which means that:

- the segmentation polygons in the COCO annotations are decoded into dense mask tensors for every sample;
- those masks are then carried through (and transformed by) the entire transforms pipeline.

Both of these steps are completely wasteful since masks aren't needed for detection tasks, and a simple benchmark shows that they significantly hurt training performance.

(Not sure whether the same applies to keypoints too; that would need to be checked.)
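
For reference, a minimal sketch of the kind of per-sample decoding this implies (assuming pycocotools is installed and the annotations use polygon segmentations; the helper name is hypothetical and not the exact code in the references):

```python
import torch
from pycocotools import mask as coco_mask


def polygons_to_masks(segmentations, height, width):
    """Decode COCO polygon segmentations into dense H x W binary masks.

    For box-only detection training, this decoding (and everything downstream
    that transforms the resulting tensors) is pure overhead.
    """
    masks = []
    for polygons in segmentations:
        # Convert the polygon(s) of one object into RLE, then into a dense mask.
        rles = coco_mask.frPyObjects(polygons, height, width)
        decoded = coco_mask.decode(rles)
        if decoded.ndim < 3:
            decoded = decoded[..., None]
        # Merge the per-polygon channels into a single mask for the object.
        masks.append(torch.as_tensor(decoded, dtype=torch.uint8).any(dim=2))
    if not masks:
        return torch.zeros((0, height, width), dtype=torch.bool)
    return torch.stack(masks, dim=0)
```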

NicolasHug commented 1 year ago

We can probably address this by simply relying on the v2 COCO wrapper in the references, regardless of whether the v2 transforms are being used. https://github.com/pytorch/vision/issues/7494 and https://github.com/pytorch/vision/pull/7488 show that the v2 COCO wrapper is ~20% faster than the one currently in the references, and it natively supports removing the masks, which should lead to further improvements. A sketch of the usage I have in mind is below.
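
Roughly (a sketch, assuming a torchvision version where wrap_dataset_for_transforms_v2 accepts a target_keys argument; the dataset paths are placeholders):

```python
from torchvision.datasets import CocoDetection, wrap_dataset_for_transforms_v2

# Plain CocoDetection; paths are placeholders.
dataset = CocoDetection(
    root="path/to/coco/train2017",
    annFile="path/to/annotations/instances_train2017.json",
)

# Only ask the wrapper for what a pure detection task needs: since "masks" is
# not in target_keys, the segmentation polygons are never decoded into mask
# tensors and nothing downstream has to transform them.
dataset = wrap_dataset_for_transforms_v2(dataset, target_keys=("boxes", "labels"))

img, target = dataset[0]
print(sorted(target.keys()))  # e.g. ['boxes', 'labels']
```

The references would then just request the keys they actually train on, instead of stripping the masks out after the fact.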

pmeier commented 1 year ago

I can send a PR.