NicolasHug opened 1 year ago
We can probably address this issue by simply relying on the v2 COCO wrapper in the references, regardless of whether v2 transforms are being used. https://github.com/pytorch/vision/issues/7494 and https://github.com/pytorch/vision/pull/7488 show that the v2 COCO wrapper is ~20% faster than the one we have in the references, and it natively supports removing masks, which should lead to further improvements.
I can send a PR.
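A minimal sketch of what that could look like in the references, assuming torchvision >= 0.15 where `wrap_dataset_for_transforms_v2` and its `target_keys` argument are available; the helper name `get_coco` and the exact set of target keys are illustrative, not the final PR:

```python
# Target keys needed for a pure detection task -- note: no "masks".
# The exact key set is an assumption for illustration.
DETECTION_TARGET_KEYS = {"image_id", "boxes", "labels"}

def get_coco(root, ann_file, transforms):
    # Imported lazily so the sketch is readable without torchvision installed.
    from torchvision.datasets import CocoDetection, wrap_dataset_for_transforms_v2

    dataset = CocoDetection(root, ann_file, transforms=transforms)
    # The v2 wrapper converts COCO's list-of-dicts target into tensors and,
    # with target_keys restricted, never materializes the masks at all.
    return wrap_dataset_for_transforms_v2(dataset, target_keys=DETECTION_TARGET_KEYS)
```

The key point is that `target_keys` lets the wrapper skip mask decoding entirely, rather than decoding masks and discarding them later.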
Something we realized today with @pmeier: even for pure detection tasks where masks aren't needed, the detection training references still use the masks from COCO, which means that:

- the masks are decoded and kept in the target, and
- every transform is applied to the masks as well.

Both of these steps are completely wasteful since masks aren't needed for detection tasks, and a simple benchmark shows they significantly hurt performance.
(Not sure whether the same applies to the Keypoint references; would need to check.)