Open · Henistein opened this issue 1 month ago
As stated here, RT-DETR-{R18,R50,R101} were trained on Objects365 and fine-tuned on the COCO dataset. Why not the opposite? I am curious because, since you already had models pretrained on COCO, why not just fine-tune those on Objects365? Is there a reason for that?

Thank you in advance!
My guess is that more data generally gives better results, so they pre-trained on the dataset with more data (Objects365, 2,000k images) and then fine-tuned on the smaller one (COCO, 328k images). Doing it in the other order would also leave you with a model adapted to Objects365's 365 classes, whereas fine-tuning on COCO last yields a model matched to the 80-class COCO benchmark it is evaluated on.

Edit: Just found this in the original paper:

"We pre-train RT-DETR on the larger Objects365 [35] dataset and then fine-tune it on COCO to achieve higher performance."
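For concreteness, here is a minimal sketch of that two-stage recipe in plain PyTorch. It is illustrative only: build_rtdetr, reset_classification_head, objects365_train, and coco_train are hypothetical placeholders rather than this repo's actual API, and it assumes the torchvision-style convention that a detection model returns a dict of losses in train mode.

```python
import torch
from torch.utils.data import DataLoader

def train_stage(model, dataset, lr, epochs):
    # One training stage; the fine-tuning stage typically uses a
    # lower learning rate than the pre-training stage.
    loader = DataLoader(dataset, batch_size=16, shuffle=True,
                        collate_fn=lambda batch: tuple(zip(*batch)))
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            loss_dict = model(images, targets)  # assumed: loss dict in train mode
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Stage 1: pre-train on the larger dataset (Objects365, 2,000k images, 365 classes).
model = build_rtdetr(num_classes=365)                     # hypothetical constructor
train_stage(model, objects365_train, lr=1e-4, epochs=12)

# Stage 2: swap the classification head for COCO's 80 classes (backbone, encoder,
# and decoder keep their pre-trained weights), then fine-tune on the smaller dataset.
model = reset_classification_head(model, num_classes=80)  # hypothetical helper
train_stage(model, coco_train, lr=1e-5, epochs=24)
```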