justinkay / aldi

Official implementation of "Align and Distill: Unifying and Improving Domain Adaptive Object Detection"
https://aldi-daod.github.io/

Unfair comparison - pre-training weights inconsistent with prior work #5

Closed: suojinhui closed this issue 5 months ago

suojinhui commented 5 months ago

Why use a COCO pre-trained Mask R-CNN w/ ResNet-50 FPN backbone and 3x schedule for the pre-training weights? In almost all previous work, ImageNet pre-trained weights were used to initialize the backbone (VGG16 / ResNet-50/101). Your high performance is entirely due to better pre-training weights.

I agree with your proposal regarding the source-only and Oracle setups discussed in your paper. You also mention using updated backbones to improve performance.

But none of this is enough to justify using pre-training weights that differ from those used in all prior work.

You could have used the same pre-training weights as previous work (even with a different backbone), but you did not. Your paper discusses how to compare fairly in order to advance domain adaptive object detection, yet the SOTA performance you achieve is even more unfair to other works, which is the most ironic point.

justinkay commented 5 months ago

Hi @suojinhui, thanks for reading. For all the methods we re-implemented in the paper (AT, MIC, PT, SADA, UMT), not just our own, we use the same COCO pre-trained weights as a starting point, so the comparisons are fair. These details are in Appendix Section 3.1.

You can find the config files for other methods on the extras branch. For example, you can see that our MIC and UMT configurations start from the same pretrained model (COCO pre-trained Mask R-CNN w/ ResNet-50 FPN backbone and 3x schedule) as our own:

MIC: https://github.com/justinkay/aldi/blob/extras/configs/cityscapes/cityscapes_priorart/MIC-Cityscapes.yaml
UMT: https://github.com/justinkay/aldi/blob/extras/configs/cityscapes/cityscapes_priorart/UMT-Cityscapes.yaml

For other methods such as AT that use a burn-in phase, the burn-in is also initialized with weights from the same COCO pre-trained Mask R-CNN w/ ResNet-50 FPN backbone and 3x schedule: https://github.com/justinkay/aldi/blob/extras/configs/cityscapes/Base-RCNN-FPN-Cityscapes.yaml
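To make the shared starting point concrete, here is a minimal, hypothetical sketch of the Detectron2-style config pattern the linked files follow; the checkpoint path below is a placeholder, not the exact value used in the repo:

```yaml
# Hypothetical sketch of the shared initialization pattern (placeholder path,
# not the exact contents of the linked configs).
_BASE_: "Base-RCNN-FPN-Cityscapes.yaml"   # shared base config from the repo
MODEL:
  # Every re-implemented method (AT, MIC, PT, SADA, UMT) and ALDI itself start
  # from the same COCO pre-trained Mask R-CNN R50-FPN 3x checkpoint:
  WEIGHTS: "path/to/coco_mask_rcnn_R_50_FPN_3x.pkl"
```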

Thanks for the feedback; perhaps we should state this more clearly in the main paper.

suojinhui commented 2 months ago

Thank you for your reply and your contribution to DAOD. I would suggest also reporting results that use a model pre-trained on a non-object-detection dataset as the initialization, since domain adaptive performance is meant to be validated on the target domain using only source domain labels. Pre-training on another object detection dataset may introduce a "third domain", which can cause problems. For example, if the pre-trained model were strong enough to outperform Oracle(tgt) without any training, domain adaptation would be meaningless.

justinkay commented 2 months ago

Hi @suojinhui, that is a good idea. We have done some experiments starting with ImageNet weights instead -- I'll put those together and post them.
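A minimal sketch of how that swap might look in a Detectron2-style config, assuming Detectron2's standard ImageNet-pretrained ResNet-50 checkpoint; this is illustrative, not the repo's actual config:

```yaml
# Hypothetical sketch: initialize from ImageNet-only backbone weights instead of COCO.
_BASE_: "Base-RCNN-FPN-Cityscapes.yaml"   # base config name taken from the repo; otherwise illustrative
MODEL:
  # Detectron2's standard ImageNet-pretrained ResNet-50 backbone weights;
  # only the backbone is initialized, the detection heads train from scratch.
  WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
```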

Regarding the extreme case you mention, we were concerned about this possibility too; however, we confirmed that it does not occur for any of the benchmarks in the paper. COCO pre-trained models are worse than source-only baselines (fine-tuned from COCO weights) in all cases. One of these results is in the paper currently, but it's subtle -- in Fig. 6a, the lightest-color dot represents "no burn-in", i.e., it starts with COCO weights. When evaluated on Foggy Cityscapes it scores 42.4 AP@IoU=0.5, whereas the source-only model scores 59.1.

Still, your point about COCO pre-training introducing a "third domain" is an interesting one, and I agree that including results starting with ImageNet weights would be informative. Will ping you when we post the additional results.

Thanks for your feedback and suggestions for improving our work!

justinkay commented 3 weeks ago

Hi @suojinhui, the ImageNet results can be seen here:

[Image: pretraining_and_vit]

These are also now posted in an update to the preprint: https://arxiv.org/pdf/2403.12029 -- thank you for the suggestion.