stwerner97 closed this issue 1 year ago.
Hi, thanks for your interest.
In the paper's transfer learning experiment, the baseline trains the neck and head modules from scratch while the backbone is pre-trained. We did compare against a fully pre-trained detector as you mentioned, but AlignDet's results are much worse in that comparison.
I agree with you that transfer learning results may be the most important metric for pre-training methods. However, since AlignDet only pre-trains a small number of parameters (neck & head) on relatively small data (instead of ImageNet, etc.), it cannot learn enough prior knowledge to transfer as well as fully supervised pre-trained methods.
As for scaling AlignDet to more parameters (beyond the neck and head) and larger datasets, I'm not sure how well that would work; maybe we can wait for follow-up papers~
Hi, thanks for the insightful response! 😊
Have you evaluated a pipeline along the lines of AlignDet pre-training > COCO supervised fine-tuning > downstream task?
I wonder whether self-supervised pre-training achieves any gains in such a setting, since it would let us jointly use more diverse datasets (e.g., OpenImages, Objects365, COCO) with disjoint label spaces, which might transfer better to downstream tasks due to a lower domain gap.
We did not evaluate the AlignDet pre-training > COCO supervised fine-tuning > downstream task pipeline, but you can try this good idea with our released AlignDet pre-training > COCO supervised fine-tuning checkpoints.
FYI, just to make a rough guess, I think this would bring additional performance improvements.
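For concreteness, here is a minimal sketch of what that third stage could look like, assuming an MMDetection-style setup; the base config names, checkpoint path, and dataset below are placeholders I made up, not files shipped with the AlignDet release:

```python
# Rough sketch only: an MMDetection-style config for downstream fine-tuning
# (e.g., on Pascal VOC) starting from a COCO fine-tuned checkpoint that itself
# began from AlignDet pre-trained weights. All paths are placeholders.

_base_ = [
    './faster_rcnn_r50_fpn.py',   # placeholder detector definition
    './voc0712.py',               # placeholder downstream dataset config
    './schedule_1x.py',           # placeholder training schedule
]

# Initialize from the released AlignDet pre-training > COCO fine-tuning weights.
load_from = 'path/to/aligndet_coco_finetuned.pth'  # placeholder path

# Re-size the classification head for the downstream label space
# (e.g., 20 classes for Pascal VOC); it is re-learned during fine-tuning.
model = dict(roi_head=dict(bbox_head=dict(num_classes=20)))
```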
Hi, thanks for the great work! 😊
I am interested in how well weights pre-trained with AlignDet transfer to downstream tasks, which I consider an important motivation for self-supervised detection pre-training. A standard procedure for transfer learning to downstream tasks is to pre-train a detector on MS COCO, OpenImages v7, or Objects365 and use these weights to initialize all modules except for the classification head.
Have you tried how AlignDet matches up against supervised pre-training on small- to medium-sized datasets? In your transfer learning experiment on Pascal VOC, do you compare AlignDet against a detector with supervised pre-training for all modules (except the classification head), or does the baseline train all modules but the backbone from scratch?
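For reference, this is roughly what I mean by initializing all modules except the classification head, sketched in plain PyTorch; the key prefix "bbox_head.fc_cls" is just an assumed example and the real names depend on the detector implementation:

```python
# Minimal sketch of the transfer-learning initialization described above:
# load every pre-trained detector weight except the classification head,
# which keeps its fresh random initialization for the new label space.
import torch

def load_all_but_cls_head(model, checkpoint_path, cls_prefix="bbox_head.fc_cls"):
    state = torch.load(checkpoint_path, map_location="cpu")
    state = state.get("state_dict", state)  # unwrap mmdet-style checkpoints

    # Drop classification-head parameters so they are re-learned downstream.
    filtered = {k: v for k, v in state.items() if not k.startswith(cls_prefix)}

    missing, unexpected = model.load_state_dict(filtered, strict=False)
    print("re-initialized (missing) keys:", missing)
    print("unexpected keys:", unexpected)
    return model
```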