raoyongming / DenseCLIP

[CVPR 2022] DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

Any plans in Anchor-free detection? #1

Closed YueLiao closed 2 years ago

YueLiao commented 2 years ago

Thanks for your interesting and nice work!

I would like to know whether you have conducted experiments on anchor-free detectors, e.g., (a) adopting the score maps s as the center/corner point heatmaps to obtain bounding-box predictions, or (b) employing pre-model prompting directly for a transformer detector like DETR (using the text embeddings to initialize the classifier).
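For context, option (b) could be sketched roughly as follows. This is a hypothetical illustration, not code from DenseCLIP: a DETR-style classification head whose weight matrix is initialized from CLIP text embeddings, so classification becomes a scaled cosine similarity between query features and per-class text features. The function name, shapes, and `logit_scale` value are all assumptions.

```python
import numpy as np

def text_init_logits(query_features, text_embeddings, logit_scale=100.0):
    """Hypothetical classifier head initialized from text embeddings.

    query_features:  (num_queries, embed_dim) decoder query features
    text_embeddings: (num_classes, embed_dim) CLIP text features
    Returns (num_queries, num_classes) classification logits as a
    scaled cosine similarity, mirroring CLIP's zero-shot classifier.
    """
    q = query_features / np.linalg.norm(query_features, axis=-1, keepdims=True)
    w = text_embeddings / np.linalg.norm(text_embeddings, axis=-1, keepdims=True)
    return logit_scale * q @ w.T

# toy usage: orthogonal "text embeddings" so each query matches one class
text_emb = np.eye(3)
queries = np.array([[0.0, 1.0, 0.0],
                    [1.0, 0.0, 0.0]])
logits = text_init_logits(queries, text_emb)
print(logits.argmax(axis=1))  # each query picks its aligned class
```

In practice the text-embedding weights could stay frozen (pure zero-shot transfer) or be fine-tuned as a learnable initialization.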

raoyongming commented 2 years ago

Thanks for your interest in our work.

In our experiments, we focus on studying how to better fine-tune the CLIP models and leverage the language priors. Therefore, we chose widely used frameworks like Semantic FPN and Mask R-CNN and didn't test the performance on anchor-free detectors or DETR. I think it may not be simple to directly use s or the text encoder to perform the object detection task, since there is a significant gap between the pre-training task, which focuses on semantic information (e.g., categories), and the localization targets of center/corner heatmaps or a DETR decoder.
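To make the gap concrete, option (a) from the question would require decoding boxes from the score maps in a CenterNet-like fashion. The sketch below is hypothetical and not from the paper: it treats a per-class score map as a center heatmap, keeps 3x3 local maxima above a threshold, and reads box sizes from an assumed width/height prediction map (`wh_map`), which a CLIP-style score map does not provide by itself.

```python
import numpy as np

def decode_center_heatmap(score_map, wh_map, threshold=0.3):
    """Hypothetical CenterNet-style decoding of a per-class score map.

    score_map: (num_classes, H, W) treated as center-point heatmaps
    wh_map:    (2, H, W) assumed per-location box width/height predictions
    Returns a list of (class_id, x1, y1, x2, y2, score) tuples.
    """
    boxes = []
    C, H, W = score_map.shape
    for c in range(C):
        for y in range(H):
            for x in range(W):
                s = score_map[c, y, x]
                if s < threshold:
                    continue
                # keep only local maxima in a 3x3 neighborhood (poor man's NMS)
                y0, y1 = max(0, y - 1), min(H, y + 2)
                x0, x1 = max(0, x - 1), min(W, x + 2)
                if s < score_map[c, y0:y1, x0:x1].max():
                    continue
                w, h = wh_map[0, y, x], wh_map[1, y, x]
                boxes.append((c, x - w / 2, y - h / 2, x + w / 2, y + h / 2, s))
    return boxes

# toy usage: one strong peak at (2, 2) with a 2x2 predicted box
score = np.zeros((1, 5, 5))
score[0, 2, 2] = 0.9
wh = np.full((2, 5, 5), 2.0)
print(decode_center_heatmap(score, wh))
```

The missing piece is exactly the gap described above: the score maps carry semantic evidence, but the size/offset regression targets would have to be learned from scratch.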

YueLiao commented 2 years ago

Thanks for your reply~