janghyuncho / DECOLA

Code release for "Language-conditioned Detection Transformer"
https://arxiv.org/abs/2311.17902
82 stars 4 forks

What is the difference between baseline and DECOLA phase 1? #5

Closed Dwrety closed 3 months ago

Dwrety commented 8 months ago

What is the difference between baseline and DECOLA phase 1?

janghyuncho commented 7 months ago

Sorry for the delayed response. The difference between the baseline and DECOLA Phase 1 is best illustrated here, or on page 6 of our paper.

To highlight the difference: the "baseline" is trained as a standard object detector with a multi-class objective on the DeformableDETR architecture, except that we replace the final classification layer with CLIP text embeddings. In DECOLA Phase 1, we condition the object detection pipeline on text and train each text-conditioned detection with a binary objective.
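For illustration only, here is a rough sketch of how the two objectives differ in shape. It is not taken from the repository: every function name is hypothetical, softmax cross-entropy and plain binary cross-entropy are stand-ins for whatever losses the actual code uses, and conditioning by element-wise addition is an assumption made only to keep the example small.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def baseline_loss(box_feature, text_embeddings, target_class):
    """Multi-class objective (stand-in: softmax cross-entropy).

    The text embeddings play the role of the final classification layer:
    one logit per class, all computed from the same unconditioned feature.
    """
    logits = [dot(box_feature, t) for t in text_embeddings]
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[target_class]


def condition_on_text(query, text_embedding):
    """Hypothetical conditioning: add the text embedding to the query."""
    return [q + t for q, t in zip(query, text_embedding)]


def conditioned_binary_loss(conditioned_feature, score_weights, is_match):
    """Binary objective (stand-in: binary cross-entropy).

    A single logit answers one question: does this box match the text
    the detection was conditioned on?
    """
    logit = dot(conditioned_feature, score_weights)
    p = 1.0 / (1.0 + math.exp(-logit))
    return -math.log(p) if is_match else -math.log(1.0 - p)


# Toy 2-d "text embeddings" for two class names (made-up values).
text_embeddings = [[1.0, 0.0], [0.0, 1.0]]  # index 0: "cat", index 1: "dog"
query = [2.0, 0.0]
score_weights = [1.0, 0.0]  # hypothetical scoring head

# Baseline: one feature, one loss over all classes at once.
print(baseline_loss(query, text_embeddings, target_class=0))

# Phase 1 style: one conditioned detection per text, each with its own
# binary target (here the box is a "cat", so only index 0 is a match).
for idx, text in enumerate(text_embeddings):
    feature = condition_on_text(query, text)
    print(conditioned_binary_loss(feature, score_weights, is_match=(idx == 0)))
```

In the baseline sketch the classes compete inside a single prediction, while in the conditioned sketch each text gets its own detection pass and its own yes/no target.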

Hope this explanation helps.