What is the difference between baseline and DECOLA phase 1?

Sorry for the delayed response. The difference between the baseline and DECOLA Phase 1 is best illustrated here, or on page 6 of our paper.

To highlight the difference: the "baseline" is trained as a standard object detector with a multi-class objective on the DeformableDETR architecture, except that we replace the final classification layer with CLIP text embeddings. In DECOLA Phase 1, we condition the object detection pipeline on text and train each text-conditioned detection with a binary objective.
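To make the contrast concrete, here is a rough, self-contained sketch of the two objectives. It is not the DECOLA code. Everything in it is an assumption for illustration: the random vectors stand in for CLIP text embeddings and decoder query features, softmax cross-entropy stands in for the multi-class loss, and the "add the text embedding, then score with a shared vector `w`" step is a hypothetical placeholder for the actual conditioning.

```python
import math
import random

random.seed(0)
NUM_CLASSES, DIM, NUM_QUERIES = 3, 4, 5


def rand_vec(dim):
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Stand-ins (assumptions): random vectors instead of real CLIP text
# embeddings and real decoder outputs.
text_emb = [rand_vec(DIM) for _ in range(NUM_CLASSES)]
query_feat = [rand_vec(DIM) for _ in range(NUM_QUERIES)]
gt_class = [random.randrange(NUM_CLASSES) for _ in range(NUM_QUERIES)]
w = rand_vec(DIM)  # hypothetical shared scoring vector for the conditioned case


def baseline_loss(feats, labels, text_emb):
    """Multi-class: every query is scored against ALL class embeddings at once.

    The text embeddings act as the final classification layer.
    """
    total = 0.0
    for f, y in zip(feats, labels):
        logits = [dot(f, t) for t in text_emb]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[y]  # softmax cross-entropy (assumed loss)
    return total / len(feats)


def bce_with_logit(logit, target):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def phase1_loss(feats, labels, text_emb, w):
    """Conditioned: one pass per class, each asking "is this the conditioned class?"."""
    total, count = 0.0, 0
    for c, t in enumerate(text_emb):
        for f, y in zip(feats, labels):
            # Hypothetical conditioning: inject the text embedding into the feature.
            conditioned = [fi + ti for fi, ti in zip(f, t)]
            logit = dot(conditioned, w)
            total += bce_with_logit(logit, 1.0 if y == c else 0.0)
            count += 1
    return total / count


print("baseline (multi-class):", baseline_loss(query_feat, gt_class, text_emb))
print("phase 1 (conditioned, binary):", phase1_loss(query_feat, gt_class, text_emb, w))
```

The point of the sketch is only the shape of the two losses: in the baseline the text embeddings appear once, at the classifier, while in the conditioned version the text enters before scoring and each class gets its own binary target.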

Hope this explanation helps.

