nickgkan / butd_detr

Code for the ECCV22 paper "Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds"

About the soft token prediction #38

Closed by ZCMax 11 months ago

ZCMax commented 11 months ago

Since the results of evaluate_bbox_by_contrast are higher than those of evaluate_bbox_by_span, I have a question: if the soft token prediction loss were removed and only the contrastive alignment loss kept, what would happen to the final results? Is the alignment loss alone enough for model training?

nickgkan commented 11 months ago

Hi, while we didn't thoroughly experiment on this, we had some indications that both losses are needed to achieve the reported performance. It could be that, with a different weighting, one of the loss terms can be removed, but we did not try that extensively.
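
For anyone who wants to try the ablation discussed above, here is a minimal sketch of how the two loss terms could be combined with adjustable weights. It is not the repository's code: the function names, the `w_soft` and `w_contrast` weights, and the one-directional (query-to-token) contrastive term are all assumptions made for illustration, and it uses plain Python lists instead of tensors. Setting `w_soft=0.0` corresponds to training with the alignment loss only.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def soft_token_loss(token_logits, span_positions):
    """Cross-entropy between softmax(token_logits) and a target that is
    uniform over the token positions of the referred span (assumed form)."""
    log_probs = log_softmax(token_logits)
    target = 1.0 / len(span_positions)
    return -sum(target * log_probs[i] for i in span_positions)


def contrastive_alignment_loss(query_feat, token_feats, positive_positions,
                               temperature=0.07):
    """Simplified, one-directional contrastive term: the query feature should
    be more similar to its positive tokens than to the other tokens."""
    sims = [sum(q * t for q, t in zip(query_feat, tok)) / temperature
            for tok in token_feats]
    log_probs = log_softmax(sims)
    return -sum(log_probs[i] for i in positive_positions) / len(positive_positions)


def combined_loss(token_logits, span_positions, query_feat, token_feats,
                  w_soft=1.0, w_contrast=1.0, temperature=0.07):
    """Weighted sum of the two terms; the weights are hypothetical knobs."""
    soft = soft_token_loss(token_logits, span_positions)
    contrast = contrastive_alignment_loss(
        query_feat, token_feats, span_positions, temperature)
    return w_soft * soft + w_contrast * contrast


# Toy example: one query, four text tokens, the span covers tokens 1 and 2.
logits = [0.1, 2.0, 1.5, -0.3]
span = [1, 2]
query = [1.0, 0.0]
tokens = [[0.0, 1.0], [0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]

both_terms = combined_loss(logits, span, query, tokens)
alignment_only = combined_loss(logits, span, query, tokens, w_soft=0.0)
```

Comparing runs with `w_soft=0.0` against runs with both terms (and a few intermediate weightings) would be one way to test whether the alignment loss is sufficient on its own; the reply above only reports indications, not a full ablation.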