Reason for using the threshold (0.5) in cal_CIoU

hche11 / Localizing-Visual-Sounds-the-Hard-Way

Localizing Visual Sounds the Hard Way

Apache License 2.0

72 stars, 14 forks

Hi!

Why do you use the threshold (0.5) in cal_CIoU, even though training doesn't provide any information about 0.5? In other words, does it just come from hyperparameter tuning, or is it derived from mathematical properties of the contrastive loss?

The reason I'm asking is that two recent papers, "Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning" and "A Closer Look at Weakly-Supervised Audio-Visual Source Localization", use relative prediction, which always chooses the 50% region as the prediction result without any threshold, so I just became curious :)
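To make the contrast concrete, here is a minimal sketch of the two binarization rules being compared. It is not the repository's cal_CIoU: the function names, the min-max normalization, and reading "the 50% region" as the highest-valued half of the pixels are all assumptions made for illustration.

```python
import numpy as np

def binarize_absolute(heatmap, thres=0.5):
    """Fixed threshold on a min-max normalized map (normalization is assumed)."""
    h = np.asarray(heatmap, dtype=float)
    span = h.max() - h.min()
    norm = (h - h.min()) / span if span > 0 else np.zeros_like(h)
    return norm >= thres

def binarize_relative(heatmap, keep_ratio=0.5):
    """Keep the highest-valued fraction of pixels, whatever their values are."""
    h = np.asarray(heatmap, dtype=float)
    k = int(round(h.size * keep_ratio))
    mask = np.zeros(h.size, dtype=bool)
    if k > 0:
        # Indices of the k largest values; ties are broken by sort order.
        top = np.argsort(h.ravel())[::-1][:k]
        mask[top] = True
    return mask.reshape(h.shape)

# Hypothetical 4x4 heatmap with a single sharp peak.
heatmap = np.array([[0.0, 0.1, 0.1, 0.2],
                    [0.1, 0.2, 0.3, 0.2],
                    [0.1, 0.3, 1.0, 0.3],
                    [0.0, 0.2, 0.3, 0.2]])

abs_mask = binarize_absolute(heatmap)   # predicted area depends on the map's shape
rel_mask = binarize_relative(heatmap)   # predicted area is always half the pixels
print(abs_mask.sum(), rel_mask.sum())   # prints "1 8"
```

With a peaked map like this one, the fixed threshold keeps only the peak pixel, while the relative rule always keeps half of the image, so the two rules can give quite different IoU against the same ground-truth box.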

Hi, I think I can answer your question. You can find the definition of cIoU in the paper "Learning to Localize Sound Source in Visual Scenes"; in the Results and analysis section they wrote:

Reason for using the threshold (0.5) in cal_CIoU #14