Sunjuhyeong opened 1 year ago
Hi!
Why do you use the threshold (0.5) in cal_CIoU, even though training doesn't provide any information about 0.5? In other words, is it just the result of hyperparameter tuning, or is it reasoned from mathematical properties of the contrastive loss?
The reason I'm asking is that recent papers, Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning and A Closer Look at Weakly-Supervised Audio-Visual Source Localization, use relative prediction, which always selects 50% of the region as the prediction result without any threshold. So I just became curious :)
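To make the contrast concrete, here is a minimal pure-Python sketch of the two binarization rules and a cIoU-style score. It assumes the 0.5 in cal_CIoU is applied directly to the heatmap scores, and the cIoU form follows our reading of the consensus-IoU definition in "Learning to Localize Sound Source in Visual Scenes" (ground-truth weights in [0, 1], false positives counted on background pixels). The helper names `binarize_fixed`, `binarize_relative` and `ciou` are hypothetical and are not the repository's cal_CIoU.

```python
from typing import List

Grid = List[List[float]]


def binarize_fixed(heatmap: Grid, thres: float = 0.5) -> List[List[int]]:
    """Absolute rule (assumed cal_CIoU behaviour): a pixel is predicted if score >= thres."""
    return [[1 if v >= thres else 0 for v in row] for row in heatmap]


def binarize_relative(heatmap: Grid, ratio: float = 0.5) -> List[List[int]]:
    """Relative rule: keep roughly the top `ratio` fraction of pixels, regardless of score scale."""
    flat = sorted((v for row in heatmap for v in row), reverse=True)
    k = max(1, int(ratio * len(flat)))  # number of pixels to keep (floored, at least 1)
    cutoff = flat[k - 1]  # k-th largest score; ties may keep a few extra pixels
    return [[1 if v >= cutoff else 0 for v in row] for row in heatmap]


def ciou(pred: List[List[int]], gt: Grid) -> float:
    """cIoU-style score; gt holds consensus weights in [0, 1], 0 meaning background.

    numerator:   sum of gt weights inside the predicted region
    denominator: sum of all gt weights + number of predicted background pixels
    """
    pairs = [(p, g) for prow, grow in zip(pred, gt) for p, g in zip(prow, grow)]
    inter = sum(g for p, g in pairs if p)
    false_pos = sum(1 for p, g in pairs if p and g == 0)
    total_gt = sum(g for _, g in pairs)
    denom = total_gt + false_pos
    return inter / denom if denom > 0 else 0.0


# A correctly peaked heatmap whose scores never reach 0.5.
heatmap = [[0.40, 0.35, 0.05, 0.02],
           [0.30, 0.25, 0.04, 0.01]]
gt = [[1, 1, 0, 0],
      [1, 1, 0, 0]]

print(ciou(binarize_fixed(heatmap), gt))     # fixed 0.5 selects nothing
print(ciou(binarize_relative(heatmap), gt))  # top 50% matches the source region

# Rescaling the scores changes the fixed-threshold result but not the relative one.
scaled = [[2 * v for v in row] for row in heatmap]
print(ciou(binarize_fixed(scaled), gt))
```

In this toy case the fixed rule scores 0.0 on the original heatmap and 1.0 once the scores are doubled, while the relative rule gives the same mask both times. That is the scale dependence the question points at: an absolute threshold only makes sense if the score range is calibrated, whereas a relative rule fixes the predicted area instead.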
Hi, I think I can answer your question. You can find the definition of cIoU in the paper "Learning to Localize Sound Source in Visual Scenes"; in the Results and analysis section, they wrote: