Hi @yeliudev, in downstream fine-tuning we do not derive any additional labels; we use the original annotations. Label derivation is only used during pretraining corpus creation.
@QinghongLin Thanks for your reply! Since additional labels are not used, are `loss_s_inter` and `loss_s_intra` also discarded? It seems that we do not know which clips have lower saliency scores than `saliency_pos_labels`.
Oh, sorry for the confusion. Let me clarify.
In Tab. 3, all three losses are used. We can derive saliency supervision from the manually annotated interval windows (e.g., clips inside the window are more salient than those outside), but we do not know the exact saliency score values. In this case, we use the original intervals to provide supervision for all three losses; we do not use the CLIP teacher to obtain exact saliency scores.
In other words, the three losses can be used flexibly, with or without exact saliency scores.
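To make the idea above concrete, here is a minimal hypothetical sketch (not the authors' actual code) of how binary per-clip saliency labels could be derived from an interval annotation alone: clips overlapping the annotated window are treated as positives, all other clips as negatives, with no exact saliency scores required. The function name and parameters are illustrative assumptions.

```python
def weak_saliency_labels(num_clips, clip_len, window):
    """Derive binary per-clip saliency labels from a (start, end) window in seconds.

    Hypothetical illustration: inside-window clips are positives (1),
    outside-window clips are negatives (0); no exact scores are needed.
    """
    start, end = window
    labels = []
    for i in range(num_clips):
        clip_start = i * clip_len
        clip_end = clip_start + clip_len
        # A clip counts as "inside" if it overlaps the annotated interval.
        overlaps = clip_start < end and clip_end > start
        labels.append(1 if overlaps else 0)
    return labels

# 10 clips of 2s each; annotated moment spans 4.0s–9.0s.
labels = weak_saliency_labels(num_clips=10, clip_len=2.0, window=(4.0, 9.0))
# -> [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
```

Such weak labels are enough to supervise ranking-style saliency losses (positives should score higher than negatives) even when exact saliency values are unavailable.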
I see. Seems like the downstream tasks can still benefit from this weak supervision. That is interesting. Thank you again for your kind response!
You are welcome. We have provided such ablation studies in our supplement; you can take a look there.
Hi @QinghongLin, many thanks for sharing this great work! I was wondering: when fine-tuning UniVTG on downstream datasets without curve (highlight) labels (e.g., NLQ, Charades-STA, TACoS), did you still use the "CLIP teacher" method to obtain pseudo labels? In other words, are the results of `UniVTG` and `UniVTG w/ PT` in Table 3 obtained using pseudo highlight labels?