Questions about some details?

wjun0830 / CGDETR

Official pytorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"

Hi, thank you for your great work. I have a question about the span_label normalization. In the training phase, span_label seems to be normalized by the video feature length: windows = torch.Tensor(windows) / (ctx_l * self.clip_len) # normalized windows in xx. In the inference phase, however, the spans are rescaled by the video duration: spans = span_cxw_to_xx(spans) * meta["duration"], followed by spans = torch.clamp(spans, 0, meta["duration"]). I am confused by this mismatch. In my experiments, normalizing span_label by the video duration instead made performance drop. My other question is about self.clip_len: I don't understand what it does. Could you explain it?

Thanks again!
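
To make the question concrete, here is a minimal sketch of the two scalings quoted above, written in plain Python rather than torch. It assumes that clip_len is the number of seconds covered by one clip feature and that ctx_l is the number of clip features for the video; neither assumption is confirmed in this thread. The helper names normalize_train and denormalize_infer and all the numbers are hypothetical, chosen only for illustration.

```python
def normalize_train(window, ctx_l, clip_len):
    """Training-style scaling: divide [start, end] in seconds by ctx_l * clip_len."""
    feature_span = ctx_l * clip_len  # seconds covered by the clip features (assumed meaning)
    return [t / feature_span for t in window]


def denormalize_infer(window_norm, duration):
    """Inference-style scaling: multiply by the video duration, then clamp to [0, duration]."""
    scaled = [t * duration for t in window_norm]
    return [min(max(t, 0.0), duration) for t in scaled]


# Hypothetical video: 149 s long, one feature per 2 s clip, so 75 features.
duration = 149.0
clip_len = 2.0
ctx_l = 75
window = [30.0, 60.0]  # ground-truth [start, end] in seconds

norm = normalize_train(window, ctx_l, clip_len)
recovered = denormalize_infer(norm, duration)
print(norm, recovered)
```

Under these assumptions the two denominators (150 s of features versus 149 s of video) differ by less than one clip length, so the round trip shifts the window by a fraction of a second rather than recovering it exactly. Whether that small gap explains the performance drop reported above is not something this sketch can show.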

Questions about some details? #5