snap-research / Panda-70M

[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
https://snap-research.github.io/Panda-70M/

Why do you use Euclidean distance as the metric of feature similarity? #61

Open zhoumumu opened 4 months ago

zhoumumu commented 4 months ago

Hi! Thank you very much for your work! The preprocessing trimming code runs smoothly and it has been very helpful.

I'm curious why you chose Euclidean distance instead of cosine distance. Unlike cosine distance, Euclidean distance is not normalized to a fixed range, which makes it difficult to select an appropriate threshold.
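For reference, here is a minimal NumPy sketch of the difference being asked about. It is not taken from `event_stitching.py`. The function names and the random 512-dimensional feature vectors are hypothetical stand-ins. It shows that raw Euclidean distance scales with feature magnitude, while cosine distance is bounded in [0, 2]. It also shows that once features are L2-normalized, the two metrics carry the same information: ||a - b||^2 = 2(1 - cos(a, b)), so a Euclidean threshold t corresponds to a cosine-distance threshold of t^2 / 2.

```python
import numpy as np


def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Plain L2 distance: unbounded, and it grows with feature magnitude."""
    return float(np.linalg.norm(a - b))


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity: always in [0, 2], independent of magnitude."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(1.0 - cos)


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length."""
    return x / np.linalg.norm(x)


# Hypothetical features standing in for two clips' embeddings.
rng = np.random.default_rng(0)
a = rng.normal(size=512)
b = rng.normal(size=512)

# Raw features: scaling both vectors by 10 scales the Euclidean distance by 10...
print(euclidean_distance(a, b), euclidean_distance(10 * a, 10 * b))
# ...but leaves the cosine distance unchanged.
print(cosine_distance(a, b), cosine_distance(10 * a, 10 * b))

# On unit-length features, ||a - b||^2 == 2 * cosine_distance(a, b),
# so thresholding either metric gives the same decision.
an, bn = l2_normalize(a), l2_normalize(b)
print(euclidean_distance(an, bn) ** 2, 2 * cosine_distance(an, bn))
```

This sketch does not establish whether the repository normalizes its features before computing the distance. If it does, switching between the two metrics only changes how the threshold is expressed.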

zhoumumu commented 4 months ago

In addition, what is this line used for? https://github.com/snap-research/Panda-70M/blob/bbae2b18f1a109cf6c2d527f1708e5213c120c3d/splitting/event_stitching.py#L107