mengcaopku / LocVTP

[ECCV 22] LocVTP: Video-Text Pre-training for Temporal Localization
Apache License 2.0

About the performance of retrieval #1

Open BMEI1314 opened 2 years ago

BMEI1314 commented 2 years ago

The CLIP4Clip results in Table 1 are based on ViT-B/32, while LocVTP uses ViT-B/16 to initialize the visual encoder. Is this a typo in the paper?

mengcaopku commented 2 years ago

We use ViT-B/16 as the visual encoder, following OA-Trans. This does indeed make the comparison with CLIP4Clip somewhat unfair.