m-bain / frozen-in-time

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
https://arxiv.org/abs/2104.00650
MIT License

Frame sampling in test phase #30

Closed: moonlitt closed this issue 3 years ago

moonlitt commented 3 years ago

Hi, I am confused about the description of frame sampling at test time: 'The values for i are determined using a stride S, resulting in an array of video embeddings v = [v_0, v_S, v_2S, v_M].' Could you please take MSRVTT as an example and show how frames are sampled in the testing phase? Thanks a lot

moonlitt commented 3 years ago

Does v_0 mean that you take 8 frames to get the v_0 embedding, and eventually take 4 times 8 frames to get [v_0, v_S, v_2S, v_M]?

m-bain commented 3 years ago

Hi, yes exactly. You can run this by passing the argument: test.py --sliding_window_stride NUM_FRAMES

where NUM_FRAMES is the stride in frames. The final video embedding is the average over all the v embeddings.
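
To make the answer above concrete, here is a minimal sketch of sliding-window sampling followed by averaging. It is not the repository's code: the function names (`window_starts`, `mean_embedding`, `embed_video`) and the `encode_clip` callable are hypothetical. It also assumes that each window is a run of consecutive frames and that only windows fitting fully inside the video are used. The actual test.py may pick frames differently.

```python
from typing import Callable, List, Sequence


def window_starts(total_frames: int, num_frames: int, stride: int) -> List[int]:
    """Start indices 0, S, 2S, ... of windows that fit fully inside the video.

    Assumption: a video shorter than one window still yields a single start at 0.
    """
    if num_frames <= 0 or stride <= 0:
        raise ValueError("num_frames and stride must be positive")
    last_start = max(total_frames - num_frames, 0)
    return list(range(0, last_start + 1, stride))


def mean_embedding(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of equally sized embedding vectors."""
    count = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / count for d in range(dim)]


def embed_video(
    frames: Sequence,
    encode_clip: Callable[[Sequence], Sequence[float]],  # hypothetical clip encoder
    num_frames: int,
    stride: int,
) -> List[float]:
    """Encode every window separately, then average the window embeddings."""
    starts = window_starts(len(frames), num_frames, stride)
    clip_embeddings = [encode_clip(frames[s:s + num_frames]) for s in starts]
    return mean_embedding(clip_embeddings)
```

For example, under these assumptions a 32-frame video with 8-frame windows and a stride of 8 gives `window_starts(32, 8, 8) == [0, 8, 16, 24]`, i.e. the "4 times 8 frames" case from the question, and the four window embeddings are averaged into a single video embedding.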