Closed by moonlitt 3 years ago
Does v_0 mean that you take 8 frames to compute the v_0 embedding, and ultimately 4 times 8 frames to get [v_0, v_S, v_2S, v_M]?
Hi, yes, exactly. You can enable this by passing the argument:
test.py --sliding_window_stride NUM_FRAMES
Here NUM_FRAMES is the stride in frames. The final video embedding is the average of all the v embeddings.
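To make the sliding-window idea concrete, here is a minimal sketch, not the repository's actual code. The names `window_starts`, `video_embedding`, and `embed_clip` are hypothetical. The 8-frame window is taken from the follow-up question in this thread, and taking consecutive frames inside each window is an assumption.

```python
from typing import Callable, List, Sequence


def window_starts(num_video_frames: int, window: int, stride: int) -> List[int]:
    """Return start indices 0, S, 2S, ... for each window that fits in the video."""
    if num_video_frames < window:
        # Assumption: a video shorter than one window yields a single window.
        return [0]
    return list(range(0, num_video_frames - window + 1, stride))


def video_embedding(
    frames: Sequence,
    embed_clip: Callable[[Sequence], List[float]],  # hypothetical model call
    window: int = 8,
    stride: int = 8,
) -> List[float]:
    """Embed each window separately, then average the per-window embeddings."""
    starts = window_starts(len(frames), window, stride)
    clip_embeddings = [embed_clip(frames[s:s + window]) for s in starts]
    dim = len(clip_embeddings[0])
    count = len(clip_embeddings)
    # Element-wise mean over [v_0, v_S, v_2S, ...].
    return [sum(e[d] for e in clip_embeddings) / count for d in range(dim)]
```

With a 32-frame video, a window of 8, and a stride of 8, `window_starts(32, 8, 8)` gives four windows starting at frames 0, 8, 16, and 24, which corresponds to the "4 times 8 frames" reading above.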
Hi, I am confused about the description of frame sampling during testing: 'The values for i are determined using a stride S, resulting in an array of video embeddings v = [v_0, v_S, v_2S, v_M].' Could you please use MSRVTT as an example to show how frames are sampled in the testing phase? Thanks a lot.