tanghaoyu258 / ACRM-for-moment-retrieval

MIT License

How are videos of different lengths loaded into a tensor? And does the word "frame" in the paper correspond to a clip of the video in the code? #3

Open TAY-985 opened 2 years ago

TAY-985 commented 2 years ago

Hello, I have two questions:

  1. In a batch, how do you load video features of different lengths into a tensor? Do you pad them to a max length? If so, what is the max length?
  2. Does a "frame" in the paper correspond to a clip of the video in the code, that is, to a run of consecutive video frames?
tanghaoyu258 commented 2 years ago

Hi. 1. The function rnns.pad_sequence on line 27 of ./data/collate_batch.py handles this (see def pad_sequence in ./utils/rnns.py for details). Under the hood it calls nn.utils.rnn.pad_sequence, an official PyTorch function. For each batch, this function zero-pads every tensor so that all of them have the same length as the longest one in that batch; the padded length is therefore determined per batch.
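For illustration, here is a minimal sketch of that per-batch zero padding using torch.nn.utils.rnn.pad_sequence directly. It is not the repository's collate function: the feature dimension, the video lengths, and the mask construction are assumptions chosen only to show the shapes.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical clip-level features for three videos of different lengths.
# Each tensor has shape (num_clips, feature_dim); the sizes are made up.
feature_dim = 4
videos = [torch.randn(n, feature_dim) for n in (5, 3, 7)]

# Zero-pad every video up to the longest one in this batch (7 clips here).
padded = pad_sequence(videos, batch_first=True)  # shape: (3, 7, 4)

# Keep the true lengths so later layers can ignore the padded positions.
lengths = torch.tensor([v.size(0) for v in videos])

# Boolean mask of shape (3, 7): True on real clips, False on padding.
mask = torch.arange(padded.size(1))[None, :] < lengths[:, None]
```

A different batch whose longest video has, say, 12 clips would be padded to 12 instead, which is why no global max length is needed for this step.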

2. Yes, a "frame" in the paper actually corresponds to a clip of consecutive video frames. This is common in the VML task, since C3D or I3D extractors are usually adopted to encode the video first, and they embed each group of 8 or 16 consecutive frames into one feature vector.
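As a rough sketch of what "one feature vector per clip" means in terms of shapes, the snippet below splits a video into non-overlapping 16-frame clips. The frame sizes, the choice to drop leftover frames, and the averaging step are all assumptions; the averaging is only a placeholder and is not what C3D or I3D compute.

```python
import torch

clip_len = 16                        # assumed clip size; the reply mentions 8 or 16
frames = torch.randn(100, 3, 8, 8)   # hypothetical video: 100 tiny RGB frames

# Drop trailing frames that do not fill a whole clip (an assumption;
# real pipelines may pad or use overlapping windows instead).
num_clips = frames.size(0) // clip_len
clips = frames[: num_clips * clip_len].reshape(num_clips, clip_len, 3, 8, 8)

# Placeholder for a C3D/I3D extractor: produce one vector per clip.
# Averaging over time and space only demonstrates the resulting shape.
clip_features = clips.mean(dim=(1, 3, 4))  # shape: (num_clips, 3)
```

Each row of clip_features then plays the role of one "frame" in the paper's notation, and the number of rows is what varies from video to video.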