How are videos of different lengths batched into a tensor? And does the word "frame" in the paper correspond to a clip of the video in the code?

tanghaoyu258 / ACRM-for-moment-retrieval

MIT License


Hi. 1. The function rnns.pad_sequence at line 27 of ./data/collate_batch.py handles this (see def pad_sequence in ./utils/rnns.py for details). Under the hood it relies on nn.utils.rnn.pad_sequence, an official PyTorch function. Given a batch, this function pads every tensor with zeros so that all tensors have the same length as the longest one in that batch.
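To make the padding step concrete, here is a minimal sketch that calls torch.nn.utils.rnn.pad_sequence directly. It is not the repository's collate function: the feature dimension, the three video lengths, and the mask are assumptions chosen for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical clip-level features for three videos of different lengths.
# Each tensor has shape (num_clips, feature_dim); the sizes are made up.
feature_dim = 4
videos = [
    torch.ones(5, feature_dim),
    torch.ones(3, feature_dim),
    torch.ones(2, feature_dim),
]

# Record the true lengths before padding so later layers can ignore padding.
lengths = torch.tensor([v.shape[0] for v in videos])

# Zero-pad every sequence to the length of the longest one in the batch.
batch = pad_sequence(videos, batch_first=True, padding_value=0.0)

# Boolean mask: True at real clips, False at padded positions.
mask = torch.arange(batch.shape[1]).unsqueeze(0) < lengths.unsqueeze(1)

print(batch.shape)  # torch.Size([3, 5, 4])
print(mask)
```

Keeping the lengths (or a mask) alongside the padded tensor is a common choice, since the zeros added by padding should usually not contribute to attention or pooling.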

2. Yes, a "frame" in the paper actually corresponds to a short run of consecutive video frames, i.e. a clip. This usage is common in the VML task, because C3D or I3D extractors are typically used to encode the video first, and they embed 8 or 16 consecutive frames into a single feature vector.
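The frame-to-clip correspondence can be sketched as follows. This is only an illustration under stated assumptions: clips are taken as non-overlapping windows, a trailing partial clip is dropped, and the 25 fps rate is made up; real C3D or I3D pipelines may use overlapping windows or a different stride. The helper names clip_frame_ranges and clip_to_seconds are hypothetical and do not come from the repository.

```python
def clip_frame_ranges(num_frames, clip_len=16):
    """Split frame indices into consecutive, non-overlapping clips.

    Returns a list of (start, end) frame index pairs, end exclusive.
    A trailing partial clip is dropped (an assumption of this sketch).
    """
    num_clips = num_frames // clip_len
    return [(i * clip_len, (i + 1) * clip_len) for i in range(num_clips)]


def clip_to_seconds(clip_index, clip_len=16, fps=25.0):
    """Map a clip index (a "frame" in the paper's sense) to a time span."""
    start = clip_index * clip_len / fps
    end = (clip_index + 1) * clip_len / fps
    return start, end


# A 100-frame video yields 6 clip-level feature positions with 16-frame clips.
ranges = clip_frame_ranges(100, clip_len=16)
print(len(ranges))            # 6
print(ranges[0], ranges[-1])  # (0, 16) (80, 96)
print(clip_to_seconds(2))     # (1.28, 1.92)
```

Under these assumptions, a predicted moment expressed in "frame" indices is converted back to seconds by multiplying by the clip length and dividing by the frame rate.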


How are videos of different lengths batched into a tensor? And does the word "frame" in the paper correspond to a clip of the video in the code? #3