microsoft / UniVL

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
https://arxiv.org/abs/2002.06353
MIT License

What is the meaning of 'step_size=5' in modeling.py? #23

Closed: saicoco closed this issue 3 years ago

saicoco commented 3 years ago

Does 'step_size=5' mean that each video has five captions?

https://github.com/microsoft/UniVL/blob/main/modules/modeling.py#L346

ArrowLuo commented 3 years ago

Hi @saicoco, it is a trick to reduce GPU memory usage. The sequence_output is split into parts according to step_size=5, a for loop then processes one part at a time, and finally the per-part similarity matrices are gathered into the full similarity matrix by torch.cat() (#L374).
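
To illustrate the idea, here is a minimal sketch of chunked similarity computation. It is not the code from modeling.py: it uses plain Python lists instead of tensors so it runs without PyTorch, the names `similarity_full` and `similarity_chunked` are hypothetical, the dot product is only a stand-in for whatever per-pair score the model computes, and it assumes each part holds at most step_size text rows. The point is that processing step_size rows at a time and stacking the partial results gives the same matrix as computing everything at once, while only one part has to be in flight at any moment.

```python
def dot(u, v):
    # Stand-in similarity score between one text vector and one video vector.
    return sum(a * b for a, b in zip(u, v))


def similarity_full(text_feats, video_feats):
    # Score every text row against every video row in a single pass.
    return [[dot(t, v) for v in video_feats] for t in text_feats]


def similarity_chunked(text_feats, video_feats, step_size=5):
    # Assumption: parts contain step_size rows each (the last may be smaller).
    rows = []
    for start in range(0, len(text_feats), step_size):
        chunk = text_feats[start:start + step_size]
        part = similarity_full(chunk, video_feats)
        # Stacking rows here plays the role of torch.cat along dim 0.
        rows.extend(part)
    return rows


# Toy data: 12 text vectors and 3 video vectors, each of dimension 2.
text_feats = [[float(i), 1.0] for i in range(12)]
video_feats = [[1.0, float(j)] for j in range(3)]

full = similarity_full(text_feats, video_feats)
chunked = similarity_chunked(text_feats, video_feats, step_size=5)
print(chunked == full)
```

With 12 text rows and step_size=5, the loop handles parts of 5, 5, and 2 rows. So step_size controls how many rows are processed per iteration, not how many captions belong to a video.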

saicoco commented 3 years ago

Got it