microsoft / SwinBERT

Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"
https://arxiv.org/abs/2111.13196
MIT License

Do the input lengths have to be fixed during training? #44

Open riariam opened 1 year ago

riariam commented 1 year ago

Thank you for your contribution. I'm curious whether the training phase accepts only inputs with a fixed number of frames (such as the 32-frame inputs in your example), or whether inputs of any length can be used. Thank you!