microsoft / UniVL

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
https://arxiv.org/abs/2002.06353
MIT License
339 stars 54 forks

How to run captioning task on my own video datasets? #28

Open Kevinkaiyan opened 2 years ago

Kevinkaiyan commented 2 years ago

Hi, impressive work! How can I extract features from my own video-text datasets to fine-tune the model?

ArrowLuo commented 2 years ago

Hi @17321010162, please see here.