How to run the captioning task on my own video datasets?

microsoft / UniVL

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"

https://arxiv.org/abs/2002.06353

MIT License

339 stars, 54 forks

Open Kevinkaiyan opened 2 years ago

Kevinkaiyan commented 2 years ago

Hi, impressive work! How can I extract features from my own video-text datasets so that I can fine-tune the model?

ArrowLuo commented 2 years ago

Hi @17321010162, please see here.