An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
How to run the captioning task on my own video datasets? #28
Open
Kevinkaiyan opened 2 years ago
Hi, impressive work! How can I extract features from my own video-text datasets to fine-tune the model?