microsoft / UniVL

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
https://arxiv.org/abs/2002.06353
MIT License
336 stars 54 forks

Expected data format? #1

Closed by mckinziebrandon 3 years ago

mckinziebrandon commented 3 years ago

Hello, I am trying to run the README example(s) on the YouCook2 dataset. I downloaded all the files from the YouCook webpage and ran the download scripts. However, reading through dataloader_youcook_caption.py suggests that you expect the data in a custom format that differs from the original dataset. Is that correct? For example, I don't see any .pickle files in the original dataset, and none of the csv files have a feature_file column. Can you clarify the steps required to run the README?
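The column check described above can be sketched in a few lines of Python. This is only an illustration, not code from the UniVL repository: the helper name missing_columns is hypothetical, the sample CSV content is made up, and the required column names (video_id, feature_file) are taken from this thread rather than from a confirmed specification.

```python
import csv
import io


def missing_columns(csv_file, required=("video_id", "feature_file")):
    """Return the required column names that are absent from the CSV header.

    The default column names are an assumption based on this thread,
    not a confirmed description of what the dataloader expects.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return [name for name in required if name not in header]


# Hypothetical in-memory CSV standing in for a downloaded annotation file.
sample = io.StringIO("video_id,recipe_type\nabc123,101\n")
print(missing_columns(sample))  # ['feature_file']
```

To check a file on disk, pass an open file handle instead, for example `open(path, newline="")`, where path points at one of the downloaded csv files.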

brunokinder commented 3 years ago

Hi @mckinziebrandon,

I have the same questions. It seems to me that the csv file has the format video_id, feature_file, the same as in training stages I and II. I also think the captions are expected in the same format as in those training stages, which means we would have to convert the YouCook II format to it. Did you find any answers?

Thanks.

ArrowLuo commented 3 years ago

@mckinziebrandon @brunokinder Hope the readme in the dataloaders folder can answer your questions.

brunokinder commented 3 years ago

That is really great! Thanks a lot @ArrowLuo!