linjieli222 / HERO

Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
https://arxiv.org/abs/2005.00200
MIT License

Single-channel results on multi-channel datasets #2

Closed antoyang closed 3 years ago

antoyang commented 3 years ago

Hi, nice work! I understand HERO is designed primarily for multi-channel tasks, but I am curious how HERO performs on How2QA and TVQA when not using the subtitles. Do you have these results? They would help in understanding the importance of this modality in different domains (YouTube ASR vs. TV subtitles). Best, Antoine Yang

linjieli222 commented 3 years ago

Thanks for your interest in our project.

We did not run experiments on TVQA and How2QA with single-channel inputs. If you are looking into single-channel performance on QA tasks specifically, you can play with the code and run these experiments yourself. We are also interested in your findings, so please share your results with us if you do run these experiments :).

In addition, the TVQA paper actually has a detailed analysis of how a model performs with video-only, subtitle-only, and video+subtitle inputs. If I remember correctly, video-only performance is much poorer than video+subtitle performance.

I do think the results on DiDeMo and MSRVTT can partly answer your question about the importance of the subtitle modality in different domains. MSRVTT is from the movie domain, which is similar to the TV domain in the sense that the subtitles do not exactly describe the scene. DiDeMo is built from user-generated videos, which are similar to the YouTube videos in the How2 dataset. Our finding is that subtitles/ASR can significantly boost model performance on DiDeMo and MSRVTT, even though these datasets are designed for single-channel videos. I would expect similar findings on How2QA and TVQA, where subtitles/ASR may matter even more because those datasets are designed for multi-channel videos.

Thanks, Linjie

antoyang commented 3 years ago

Thanks for your thoughts, I'll have a look at that :). Also, I was wondering if you could share the val results of your best models on How2QA.

Best, Antoine Yang

linjieli222 commented 3 years ago

Hi Antoine,

Sorry about the late reply on this thread. Our predictions from the best model are released here: https://drive.google.com/drive/folders/1x0dLHIRlvQimyRuRSSHMUPTE5HouAOAi?usp=sharing
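For anyone who wants to score such predictions against validation answers, here is a minimal sketch. It assumes the predictions and the ground truth have each been loaded into a mapping from question ID to chosen answer index; that layout, the function name `multiple_choice_accuracy`, and the toy data are hypothetical, since the format of the released files is not described in this thread.

```python
from typing import Dict


def multiple_choice_accuracy(
    predictions: Dict[str, int], answers: Dict[str, int]
) -> float:
    """Fraction of ground-truth questions whose predicted index is correct.

    Assumed (hypothetical) layout: both arguments map a question ID to an
    answer index. Questions with no prediction are counted as wrong.
    """
    if not answers:
        raise ValueError("answers must not be empty")
    correct = sum(
        1 for qid, gold in answers.items() if predictions.get(qid) == gold
    )
    return correct / len(answers)


# Toy data for illustration only; these are not real How2QA results.
toy_predictions = {"q1": 0, "q2": 3, "q3": 1}
toy_answers = {"q1": 0, "q2": 2, "q3": 1, "q4": 2}

print(multiple_choice_accuracy(toy_predictions, toy_answers))
```

Running the same function on predictions produced with and without subtitles would give a simple way to compare the two settings on a shared validation split.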

The finetuning code and features are currently being cleaned up by @ych133; we will let you know when they are out.

Thanks, Linjie

linjieli222 commented 3 years ago

Closed due to inactivity.