
SCDM

Code for the paper: "Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos"

Requirements

Introduction

Temporal sentence grounding (TSG) in videos aims to detect and localize the target video segment that semantically corresponds to a given sentence query. We propose a semantic conditioned dynamic modulation (SCDM) mechanism for the TSG problem, which leverages the sentence semantics to modulate the temporal convolution operations, better correlating and composing sentence-related video content over time.
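To illustrate the idea, here is a minimal NumPy sketch of a FiLM-style semantic conditioned modulation step: each temporal location attends over the sentence's word features, and the attended sentence feature produces a per-location scale and shift applied to the normalized video features. The projection matrices `Wa`, `Wg`, and `Wb` are hypothetical placeholders for learned parameters; this is a simplified sketch of the mechanism, not the exact computation in our released code.

```python
import numpy as np

def scdm_modulate(video_feats, word_feats, Wa, Wg, Wb):
    """Sketch of semantic conditioned dynamic modulation.

    video_feats : (T, d) temporal convolution features
    word_feats  : (N, d) word-level sentence features
    Wa, Wg, Wb  : (d, d) hypothetical learned projections (assumptions)
    """
    # 1. Attend over words at each temporal location.
    scores = video_feats @ Wa @ word_feats.T              # (T, N)
    scores -= scores.max(axis=1, keepdims=True)           # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)               # softmax over words
    c = attn @ word_feats                                 # (T, d) attended sentence feature

    # 2. Derive modulation parameters from the attended sentence feature.
    gamma = np.tanh(c @ Wg)                               # per-location scale
    beta = np.tanh(c @ Wb)                                # per-location shift

    # 3. Normalize the video features over time, then modulate.
    mu = video_feats.mean(axis=0, keepdims=True)
    sigma = video_feats.std(axis=0, keepdims=True) + 1e-6
    return gamma * (video_feats - mu) / sigma + beta
```

Because `gamma` and `beta` depend on the query at every temporal location, the same video is modulated differently for different sentences, which is what lets the temporal convolutions compose sentence-related content.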

Download Features and Example Preprocessed Data

First, download the following files into the './data' folder:

Data Preprocessing

As described in our paper, we evaluate the temporal sentence grounding task on three datasets: Charades-STA, ActivityNet Captions, and TACoS. Before training or testing the model on any of these datasets, please preprocess the data first.

Model Training and Testing

Citation

@inproceedings{yuan2019semantic,
  title={Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos},
  author={Yuan, Yitian and Ma, Lin and Wang, Jingwen and Liu, Wei and Zhu, Wenwu},
  booktitle={Advances in Neural Information Processing Systems},
  pages={534--544},
  year={2019}
}