xwen99 / temporal_context_aggregation

Temporal Context Aggregation for Video Retrieval with Contrastive Learning, WACV 2021
https://arxiv.org/abs/2008.01334
Apache License 2.0

Temporal Context Aggregation for Video Retrieval with Contrastive Learning

By Jie Shao*, Xin Wen*, Bingchen Zhao and Xiangyang Xue (*: equal contribution)

This is the official PyTorch implementation of the paper "Temporal Context Aggregation for Video Retrieval with Contrastive Learning".

Introduction

In this paper, we propose TCA (Temporal Context Aggregation for Video Retrieval), a video representation learning framework that incorporates long-range temporal information between frame-level features using the self-attention mechanism.
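As a rough illustration of the idea, the sketch below applies one self-attention layer over a sequence of frame-level features and then mean-pools the result into a single video-level vector. It is a minimal sketch, not the code in this repository: the class name `TemporalAggregator`, the feature dimension, the number of heads, and the pooling step are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class TemporalAggregator(nn.Module):
    """Hypothetical sketch: self-attention over frame features, then mean pooling."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        # batch_first=True expects inputs shaped (batch, num_frames, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frames, padding_mask=None):
        # frames: (batch, num_frames, dim)
        # padding_mask: (batch, num_frames) bool, True marks padded frames
        ctx, _ = self.attn(frames, frames, frames, key_padding_mask=padding_mask)
        frames = self.norm(frames + ctx)  # residual connection + layer norm
        if padding_mask is None:
            return frames.mean(dim=1)
        # Average only over real (non-padded) frames
        keep = (~padding_mask).unsqueeze(-1).to(frames.dtype)
        return (frames * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)


if __name__ == "__main__":
    model = TemporalAggregator(dim=64, heads=4)
    video = torch.randn(2, 10, 64)  # 2 videos, 10 frames each (assumed sizes)
    print(model(video).shape)
```

Because every frame attends to every other frame, each output feature can draw on context from anywhere in the video, which is what allows long-range temporal information to be captured.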


To train it on video retrieval datasets, we propose a supervised contrastive learning method that performs automatic hard negative mining and utilizes the memory bank mechanism to increase the capacity of negative samples.
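The following sketch shows one common way to combine a contrastive loss with a memory bank: an InfoNCE-style loss whose negatives come from a FIFO queue of past embeddings. It is an assumption-laden illustration rather than the loss used in the paper; `NegativeQueue`, `info_nce`, the temperature, and the queue size are hypothetical names and values. In a loss of this form, the softmax weights each negative by its similarity to the anchor, so harder negatives receive larger gradients without an explicit mining step.

```python
import torch
import torch.nn.functional as F


class NegativeQueue:
    """Hypothetical FIFO memory bank of L2-normalized negative embeddings."""

    def __init__(self, dim, size):
        # Start from random unit vectors; they are overwritten as training proceeds
        self.bank = F.normalize(torch.randn(size, dim), dim=1)
        self.ptr = 0

    @torch.no_grad()
    def enqueue(self, feats):
        # Overwrite the oldest entries, wrapping around at the end of the bank
        feats = F.normalize(feats, dim=1)
        size = self.bank.shape[0]
        idx = (self.ptr + torch.arange(feats.shape[0])) % size
        self.bank[idx] = feats
        self.ptr = int((self.ptr + feats.shape[0]) % size)


def info_nce(anchor, positive, negatives, temperature=0.07):
    """InfoNCE loss: one positive per anchor, negatives shared across the batch."""
    anchor = F.normalize(anchor, dim=1)
    positive = F.normalize(positive, dim=1)
    pos = (anchor * positive).sum(dim=1, keepdim=True)  # (B, 1)
    neg = anchor @ negatives.t()                        # (B, K)
    logits = torch.cat([pos, neg], dim=1) / temperature
    # The positive sits at column 0 of the logits
    target = torch.zeros(anchor.shape[0], dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, target)
```

Keeping negatives in a queue decouples the number of negatives from the batch size, so the loss can compare each anchor against many more samples than fit in a single batch.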

The proposed method shows a significant performance advantage (∼17% mAP on FIVR-200K) over state-of-the-art methods with video-level features, and delivers competitive results with 22x faster inference time compared with methods using frame-level features.

Getting Started

Requirements

So far, we have only tested the code's compatibility with the following dependencies:

Citation

@InProceedings{Shao_2021_WACV,
    author    = {Shao, Jie and Wen, Xin and Zhao, Bingchen and Xue, Xiangyang},
    title     = {Temporal Context Aggregation for Video Retrieval With Contrastive Learning},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2021},
    pages     = {3268-3278}
}

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Contact

Xin Wen: im.xwen@gmail.com