This is a PyTorch implementation of "Convolutional Hierarchical Attention Network for Query-Focused Video Summarization", which was accepted at AAAI 2020.
Note: This project is still a work in progress.
Parallel Computing Model | Simple Model Diagram |
---|---|
Here is the resulting video summary for the queries FOOD and HANDS. From a roughly 4-hour video containing diverse scenes (library, mall, driving, shop, etc.), the model generated a summary of about 4 minutes 30 seconds made up of clips that have either food or hands in frame.
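As a rough illustration of what "clips that either have food or hands in frame" means in terms of selection, here is a minimal sketch that keeps a shot when its relevance to any query concept passes a threshold. It is not the code in this repository: the function names `select_shots` and `summary_seconds`, the score values, the `0.5` threshold, and the 5-second shot length are all assumptions chosen for the example.

```python
from typing import Dict, List


def select_shots(scores: Dict[str, List[float]], threshold: float = 0.5) -> List[int]:
    """Return indices of shots relevant to ANY query concept.

    `scores` maps a concept name to one relevance score per shot.
    The threshold value is an assumption for this sketch.
    """
    num_shots = len(next(iter(scores.values())))
    selected = []
    for i in range(num_shots):
        # Keep the shot if at least one concept scores high enough.
        if any(concept_scores[i] >= threshold for concept_scores in scores.values()):
            selected.append(i)
    return selected


def summary_seconds(selected: List[int], shot_seconds: float = 5.0) -> float:
    """Total summary length, assuming fixed-length shots (hypothetical 5 s)."""
    return len(selected) * shot_seconds


# Made-up relevance scores for four shots and two query concepts.
scores = {
    "FOOD": [0.9, 0.1, 0.2, 0.7],
    "HANDS": [0.2, 0.1, 0.8, 0.6],
}

selected = select_shots(scores)
print(selected)                   # [0, 2, 3]
print(summary_seconds(selected))  # 15.0
```

Shot 1 is dropped because neither concept reaches the threshold, while shots 0, 2, and 3 each match at least one of the two concepts.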
pip install -r requirements.txt
python main.py
This implementation and study of the paper are part of my research work under the guidance of Prof. Payal Prajapati.
The evaluation code is borrowed from EgoVLPv2.
The code is inspired by the CHAN implementation: https://github.com/ckczzj/CHAN