Will you consider adding audio features?

line / lighthouse

[EMNLP2024 Demo] A user-friendly library for reproducible video moment retrieval and highlight detection. It also supports audio moment retrieval.

https://www.arxiv.org/abs/2408.02901

Apache License 2.0


Will you consider adding audio features? #28

Closed: ffiioonnaa closed this issue 2 months ago

ffiioonnaa commented 2 months ago

Hi, thanks for your work! I was wondering if you would consider adding audio features?

awkrail commented 2 months ago

@ffiioonnaa Thank you for your interest. For training, we added audio features in the most recent PR, #27. If you want to train Moment DETR with CLIP+Slowfast+Audio (PANNs) features, run:

PYTHONPATH=. python training/train.py --config configs/qvhighlight/clip_slowfast_pann_moment_detr_qvhighlight.yml
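If you prefer to launch the same run from a Python script, a minimal sketch might look like the following. The helper names `build_training_command` and `run_training` are hypothetical and not part of lighthouse; the script path, the `--config` flag, and the config file are taken from the command above. It assumes you run it from the repository root, and it uses the current interpreter (`sys.executable`) in place of `python`.

```python
import os
import subprocess
import sys

# Config path copied from the shell command above.
CONFIG = "configs/qvhighlight/clip_slowfast_pann_moment_detr_qvhighlight.yml"


def build_training_command(config_path=CONFIG):
    """Return (argv, env) mirroring `PYTHONPATH=. python training/train.py --config ...`."""
    argv = [sys.executable, "training/train.py", "--config", config_path]
    # Copy the current environment and set PYTHONPATH to the repository root.
    env = dict(os.environ, PYTHONPATH=".")
    return argv, env


def run_training(config_path=CONFIG, repo_root="."):
    """Start training as a subprocess (hypothetical helper, not a lighthouse API)."""
    argv, env = build_training_command(config_path)
    # check=True raises CalledProcessError if the training script exits non-zero.
    subprocess.run(argv, env=env, cwd=repo_root, check=True)
```

Calling `run_training()` from the repository root should then behave like the shell command.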

The inference API (and the Gradio demo) are still in progress. I expect to implement them next week. Stay tuned.

awkrail commented 2 months ago

@ffiioonnaa Hi, I have implemented training, the inference API, and the Gradio demo. As reported in reproduced_results.md, adding audio features slightly improves the model's performance, though not in every case. Because the difference is small, we could not observe a clear improvement in the predicted results when comparing CLIP+Slowfast with CLIP+Slowfast+PANNs (audio).
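For readers wondering what "CLIP+Slowfast+PANNs" could mean at the feature level, here is a rough sketch of one common way to combine per-clip features from several extractors: normalize each stream and concatenate along the feature dimension. This is an illustrative assumption, not a description of how lighthouse fuses features internally; the function names and the toy dimensions are made up for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_clip_features(*streams):
    """Concatenate per-clip feature vectors from several streams.

    Each stream is a list of per-clip vectors. Streams may differ in
    length by a clip or two, so all are truncated to the shortest one.
    """
    if not streams:
        raise ValueError("at least one feature stream is required")
    num_clips = min(len(stream) for stream in streams)
    fused = []
    for i in range(num_clips):
        row = []
        for stream in streams:
            row.extend(l2_normalize(stream[i]))
        fused.append(row)
    return fused


# Toy example with made-up dimensions (real features are much larger).
clip_feats = [[3.0, 4.0]] * 3           # 3 clips, 2-dim "CLIP" features
slowfast_feats = [[1.0, 0.0, 0.0]] * 3  # 3 clips, 3-dim "Slowfast" features
audio_feats = [[2.0]] * 2               # 2 clips, 1-dim "PANNs" features

fused = fuse_clip_features(clip_feats, slowfast_feats, audio_feats)
print(len(fused), len(fused[0]))
```

With this kind of fusion, the audio stream only adds extra dimensions to each clip vector, which is consistent with a small change in results when the visual features already carry most of the signal.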