slSeanWU / beats-conformer-bart-audio-captioner

PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation"
Apache License 2.0
audio-captioning clotho-dataset dcase-challenge pytorch transformers

Audio Captioning with BEATs, Conformer & BART

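Winning model of DCASE Challenge 2023 Task 6A, with the follow-up ICASSP-24 publication "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation".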

Install Packages

Download Dataset & Pretrained Model

Reproduce Best Model Results

(Bonus) Augmented Dataset

Our 50K mix-up caption augmentations generated by ChatGPT (see paper Section 2.3 for details) can be found at:
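
Separately from the released data, and for readers unfamiliar with the idea, here is a rough sketch of what a caption mix-up pair could look like: two clips are overlaid, and a prompt asks an LLM to merge their captions into one sentence. This is only an illustration, not the repository's code. The function names `mix_waveforms` and `build_mixup_prompt`, the `gain_b` parameter, the peak normalization, and the prompt wording are all assumptions; the actual procedure is described in Section 2.3 of the paper. Plain Python lists stand in for audio tensors to keep the sketch dependency-free.

```python
from itertools import zip_longest


def mix_waveforms(wav_a, wav_b, gain_b=1.0):
    """Overlay two mono waveforms sample by sample (hypothetical helper).

    The shorter clip is zero-padded; the result is rescaled only if the
    summed signal would exceed the [-1, 1] range.
    """
    mixed = [a + gain_b * b for a, b in zip_longest(wav_a, wav_b, fillvalue=0.0)]
    peak = max((abs(x) for x in mixed), default=0.0)
    if peak > 1.0:
        mixed = [x / peak for x in mixed]
    return mixed


def build_mixup_prompt(caption_a, caption_b):
    """Build an LLM prompt asking for one merged caption (assumed wording)."""
    return (
        "Combine the following two audio captions into one fluent sentence "
        "that describes both sounds occurring together.\n"
        f"Caption 1: {caption_a}\n"
        f"Caption 2: {caption_b}"
    )


if __name__ == "__main__":
    audio = mix_waveforms([0.5, 0.5, 0.5], [0.25])
    prompt = build_mixup_prompt("A dog barks.", "Rain falls on a roof.")
    print(audio)
    print(prompt)
```

The prompt would then be sent to a chat model of your choice, and the returned sentence paired with the mixed audio as a new training example.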

Acknowledgements

Our model/repository would not have been possible without the following great open-source works. Thank you so much!