This is an implementation of LP-MusicCaps: LLM-Based Pseudo Music Captioning. The project aims to generate captions for music in two ways:
1) Tag-to-Caption: using existing tags, we use OpenAI's GPT-3.5 Turbo API to generate high-quality, contextually relevant captions from music tags.
2) Audio-to-Caption: using pairs of music audio and pseudo captions, we train a cross-modal encoder-decoder model for end-to-end music captioning.
LP-MusicCaps: LLM-Based Pseudo Music Captioning
SeungHeon Doh, Keunwoo Choi, Jongpil Lee, Juhan Nam
To appear at ISMIR 2023
are available online for future research. An example of the dataset is in the notebook.
To run this project locally, follow the steps below:
Install Python and PyTorch:
Other requirements:
cd lpmc/llm_captioning
python run.py --prompt {writing, summary, paraphrase, attribute_prediction} --tags <music_tags>
Replace <music_tags> with your own comma-separated tags, for example: beatbox, finger snipping, male voice, amateur recording, medium tempo.
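The sample output that follows shows the query as an instruction line followed by the comma-separated tags. A minimal Python sketch of that assembly, assuming this two-line layout holds; `build_query` and `INSTRUCTIONS` are hypothetical names used for illustration and are not part of the repository, and only the `writing` instruction text is taken from the sample output:

```python
from typing import Dict, List

# Hypothetical mapping, not part of lpmc/llm_captioning.
# Only the "writing" instruction is copied from the sample output;
# the other prompt types are left out because their text is not shown here.
INSTRUCTIONS: Dict[str, str] = {
    "writing": "write a song description sentence including the following attributes",
}


def build_query(prompt: str, tags: List[str]) -> str:
    """Return the instruction line followed by the comma-separated tags."""
    if prompt not in INSTRUCTIONS:
        raise ValueError(f"no instruction text known for prompt {prompt!r}")
    # Drop surrounding whitespace and skip empty entries.
    cleaned = [tag.strip() for tag in tags if tag.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty tag is required")
    return f"{INSTRUCTIONS[prompt]}\n{', '.join(cleaned)}"


if __name__ == "__main__":
    print(build_query("writing", ["beatbox", "finger snipping", "male voice"]))
```

The resulting string could then be sent as the user message to the chat model; how run.py actually builds and sends its request may differ.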
tag_to_caption generation writing
results:
query:
write a song description sentence including the following attributes
beatbox, finger snipping, male voice, amateur recording, medium tempo
----------
results:
"Experience the raw and authentic energy of an amateur recording as mesmerizing beatbox rhythms intertwine with catchy finger snipping, while a soulful male voice delivers heartfelt lyrics on a medium tempo track."
cd demo
python app.py
# or
cd lpmc/music_captioning
wget https://huggingface.co/seungheondoh/lp-music-caps/resolve/main/transfer.pth -O exp/transfer/lp_music_caps
python captioning.py --audio_path ../../dataset/samples/orchestra.wav
{'text': "This is a symphonic orchestra playing a piece that's riveting, thrilling and exciting.
The peace would be suitable in a movie when something grand and impressive happens.
There are clarinets, tubas, trumpets and french horns being played. The brass instruments help create that sense of a momentous occasion.",
'time': '0:00-10:00'}
{'text': 'This is a classical music piece from a movie soundtrack.
There is a clarinet playing the main melody while a brass section and a flute are playing the melody.
The rhythmic background is provided by the acoustic drums. The atmosphere is epic and victorious.
This piece could be used in the soundtrack of a historical drama movie during the scenes of an army marching towards the end.',
'time': '10:00-20:00'}
{'text': 'This is a live performance of a classical music piece. There is a harp playing the melody while a horn is playing the bass line in the background.
The atmosphere is epic. This piece could be used in the soundtrack of a historical drama movie during the scenes of an adventure video game.',
'time': '20:00-30:00'}
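The `time` fields above advance in steps of 10, which suggests the audio is captioned one fixed-length window at a time. A minimal sketch of such windowing, assuming non-overlapping 10-second chunks with a zero-padded final chunk; the function names, the 16 kHz sample rate, and the padding choice are assumptions for illustration and are not taken from captioning.py:

```python
from typing import List, Sequence


def split_into_chunks(
    samples: Sequence[float],
    sample_rate: int = 16000,   # assumed sample rate
    chunk_seconds: int = 10,    # assumed window length
) -> List[List[float]]:
    """Split mono samples into fixed-length chunks, zero-padding the last one."""
    chunk_len = sample_rate * chunk_seconds
    chunks = []
    for start in range(0, len(samples), chunk_len):
        chunk = list(samples[start:start + chunk_len])
        # Pad a short final chunk so every chunk has the same length.
        chunk.extend([0.0] * (chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks


def time_label(index: int, chunk_seconds: int = 10) -> str:
    """Format a chunk index in the same style as the `time` fields above."""
    start = index * chunk_seconds
    return f"{start}:00-{start + chunk_seconds}:00"


if __name__ == "__main__":
    audio = [0.0] * (16000 * 25)  # 25 seconds of silence
    for i, chunk in enumerate(split_into_chunks(audio)):
        print(time_label(i), len(chunk))
```

Each chunk would then be passed to the captioning model separately, giving one caption per window as in the output above.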
For details, see lpmc/llm_captioning and lpmc/music_captioning.
This project is under the CC-BY-NC 4.0 license. See LICENSE for details.
We would like to thank WavCaps for the audio-captioning training code and deezer-playntell for the content-based captioning evaluation protocol. We also thank OpenAI for providing the GPT-3.5 Turbo API, which powers this project.
Please consider citing our paper in your publications if this project helps your research. The BibTeX reference is as follows.
@article{doh2023lp,
title={LP-MusicCaps: LLM-Based Pseudo Music Captioning},
author={Doh, SeungHeon and Choi, Keunwoo and Lee, Jongpil and Nam, Juhan},
journal={arXiv preprint arXiv:2307.16372},
year={2023}
}