homebrewltd / ichigo

Llama3.1 learns to Listen

idea: Text-to-music brainstorm August 2024 #26

Open bennmann opened 1 month ago

bennmann commented 1 month ago

Proprietary music generation (see Suno, Udio, et al.) is far ahead of open source.

Using your EnCodec-based method, please consider adding text-to-music with synthetic English singing. I'm not sure which dataset is best, but some existing datasets could be supplemented for this purpose.
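
A minimal sketch of what text-to-music training data could look like, assuming the model emits EnCodec codes as extra vocabulary tokens. EnCodec's 24 kHz model produces several codebooks per frame (2 codebooks of 1024 entries at 75 frames/s when run at 1.5 kbps). One simple way to feed them to a decoder-only LM is to interleave the codebooks frame by frame and shift each codebook into its own ID range. The `<|sound_N|>` token spelling, `<|sound_start|>`/`<|sound_end|>` markers, and the `Caption:`/`Lyrics:` template are hypothetical, not this repository's actual format.

```python
from typing import List, Sequence

# EnCodec's 24 kHz model uses 1024-entry codebooks at 75 frames per second;
# at 1.5 kbps it emits 2 codebooks per frame (2 * 10 bits * 75 = 1500 bit/s).
CODEBOOK_SIZE = 1024


def flatten_codes(codes: Sequence[Sequence[int]], codebook_size: int = CODEBOOK_SIZE) -> List[int]:
    """Interleave codes shaped [n_codebooks][n_frames] into one frame-major stream.

    Codebook k is shifted by k * codebook_size so every codebook occupies its
    own disjoint range of token IDs.
    """
    if not codes:
        return []
    n_frames = len(codes[0])
    if any(len(row) != n_frames for row in codes):
        raise ValueError("all codebooks must have the same number of frames")
    flat: List[int] = []
    for t in range(n_frames):
        for k, row in enumerate(codes):
            code = row[t]
            if not 0 <= code < codebook_size:
                raise ValueError(f"code {code} out of range for codebook {k}")
            flat.append(k * codebook_size + code)
    return flat


def unflatten_codes(flat: Sequence[int], n_codebooks: int, codebook_size: int = CODEBOOK_SIZE) -> List[List[int]]:
    """Invert flatten_codes, e.g. on tokens sampled from the LM before decoding."""
    if n_codebooks <= 0 or len(flat) % n_codebooks:
        raise ValueError("stream length must be a multiple of n_codebooks")
    codes: List[List[int]] = [[] for _ in range(n_codebooks)]
    for i, tok in enumerate(flat):
        k = i % n_codebooks
        code = tok - k * codebook_size
        if not 0 <= code < codebook_size:
            raise ValueError(f"token {tok} at position {i} is not in codebook {k}'s range")
        codes[k].append(code)
    return codes


def build_text_to_music_sample(caption: str, lyrics: str, codes: Sequence[Sequence[int]]) -> str:
    """Hypothetical template: text condition first, audio tokens as the target."""
    sound = "".join(f"<|sound_{tok:04d}|>" for tok in flatten_codes(codes))
    return f"Caption: {caption}\nLyrics: {lyrics}\n<|sound_start|>{sound}<|sound_end|>"
```

For example, `flatten_codes([[1, 2, 3], [10, 20, 30]])` gives `[1, 1034, 2, 1044, 3, 1054]`. With the `encodec` package, the README example concatenates the frames returned by `model.encode(wav)` into a tensor shaped `[batch, n_q, T]`, so `codes[0].tolist()` would fit the input shape used here. Plain interleaving multiplies sequence length by the number of codebooks; schemes such as MusicGen's delay pattern are an alternative if that becomes too costly.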

MusicCaps (https://www.kaggle.com/datasets/googleai/musiccaps): 5,521 music examples, each labeled with an English aspect list and a free-text caption written by musicians. Released under the Creative Commons CC BY-SA 4.0 license.
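
A minimal sketch of turning one MusicCaps row into a text condition, assuming the public CSV's `caption` and `aspect_list` columns, with `aspect_list` stored as a Python-style list literal such as `"['pop', 'soft female vocal']"`. The `Tags:` template and the CSV path passed to `load_rows` are hypothetical choices for illustration.

```python
import ast
import csv
from typing import Dict, List


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read a MusicCaps-style CSV into a list of dicts (the path is an assumption)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def parse_aspect_list(raw: str) -> List[str]:
    """Parse an aspect list stored as a Python-style list literal; empty -> []."""
    value = ast.literal_eval(raw) if raw.strip() else []
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError(f"unexpected aspect_list value: {raw!r}")
    return [v.strip() for v in value]


def text_condition(row: Dict[str, str]) -> str:
    """Combine the free-text caption and the aspect tags into one prompt string."""
    parts = [row.get("caption", "").strip()]
    aspects = parse_aspect_list(row.get("aspect_list", ""))
    if aspects:
        parts.append("Tags: " + ", ".join(aspects))
    return " ".join(p for p in parts if p)
```

`ast.literal_eval` only accepts literals, so it is a safe way to parse the list strings without calling `eval`.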

One could run speech recognition over each clip to add a "lyrics" column to every example.
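
A minimal sketch of that speech-recognition pass, assuming each clip has already been downloaded to `<audio_dir>/<ytid>.wav` (MusicCaps itself ships YouTube IDs and start/end timestamps rather than audio, so this file layout is hypothetical). The transcriber is passed in as a function so any ASR model can be plugged in. With `openai-whisper`, one option is `model = whisper.load_model("base")` and `transcribe=lambda p: model.transcribe(p)["text"]`. Clips with no local audio get an empty lyrics field.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Any function mapping an audio file path to transcribed text.
Transcriber = Callable[[str], str]


def add_lyrics_column(rows: Iterable[Dict[str, str]], audio_dir: str,
                      transcribe: Transcriber, ext: str = ".wav") -> List[Dict[str, str]]:
    """Return copies of rows with a 'lyrics' field from ASR (empty if the clip is missing)."""
    out: List[Dict[str, str]] = []
    for row in rows:
        clip = Path(audio_dir) / f"{row['ytid']}{ext}"  # hypothetical naming scheme
        lyrics = transcribe(str(clip)).strip() if clip.exists() else ""
        out.append({**row, "lyrics": lyrics})
    return out


def add_lyrics_to_csv(src: str, dst: str, audio_dir: str, transcribe: Transcriber) -> None:
    """Read a MusicCaps-style CSV, transcribe each clip, and write it back with a lyrics column."""
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or []) + ["lyrics"]
        rows = add_lyrics_column(reader, audio_dir, transcribe)
    with open(dst, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

ASR output on sung vocals over accompaniment will be noisy, and instrumental clips may yield spurious text, so the column would likely need filtering or manual review before training.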

I'm broke and short on time, or I would try this with your method myself.

tikikun commented 1 month ago

Very interesting, we will consider this after we introduce the end-to-end pipeline.

bennmann commented 1 week ago

https://openreview.net/forum?id=SRmZw7nEGW

Abstract: Audio generation is a major branch of generative AI research. Compared with prior works in this area that are commonly task-specific with heavy domain knowledge, this paper advocates building universal audio generation models that can handle various tasks in a unified manner. As recent research on large language models (LLMs) has demonstrated their strong ability to handle multiple tasks, this work presents UniAudio, an LLM-based audio generation model that supports a wide range of audio generation tasks. Based on various input conditions, such as phoneme, text description, or audio itself, UniAudio can generate speech, sound, music, and singing voice. The proposed UniAudio is built with 100k hours of multi-source open-available audio data and is scaled to 1B parameters. The audio tokenization method and language model architecture are also specifically designed for both performance and efficiency. Experimentally, UniAudio supports 11 audio generation tasks and achieves competitive results on all tasks consistently. We also show that UniAudio can support new tasks seamlessly via simple fine-tuning.

It might be worth reaching out to the authors of this paper.