open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License

[Feature]: For Music - VALL-E transformer RAG (and other embedding solutions) #170

Closed: bennmann closed this issue 1 month ago

bennmann commented 5 months ago

Is your feature request related to a problem? Please describe.

The problem is I want more of the specific kind of music I already own.

Describe the solution you'd like

It would be good for the kwargs to accept user-owned music, which could be embedded (like RAG for LLM transformers) and inserted into a text prompt, for example: "Using the music provided, please generate a song in the style of $$RAG_EMBEDDING_HERE"
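As a rough illustration of the retrieval half of this idea, here is a minimal sketch under stated assumptions. Nothing here is an Amphion API. It assumes each user-owned track has already been turned into a fixed-length embedding by some audio encoder, which is not specified in this issue. The functions `cosine`, `retrieve`, and `style_vector` are hypothetical helpers. The sketch ranks the user's tracks by cosine similarity to a query embedding, then averages the top matches into one "style" vector. A generator would take that vector as conditioning input; it would not be pasted into the text prompt as a string.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical: library maps track names to precomputed embeddings from some
# audio encoder (assumed, not provided by Amphion).
Library = Dict[str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def retrieve(query: Sequence[float], library: Library, k: int) -> List[Tuple[str, float]]:
    """Return the k tracks most similar to the query, best first."""
    scored = sorted(
        ((cosine(query, emb), name) for name, emb in library.items()),
        reverse=True,
    )
    return [(name, score) for score, name in scored[:k]]


def style_vector(names: List[str], library: Library) -> List[float]:
    """Average the embeddings of the retrieved tracks into one conditioning vector."""
    dims = len(library[names[0]])
    acc = [0.0] * dims
    for name in names:
        for i, value in enumerate(library[name]):
            acc[i] += value
    return [v / len(names) for v in acc]


if __name__ == "__main__":
    # Toy 3-dimensional embeddings, purely for illustration.
    library = {
        "track_a": [1.0, 0.0, 0.0],
        "track_b": [0.9, 0.1, 0.0],
        "track_c": [0.0, 0.0, 1.0],
    }
    top = retrieve([1.0, 0.0, 0.0], library, k=2)
    print(top)
    print(style_vector([name for name, _ in top], library))
```

Averaging is only the simplest way to combine the retrieved tracks. A real system might instead pass all retrieved embeddings to the model, much as LLM RAG passes multiple retrieved passages.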

Describe alternatives you've considered

There are no alternatives like this for music yet.

Additional context

HuggingFace sentence-transformers page for LLM equivalent: https://huggingface.co/sentence-transformers

RMSnow commented 4 months ago

Hi @bennmann, it seems like you are looking for a Text-to-Music function? We have been working on this task and hope to release it this year. In the meantime, you are welcome to try our TTA (text-to-audio) model!