[Feature]: For Music - VALL-E transformer RAG (and other embedding solutions)

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

MIT License

4.41k stars, 373 forks

Is your feature request related to a problem? Please describe.

I want to generate more music in the specific style of the music I already own.

Describe the solution you'd like

It would be good if the kwargs could accept user-owned music, which would be embedded (as in RAG for an LLM transformer) and injected into a text prompt, for example: "Using the music provided, please generate a song in the style of $$RAG_EMBEDDING_HERE".
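To make the request concrete, here is a minimal sketch of the retrieval step, written in plain Python. Everything in it is an assumption for illustration: `toy_embed` is a hypothetical stand-in for a real audio encoder, and `retrieve_similar` and `build_prompt` are made-up helper names, not Amphion APIs. It only shows the RAG-style flow: embed the user's tracks, rank them by cosine similarity against a query, and place the top matches in the prompt.

```python
import math


def toy_embed(samples, dims=4):
    """Hypothetical stand-in for an audio encoder.

    Returns the mean absolute amplitude of `dims` equal-length chunks.
    A real system would use a learned audio embedding model instead.
    """
    chunk = max(1, len(samples) // dims)
    vec = []
    for i in range(dims):
        part = samples[i * chunk:(i + 1) * chunk]
        vec.append(sum(abs(s) for s in part) / len(part) if part else 0.0)
    return vec


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_similar(query_vec, library, k=2):
    """Return the names of the k library tracks most similar to query_vec."""
    ranked = sorted(
        library.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


def build_prompt(track_names):
    """Fill the retrieved references into the prompt template from this issue."""
    refs = ", ".join(track_names)
    return f"Using the music provided, please generate a song in the style of: {refs}"


# Toy "user-owned" library: track name -> embedding of made-up sample data.
library = {
    "quiet_intro": toy_embed([0.1, 0.1, 0.2, 0.2, 0.9, 0.9, 1.0, 1.0]),
    "loud_intro": toy_embed([1.0, 1.0, 0.9, 0.9, 0.2, 0.2, 0.1, 0.1]),
    "flat": toy_embed([0.5] * 8),
}

query = toy_embed([0.0, 0.2, 0.2, 0.3, 0.8, 1.0, 0.9, 1.0])
top_tracks = retrieve_similar(query, library, k=2)
prompt = build_prompt(top_tracks)
print(prompt)
```

In a real integration the retrieved audio, or its embeddings, would presumably be passed to the model as conditioning rather than as track names in a string; how that would plug into Amphion's models is left open here.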

Describe alternatives you've considered

As far as I know, there are no alternatives like this for music yet.

Additional context

Hugging Face sentence-transformers page, as the LLM (text embedding) equivalent: https://huggingface.co/sentence-transformers
