jeaye closed this issue 4 years ago
Hi Jeaye,
I'm optimistic that you could eventually do something like this with a DDSP approach, but there are several open research questions: (1) adapting methods for generating high-quality singing voices, (2) getting training data for the individual singer, (3) training an artist-specific model, and (4) performing source separation to remove the original vocals and add the new ones back in.
You might have some luck using a source separation algorithm such as Spleeter to isolate the vocals, then labeling the words and training a new model, conditioned on those words, to predict the singing.
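As a rough sketch of the separation and remix ends of that pipeline (not the labeling or model training in between), something like the following might be a starting point. It assumes Spleeter's Python API (`Separator("spleeter:2stems")` and `separate_to_file`) and its default output layout of `<output_dir>/<song name>/<stem>.wav`; both may differ between Spleeter versions. The helper names `expected_stem_paths`, `separate_vocals`, and `mix` are made up for illustration.

```python
from pathlib import Path


def expected_stem_paths(song_path, output_dir):
    """Guess where the 2-stem model writes its files.

    Assumption: Spleeter's default layout, <output_dir>/<song name>/<stem>.wav.
    """
    stem_dir = Path(output_dir) / Path(song_path).stem
    return {
        "vocals": stem_dir / "vocals.wav",
        "accompaniment": stem_dir / "accompaniment.wav",
    }


def separate_vocals(song_path, output_dir):
    """Split a song into vocals and accompaniment with Spleeter."""
    # Imported lazily so the other helpers work without Spleeter installed.
    from spleeter.separator import Separator

    separator = Separator("spleeter:2stems")
    separator.separate_to_file(song_path, output_dir)
    return expected_stem_paths(song_path, output_dir)


def mix(accompaniment, new_vocals, vocal_gain=1.0):
    """Sum two mono sample sequences (floats in [-1, 1]) and clip the result.

    The output is truncated to the shorter of the two inputs.
    """
    n = min(len(accompaniment), len(new_vocals))
    return [
        max(-1.0, min(1.0, accompaniment[i] + vocal_gain * new_vocals[i]))
        for i in range(n)
    ]
```

The isolated `vocals.wav` would be the training data to label and model; `mix` only stands in for the last step of laying a re-synthesized vocal back over the accompaniment, and a real remix would also need to handle stereo, sample rates, and alignment.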
Thanks for the thorough response, Jesse. I'll dig deeper into this and see if there are enough pieces for me to put together without needing a PhD. :grin:
What would it take to use DDSP to change the words a singer is singing in a song while still keeping the melody? So, a combination of TTS and DDSP, I would think. For example, one could feed in new lyrics (text) to an existing song like Creep (Radiohead) and have Thom say "I love feet" instead of "I'm a creep".
I think this project and similar ones are the closest to actually doing this, but I haven't seen any specific mention of it. Any tips or additional info would be appreciated.