magenta / ddsp

DDSP: Differentiable Digital Signal Processing
https://magenta.tensorflow.org/ddsp
Apache License 2.0
Conversion to original voice while changing lyrics #22

Closed: jeaye closed this issue 4 years ago

jeaye commented 4 years ago

What would it take to use DDSP to change the words a singer is singing in a song while still keeping the melody? So, a combination of TTS and DDSP, I would think. For example, one could feed in new lyrics (text) to an existing song like Creep (Radiohead) and have Thom say "I love feet" instead of "I'm a creep".

This project and similar ones seem the closest to actually doing this, but I haven't seen any specific mention of it. Any tips or additional info would be appreciated.

jesseengel commented 4 years ago

Hi Jeaye,

I'm optimistic that you could eventually do something like this with a DDSP approach, but there are several open research questions: (1) adapting methods for generating high-quality singing voices, (2) getting training data for the individual singer, (3) training an artist-specific model, and (4) performing source separation to remove the vocals and then mix the new vocals back in.

You might have some luck using a source separation tool such as Spleeter to isolate the vocals, then labeling the words and training a new model, conditioned on those words, to predict the singing.
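As a rough illustration of the "label the words" step, here is a minimal sketch that turns word-level time labels into one label per feature frame, which is the kind of sequence a word-conditioned model could consume alongside pitch and loudness. It assumes the vocal stem has already been separated and labeled by hand. The `WordSpan` type, the `words_to_frame_labels` helper, the `<sil>` label, and the 250 Hz frame rate are all illustrative assumptions, not part of DDSP or Spleeter.

```python
from dataclasses import dataclass
from typing import List

FRAME_RATE_HZ = 250  # assumed feature frame rate; match it to your own model
SILENCE = "<sil>"    # hypothetical label for frames with no sung word


@dataclass
class WordSpan:
    """One hand-labeled word in the isolated vocal track (times in seconds)."""
    word: str
    start: float
    end: float


def words_to_frame_labels(spans: List[WordSpan], duration: float,
                          frame_rate: int = FRAME_RATE_HZ) -> List[str]:
    """Expand word-level time labels into one label per feature frame."""
    n_frames = int(round(duration * frame_rate))
    labels = [SILENCE] * n_frames
    for span in spans:
        if span.end <= span.start:
            raise ValueError(f"empty or reversed span for {span.word!r}")
        # Clamp to the clip so labels running past the end are truncated.
        first = max(0, int(round(span.start * frame_rate)))
        last = min(n_frames, int(round(span.end * frame_rate)))
        for i in range(first, last):
            labels[i] = span.word
    return labels


# Hypothetical labels for a two-second vocal clip.
spans = [
    WordSpan("I'm", 0.20, 0.50),
    WordSpan("a", 0.50, 0.70),
    WordSpan("creep", 0.70, 1.60),
]
frame_labels = words_to_frame_labels(spans, duration=2.0)
```

Word-level labels are the coarsest option. Phoneme-level labels would likely give a model more to work with, at the cost of more labeling effort.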

jeaye commented 4 years ago

Thanks for the thorough response, Jesse. I'll dig deeper into this and see if there are enough pieces for me to put together without needing a PhD. :grin: