Closed by lucasnewman 1 year ago
Great job Lucas! I'll take a look later this week; I'm about to dive back into the TTS field in August and finish a bunch of repos.
@lucasnewman are you doing this for work? for a company in SF perhaps?
Yep, it's part of an exploration I'm doing for work and also just advancing my understanding of the SOTA along the way.
@lucasnewman cool, maybe my dog and I will run into you :laughing: we live in the mission
@lucasnewman which company do you work for? just curious if it is yet another TTS company (been contacted by like 3 so far)
Ha, not at all, I work for Future (you will probably be confused 😅). I'm over in Noe so not too far away!
@lucasnewman haha yea i am confused :laughing: you automating the personal trainer with some deep fake? nice! Vaswani lives in Noe Valley haha (great neighborhood)
It's not really to replace the humans, but more around personalizing the other audio aspects of what we do — I find the vast majority of deep fakes still fully in the uncanny valley. I see Vaswani at La Lucha on Sanchez all the time, although I don't know him. Small world for sure!
I'm going to close this one in favor of https://github.com/lucidrains/spear-tts-pytorch/pull/4, since that has all the changes here and more support for backtranslation. 🙏
I cribbed a bunch of this from `SemanticTransformerTrainer` in audiolm-pytorch and added a notebook to demonstrate that it works. The loss converges at a 60% mask rate, as used in the paper, on a subset of LibriTTS. I'm happy to make changes, just let me know!
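
For readers unfamiliar with what a 60% mask rate means in practice, here is a minimal sketch. It is not the code from this PR: `mask_tokens` and `MASK_ID` are hypothetical names, and it assumes each token is independently replaced by a mask id with probability 0.6 (the actual corruption scheme in the PR or paper may differ, for example by deleting tokens instead).

```python
import random

MASK_ID = -1  # hypothetical mask id, chosen here only for illustration


def mask_tokens(tokens, mask_prob=0.6, mask_id=MASK_ID, rng=None):
    """Replace each token with mask_id independently with probability mask_prob.

    Returns the corrupted sequence and a parallel list of booleans marking
    which positions were masked.
    """
    rng = rng or random.Random()
    masked, flags = [], []
    for token in tokens:
        hit = rng.random() < mask_prob
        masked.append(mask_id if hit else token)
        flags.append(hit)
    return masked, flags


if __name__ == "__main__":
    corrupted, flags = mask_tokens(list(range(20)), rng=random.Random(0))
    print(corrupted)
    print(f"masked fraction: {sum(flags) / len(flags):.2f}")
```

A denoising pretraining step would then feed the corrupted sequence to the model and train it to reconstruct the original tokens.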