lucidrains / e2-tts-pytorch

Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch

add a basic trainer and dataset #1

Closed manmay-nakhashi closed 2 months ago

manmay-nakhashi commented 2 months ago

To test or train, just run the command below. By default it uses the MushanW/GLOBE dataset, which has 23,519 speakers and covers 164 accents.

python3 train_e2.py

Note: you'll need to add a path to vocab.json.
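For reference, here is roughly what the training setup wires together. This is a hedged sketch, not a copy of train_e2.py: the E2TTS, E2Trainer and HFDataset names, their arguments, and the dataset handling are assumptions based on this PR.

```python
# rough sketch of the training setup in this PR; class names, arguments and
# the dataset column handling are assumptions, not a verbatim copy of train_e2.py
from datasets import load_dataset

from e2_tts_pytorch import E2TTS
from e2_tts_pytorch.trainer import E2Trainer, HFDataset  # assumed module path

# text-to-mel model; dimensions here are illustrative
e2tts = E2TTS(
    transformer = dict(
        dim = 512,
        depth = 8,
        heads = 8,
    ),
)

# MushanW/GLOBE from the Hugging Face hub: ~23.5k speakers, 164 accents
train_dataset = HFDataset(load_dataset('MushanW/GLOBE', split = 'train'))

trainer = E2Trainer(
    e2tts,
    num_warmup_steps = 1000,  # assumed scheduler knob
)

trainer.train(train_dataset, epochs = 10, batch_size = 8)
```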

lucidrains commented 2 months ago

@manmay-nakhashi Manmay! i remember you now from the natural speech work we did together some time ago

thanks for the PR! I will check it out tomorrow morning 😄

lucidrains commented 2 months ago

@manmay-nakhashi hey, looks good! 😄 do you want to try pulling and integrating the text as well?

manmay-nakhashi commented 2 months ago

Sure, I'll do that.

manmay-nakhashi commented 2 months ago
[Screenshot: 2024-07-14 at 8:14:32 AM]
manmay-nakhashi commented 2 months ago

@lucidrains it's ready

manmay-nakhashi commented 2 months ago

i'll write an inference script next so we can do some quick experiments.

lucidrains commented 2 months ago

nice! it looks good, but in the paper they didn't use a tokenizer and just went character level

i was thinking we could just use utf-8 character ids? (could remove the tokenizer and vocab.json altogether) keep it simple
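To make the suggestion concrete, a minimal version of that character-level path could look like the sketch below: map each string straight to its unicode code points, pad the batch, and drop the tokenizer and vocab.json entirely. The function name and padding value are illustrative, not the repo's actual API.

```python
# minimal sketch of character-level text conditioning: each string becomes a
# tensor of unicode code points, padded to the longest string in the batch.
# names and the padding value are illustrative, not the repo's actual API.
import torch
from torch.nn.utils.rnn import pad_sequence

def list_str_to_tensor(texts, padding_value = -1):
    # one LongTensor of code points per string
    tensors = [torch.tensor([ord(c) for c in t], dtype = torch.long) for t in texts]
    # pad to (batch, max_text_len)
    return pad_sequence(tensors, batch_first = True, padding_value = padding_value)

batch = list_str_to_tensor(['hello world', 'embarrassingly easy tts'])
print(batch.shape)  # torch.Size([2, 23])

# a fixed-size embedding then replaces the learned vocab, e.g.
# nn.Embedding(256, dim) if restricting to utf-8 bytes
# (with indices offset to account for the padding value)
```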

manmay-nakhashi commented 2 months ago

@lucidrains changes are done

manmay-nakhashi commented 2 months ago

@lucidrains resolved all the suggestions

lucidrains commented 2 months ago

@manmay-nakhashi thank you Manmay!