jaywalnut310 / glow-tts

A Generative Flow for Text-to-Speech via Monotonic Alignment Search
MIT License
667 stars, 150 forks

About speaker embedding #10

Closed by Charlottecuc 4 years ago

Charlottecuc commented 4 years ago

Hi. Could you please give me some advice on adding the speaker embedding (as you mentioned in the paper) to your code? Thanks!

Charlottecuc commented 4 years ago

I mean, how do I use the TextMelSpeakerLoader? Thanks.

Charlottecuc commented 4 years ago

Ah, I see your reply in https://github.com/jaywalnut310/glow-tts/issues/4. Thank you very much for your work :) Also, could you please tell me how many epochs the demo model (trained on the LJ dataset) needs to converge? Thanks!

jaywalnut310 commented 4 years ago

Thanks for your interest! The single-speaker model was trained for 240K steps.

I trained the model on two GPUs, so if you have only one GPU, you should double the batch size in the config file.
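For anyone applying this on a single GPU, here is a minimal sketch of the adjustment. It assumes the config is a JSON-style dictionary with a `train.batch_size` field; that key layout, the example value of 32, and both helper names are assumptions for illustration, not details confirmed in this thread. The second helper converts a step count into an approximate epoch count, assuming one step consumes one full batch and the last partial batch is kept.

```python
import copy
import json
import math


def scale_batch_size(config, factor=2):
    """Return a copy of `config` with train.batch_size multiplied by `factor`.

    The "train" / "batch_size" key names are assumed, not taken from the repo.
    """
    new_config = copy.deepcopy(config)
    new_config["train"]["batch_size"] = config["train"]["batch_size"] * factor
    return new_config


def steps_to_epochs(total_steps, dataset_size, total_batch_size):
    """Approximate number of epochs covered by `total_steps` optimizer steps.

    Assumes the last, partial batch of each epoch is kept (hence the ceil).
    """
    steps_per_epoch = math.ceil(dataset_size / total_batch_size)
    return total_steps / steps_per_epoch


# Hypothetical per-GPU config; the value 32 is a placeholder.
example_config = {"train": {"batch_size": 32}}

# Going from two GPUs to one: double the batch size.
single_gpu_config = scale_batch_size(example_config, factor=2)
print(json.dumps(single_gpu_config))
```

To estimate epochs from the 240K steps mentioned above, call `steps_to_epochs(240_000, dataset_size, total_batch_size)` with your own training-set size and total batch size; neither value is stated in this thread.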