How to synthesize English (and, if possible, Chinese)

reveperdu commented 1 year ago

I downloaded the model from Hugging Face and loaded it by calling espnet2.bin.tts_inference. It can synthesize fluent Japanese, but with English input it seems able to read only individual letters, not words or sentences (e.g. "hello world"). I know this is possibly a problem with the tokenizer or encoders, but I don't know how to adjust or replace them.
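For reference, here is a minimal sketch of the loading step described above. It assumes the checkpoint was downloaded to local files; `train_config` and `model_file` are hypothetical paths, and `latin_spans` is an illustrative helper (not part of ESPnet) that flags the parts of the input the model would likely read letter by letter. `Text2Speech` is the inference class in `espnet2.bin.tts_inference`; writing the audio with `soundfile` is my own choice here.

```python
import re

# Matches any run of ASCII letters, e.g. the words in "hello world".
_LATIN = re.compile(r"[A-Za-z]+")


def latin_spans(text):
    """Return the ASCII-letter runs found in text (illustrative helper)."""
    return _LATIN.findall(text)


def synthesize(text, train_config, model_file, out_path="out.wav"):
    """Synthesize text with a local ESPnet2 TTS checkpoint and save a WAV file.

    train_config and model_file are hypothetical local paths to the
    config.yaml and .pth file downloaded from Hugging Face.
    """
    # Heavy dependencies are imported lazily so latin_spans works without them.
    import soundfile as sf
    from espnet2.bin.tts_inference import Text2Speech

    tts = Text2Speech(train_config=train_config, model_file=model_file, device="cpu")
    wav = tts(text)["wav"]  # 1-D torch tensor of audio samples
    sf.write(out_path, wav.view(-1).cpu().numpy(), tts.fs, "PCM_16")
    return out_path
```

Calling `latin_spans("hello world")` before `synthesize(...)` makes it easy to see which parts of a mixed-language input are affected.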

mio7690 commented 1 year ago

"I know this is possibly a problem with the tokenizer or encoders, but I don't know how to adjust or replace them" --> VITS is an end-to-end text-to-speech model, so this is not a problem with the tokenizer or encoders.

Actually, I just fine-tuned the jsut-based model on 32 sentences from the Steins;Gate games.

Synthesizing English or Chinese is not an easy problem, because you would first have to train a base model on English and Chinese data and then fine-tune it again.

But I'm busy and cannot spend much time on it. If you urgently need this, I'd be glad to help you.

reveperdu commented 1 year ago

Thanks for the reply! I will try to figure it out.

mio7690 / Amadeus
