neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality
Apache License 2.0

Is cross-language training supported? For example, training a model on Japanese speech and making it speak English #442

Open DogeLord081 opened 1 year ago

Keith-Hon commented 1 year ago

Totally possible. I have finetuned a model for Cantonese, which is a tonal language like Mandarin. I also tried mixing it with English. Here is the result:

https://soundcloud.com/keith-hon/cantonese-english-mixed-tts-with-voice-clone

https://github.com/neonbjb/tortoise-tts/issues/493

DogeLord081 commented 1 year ago

How? Can you tell me which model you used, and would you mind sharing the training script? Thanks.

Keith-Hon commented 1 year ago

I used DLAS and changed the vocab before finetuning.
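For readers wondering what "changing the vocab" could involve, here is a minimal sketch, not the actual procedure used above. It assumes the tokenizer vocab can be viewed as a plain token-to-id mapping and that tokens wrapped in square brackets are special tokens; the toy vocab and the helper names `uncovered_chars` and `extend_vocab` are hypothetical. It finds characters in the new-language transcripts that no existing token covers and appends them with fresh ids.

```python
from collections import Counter


def is_special(token):
    """Assumption: special tokens are written in square brackets, e.g. [STOP]."""
    return token.startswith("[") and token.endswith("]")


def uncovered_chars(transcripts, vocab):
    """Count non-space characters in the transcripts that appear in no regular token."""
    known = set("".join(tok for tok in vocab if not is_special(tok)))
    counts = Counter(
        ch for line in transcripts for ch in line.lower() if not ch.isspace()
    )
    return {ch: n for ch, n in counts.items() if ch not in known}


def extend_vocab(vocab, transcripts, min_count=1):
    """Return a copy of vocab with the missing characters appended as new tokens."""
    missing = uncovered_chars(transcripts, vocab)
    new_vocab = dict(vocab)
    next_id = max(new_vocab.values(), default=-1) + 1
    # Most frequent characters first; ties are broken by the character itself.
    for ch, n in sorted(missing.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            new_vocab[ch] = next_id
            next_id += 1
    return new_vocab


if __name__ == "__main__":
    # Toy vocab and transcripts, purely illustrative.
    toy_vocab = {"[STOP]": 0, "[UNK]": 1, "[SPACE]": 2, "a": 3, "b": 4, "ab": 5}
    toy_transcripts = ["你好 ab", "你 a"]
    print(uncovered_chars(toy_transcripts, toy_vocab))
    print(extend_vocab(toy_vocab, toy_transcripts))
```

A larger vocab would presumably also require the model's text embedding to match the new size, which this sketch does not cover.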

superxii commented 1 year ago

@Keith-Hon

Hi, I am also planning to finetune a model for Cantonese. Would you mind sharing some tips on finetuning, such as the training sample size and how to evaluate the result?

Keith-Hon commented 1 year ago

I don't have a way to evaluate other than by ear at the moment. The sample size depends on the accuracy you expect.

Keith-Hon commented 1 year ago

@superxii Are you finetuning it for a company, or as a side hustle?

superxii commented 1 year ago

This is for my personal interest only, inspired by 允光 ai. But I will use a company GPU to train.

1879687161 commented 1 year ago

@Keith-Hon Hi, would you mind making a repo or writing instructions on how you finetuned your model? I am finetuning mine with Chinese, but the result was terrible. If you could share any ideas, that would be really helpful. Thank you!

Keith-Hon commented 1 year ago

https://github.com/neonbjb/tortoise-tts/discussions/129#discussioncomment-3157890

James talked about that earlier.

Keith-Hon commented 1 year ago

@superxii You might check out the mrq repo; people shared some insights there as well.

https://git.ecker.tech/mrq/ai-voice-cloning/issues/152

superxii commented 1 year ago

I really appreciate your sharing and your demo. Keep it up!