rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License
6.47k stars, 474 forks

How to train a Japanese voice? #280

Open gjin10969 opened 11 months ago

gjin10969 commented 11 months ago

Is there a way to train a Japanese voice, for example by fine-tuning an existing model?

BornSaint commented 11 months ago

https://github.com/rhasspy/piper/blob/master/notebooks/piper_multilingual_training_notebook.ipynb This notebook has no option to train from anything other than the English checkpoint, but you can adapt the code. You will also need to build an LJSpeech-format dataset for Japanese, and you could fine-tune from a Chinese checkpoint, which is more similar to your language: https://huggingface.co/datasets/rhasspy/piper-checkpoints/tree/main/zh/zh_CN/huayan/medium Fine-tuning the Chinese model may be faster than starting from checkpoints in other languages. For comparison, the Portuguese models were fine-tuned from English checkpoints and are very good, and when I fine-tuned another Portuguese model from an English checkpoint myself, it worked pretty well.
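A minimal sketch of preparing the LJSpeech-style dataset mentioned above, assuming the common convention of a `wav/` folder plus a `metadata.csv` with one `id|text` line per clip (check the piper training docs for the exact format your version expects). `write_ljspeech_metadata` and `missing_wavs` are hypothetical helper names, not part of piper.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple


def write_ljspeech_metadata(rows: Iterable[Tuple[str, str]], dataset_dir: Path) -> Path:
    """Write one `id|text` line per clip to dataset_dir/metadata.csv.

    Assumed layout (LJSpeech-style): dataset_dir/wav/<id>.wav for every id.
    """
    dataset_dir.mkdir(parents=True, exist_ok=True)
    lines: List[str] = []
    for clip_id, text in rows:
        text = text.strip()
        # '|' is the field separator, so it must not appear inside a transcript.
        if "|" in text or "\n" in text:
            raise ValueError(f"clip {clip_id!r}: transcript contains '|' or a newline")
        lines.append(f"{clip_id}|{text}\n")
    path = dataset_dir / "metadata.csv"
    # newline="\n" keeps Unix line endings on every OS; UTF-8 handles Japanese text.
    with path.open("w", encoding="utf-8", newline="\n") as f:
        f.writelines(lines)
    return path


def missing_wavs(rows: Iterable[Tuple[str, str]], dataset_dir: Path) -> List[str]:
    """Return the ids whose dataset_dir/wav/<id>.wav file does not exist."""
    return [
        clip_id
        for clip_id, _ in rows
        if not (dataset_dir / "wav" / f"{clip_id}.wav").exists()
    ]


# Demo with two made-up Japanese transcripts in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    rows = [("0001", "こんにちは"), ("0002", "ありがとうございます")]
    meta = write_ljspeech_metadata(rows, Path(tmp))
    print(meta.name, len(meta.read_text(encoding="utf-8").splitlines()), "lines")
    print("missing:", missing_wavs(rows, Path(tmp)))  # no wav/ folder in the demo
```

Running `missing_wavs` before training catches transcript/audio mismatches early, which are easy to introduce when assembling a new-language dataset by hand.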

kazukiotsuka commented 10 months ago

@BornSaint @synesthesiam I'm interested in extending this codebase to Japanese. Would a Japanese phoneme dictionary be used? If so, could you give a short explanation of how to work on it? Also, what is the minimum number of hours of data needed to adapt another language's pretrained model to Japanese and get results about as natural as the English model? We have a phoneme dictionary (a g2p dictionary) and our own dataset of more than about 10 hours of high-quality recordings. Either way, I'll start by reading the code.
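To illustrate how a g2p dictionary like the one mentioned above could be applied, here is a greedy longest-match lookup over kana. The dictionary contents and the function name are hypothetical, this is not piper's own phonemizer, and real Japanese text would first need kanji converted to readings (typically with a morphological analyzer).

```python
from typing import Dict, List

# Hypothetical toy lexicon: kana -> phoneme symbols. A real g2p dictionary
# would be much larger and would also model long vowels, pitch accent, etc.
TOY_DICT: Dict[str, List[str]] = {
    "きょ": ["ky", "o"],
    "き": ["k", "i"],
    "う": ["u"],
    "と": ["t", "o"],
    "ん": ["N"],
}


def g2p_longest_match(text: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Greedy longest-match dictionary lookup (Japanese has no spaces between words)."""
    max_len = max(len(key) for key in lexicon)
    phonemes: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible entry first, then progressively shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in lexicon:
                phonemes.extend(lexicon[chunk])
                i += length
                break
        else:
            # No entry matched at this position; a real system needs a fallback.
            raise KeyError(f"no dictionary entry for {text[i]!r} at position {i}")
    return phonemes


print(g2p_longest_match("きょうと", TOY_DICT))  # ['ky', 'o', 'u', 't', 'o']
```

Longest match is what lets a two-character entry such as "きょ" win over "き" followed by an unmatched "ょ". A real dictionary would also merge "ょう" into a long vowel (Kyoto is /kyoːto/), which this toy lexicon does not attempt.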

colafly commented 3 months ago

@BornSaint Picking this thread back up: I tried to fine-tune a Chinese model with a Chinese voice, but the phonetics are off. I provided ~500 WAV files. I assume this is more of an issue with piper-phonemize than with the core piper library? Other libraries provide better Chinese phonetics; is there a way to use one of those instead of espeak-ng / piper-phonemize?
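Since dataset size keeps coming up in this thread (hours of audio versus number of WAV files), here is a quick stdlib-only way to total the duration of a set of PCM WAV files. `total_hours` and the demo-only `write_silence` helper are hypothetical names; the demo generates silent clips only so the example runs on its own.

```python
import tempfile
import wave
from pathlib import Path
from typing import Iterable


def total_hours(wav_paths: Iterable[Path]) -> float:
    """Sum the duration of PCM WAV files (frames / sample rate), in hours."""
    total_seconds = 0.0
    for path in wav_paths:
        with wave.open(str(path), "rb") as wav:
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 3600.0


def write_silence(path: Path, seconds: float, sample_rate: int = 22050) -> None:
    """Demo-only helper: write a mono 16-bit PCM WAV file containing silence."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))


# Demo: three 2-second clips = 6 seconds in total.
with tempfile.TemporaryDirectory() as tmp:
    clips = []
    for i in range(3):
        clip = Path(tmp) / f"{i:04d}.wav"
        write_silence(clip, seconds=2.0)
        clips.append(clip)
    print(f"{total_hours(clips) * 60:.2f} minutes")  # 0.10 minutes
```

On a real dataset you would pass something like `sorted(Path("my_dataset/wav").glob("*.wav"))` (a hypothetical path). For scale: if ~500 clips averaged about 5 seconds each (an assumption), that is roughly 2,500 s, or about 42 minutes of audio, far less than the ~10 hours mentioned earlier in the thread.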

Enchante503 commented 2 months ago

There is a misconception that Chinese is similar to Japanese just because both are Asian languages. The two languages, and their pronunciation, are completely different. In terms of pronunciation and phonetic characteristics, Spanish, Portuguese, and Turkish are said to be closer to Japanese.