Closed KYShek closed 3 weeks ago
Do you think a new module would be able to work in C++ as well as Python?
Of course. Phonemization is mostly string processing, and yanyiwu/cppjieba seems well developed. Since cppjieba needs no linking step, combining it into the project should not take much time. (The components of jieba are just a big dictionary, some dynamic-programming methods, and an HMM model.) The remaining problem is converting characters to phonemes while keeping my word splits. In fact, writing a Chinese character-to-phoneme module will not take me much time: disregarding characters with two or more readings, character-to-phoneme conversion is just string replacement.
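The "just string replacement" idea above can be sketched in a few lines. This is a minimal illustration, not the actual module: the mapping here is a tiny hand-made sample, where a real implementation would load a full pinyin table (and handle polyphonic characters, which this deliberately ignores).

```python
# Minimal sketch of character-to-phoneme as plain string replacement.
# CHAR_TO_PINYIN is a tiny illustrative sample, not a real dictionary.
CHAR_TO_PINYIN = {
    "彩": "cai3",
    "虹": "hong2",
    "天": "tian1",
}

def chars_to_phonemes(text: str) -> str:
    # Replace each known character with its pinyin; pass others through.
    return " ".join(CHAR_TO_PINYIN.get(ch, ch) for ch in text)

print(chars_to_phonemes("彩虹"))  # -> cai3 hong2
```

Characters with multiple readings would need context (which is exactly where a segmenter like jieba helps), but for the single-reading case a lookup table is all there is to it.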
And I have already finetuned lessac with a dataset preprocessed by CjangCjengh/vits. The result sounds better. Test text: 彩虹,又稱天弓、天虹、絳等,簡稱虹,是氣象中的一種光學現象,當太陽光照射到半空中的水滴,光線被折射及反射,在天空上形成拱形的七彩光譜,由外圈至内圈呈紅、橙、黃、綠、蓝、靛蓝、堇紫七种颜色(霓虹則相反)。 Wave file: https://drive.google.com/file/d/1vF2KxensiRQLdtNNOPlvPZvimVKpO3w5/view?usp=drive_link
From another perspective, I am also interested in a mode where externally supplied word segmentation is preserved in Piper. In my case, the Chinese sentences are generated from a formal representation, so it is trivial to generate them either with or without segmentation.
How to finetune?
I would like to know how to finetune. Thank you.
@KYShek Hi, wondering if you ended up successfully using a new phonemization module. Thanks!
@KYShek 小哥,靡不有初,鲜克有终? (Chinese idiom: everything has a beginning, but few things are carried through to the end.) Thanks!
I'm working on applying another word-segmentation module, such as jieba, to piper-phonemize, because the Chinese dictionary in espeak-ng is far too small and rigid. I want to know whether espeak-ng can keep my splits (it seems espeak-ng may remove all the spaces in Chinese text before looking up Pinyin). Or should I write a new character-to-phoneme module? (Maybe I should open this issue in rhasspy/piper-phonemize?)
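One way to sidestep the backend stripping the spaces is to phonemize each pre-segmented word independently, so the boundaries survive no matter how the text would otherwise be re-joined. A minimal Python sketch, assuming the words are already segmented upstream (e.g. by jieba) and using a tiny illustrative pinyin table:

```python
# Sketch: preserve externally supplied word splits through phonemization.
# PINYIN is a tiny illustrative sample; a real module would load a full table.
PINYIN = {"天": "tian1", "空": "kong1", "上": "shang4"}

def phonemize_segmented(words):
    # Phonemize each word on its own so the word boundaries survive,
    # instead of joining the text and letting the backend re-split it.
    return [" ".join(PINYIN.get(ch, ch) for ch in w) for w in words]

print(phonemize_segmented(["天空", "上"]))  # -> ['tian1 kong1', 'shang4']
```

The output keeps one entry per input word, so downstream code can reinsert whatever boundary marker the phoneme stream expects.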