maum-ai / assem-vc

Official Code for Assem-VC @ICASSP2022
https://mindslab-ai.github.io/assem-vc/
BSD 3-Clause "New" or "Revised" License

Build custom non-English dataset with ARPABET #34

Open Opdoop opened 2 years ago

Opdoop commented 2 years ago

Hi, thanks for open-sourcing this project. I'm new to VC and am trying to add a new speaker to assem-vc. In the Prepare Metadata section, @wookladin uses python datasets/g2p.py to convert transcriptions into ARPABET. How should the metadata be built for a custom dataset in a language other than English, e.g. Mandarin Chinese? I searched for g2p tools and found https://github.com/kakaobrain/g2pM, a grapheme-to-phoneme conversion tool for Chinese, but its output is Pinyin, not ARPABET, which confuses me. Can Pinyin be used to build the metadata for Chinese?
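
To make the Pinyin question concrete, here is a minimal sketch of how tone-numbered Pinyin syllables (the kind of output described above, e.g. `zhong1`) could be split into initial and final tokens so that they form a small, fixed symbol inventory, much as ARPABET does for English. This is not the assem-vc method: the `INITIALS` tuple and the helpers `split_syllable` and `pinyin_to_tokens` are hypothetical names for illustration, and whether the model trains well on such tokens is an open assumption. The model's text frontend would presumably also need its symbol list extended to cover them.

```python
import re
from typing import List

# Hypothetical inventory of Pinyin initials; two-letter initials come first
# so that "zh" is matched before "z". "y" and "w" are treated as initials here.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

# A tone-numbered syllable: lowercase letters followed by a tone digit 1-5.
SYLLABLE = re.compile(r"^[a-z]+[1-5]$")


def split_syllable(syllable: str) -> List[str]:
    """Split one syllable into [initial, final+tone], or [final+tone] if it has no initial."""
    for initial in INITIALS:
        final = syllable[len(initial):]
        # Require a non-empty final that is more than just the tone digit.
        if syllable.startswith(initial) and final and not final[0].isdigit():
            return [initial, final]
    return [syllable]


def pinyin_to_tokens(syllables: List[str]) -> List[str]:
    """Convert a list of Pinyin syllables and punctuation into phoneme-like tokens."""
    tokens: List[str] = []
    for item in syllables:
        if SYLLABLE.match(item):
            tokens.extend(split_syllable(item))
        else:
            tokens.append(item)  # punctuation and anything unrecognized pass through unchanged
    return tokens


print(" ".join(pinyin_to_tokens(["zhong1", "guo2", ",", "er2"])))
```

Keeping the tone digit on the final mirrors how ARPABET attaches stress digits to vowels, but that is a design choice of this sketch rather than something the repository prescribes.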

Opdoop commented 2 years ago

I found a phonetic notation called the International Phonetic Alphabet (IPA), which covers 100+ languages for grapheme-to-phoneme conversion. Maybe IPA could serve as the phoneme set for multilingual VC? I'll try it and see how it performs.
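
As a rough illustration of what a shared IPA phoneme set would involve, the sketch below tokenizes IPA strings and builds one symbol-to-ID table across all utterances, whatever their language. It assumes the IPA transcriptions already exist (producing them needs a separate G2P tool, which is not shown), and the function names `ipa_tokens` and `build_symbol_table` are hypothetical. Combining diacritics and modifier letters such as `ʰ` and `ː` are attached to the preceding symbol so that, for example, `tʰ` stays a single phoneme.

```python
import unicodedata
from typing import Dict, Iterable, List


def ipa_tokens(text: str) -> List[str]:
    """Split an IPA string into symbols, keeping diacritics with their base character."""
    tokens: List[str] = []
    # NFD separates precomposed characters into base + combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if ch.isspace():
            continue
        # Combining marks (e.g. nasalization) and modifier letters (e.g. aspiration,
        # length) modify the previous symbol rather than standing alone.
        attaches = unicodedata.combining(ch) > 0 or unicodedata.category(ch) == "Lm"
        if attaches and tokens:
            tokens[-1] += ch
        else:
            tokens.append(ch)
    return tokens


def build_symbol_table(utterances: Iterable[str]) -> Dict[str, int]:
    """Assign one integer ID per distinct symbol; ID 0 is left free for padding."""
    symbols = sorted({tok for utt in utterances for tok in ipa_tokens(utt)})
    return {sym: idx for idx, sym in enumerate(symbols, start=1)}


table = build_symbol_table(["t\u02b0a", "ta"])
print(table)
```

Because the table is built from the data, adding a language only adds the symbols that language actually uses. Tone letters and stress marks are treated as standalone tokens in this sketch, which may or may not suit a given model.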

Opdoop commented 2 years ago

No, I won't go further down this route. I just tried the any-to-many example with the provided checkpoints on my own English source wav. The model drops the high-frequency part of the female voice, and the naturalness of the audio is worse than that of the other baseline examples on the demo page. The results are disappointing, and I expect cross-lingual VC to be even worse. I don't get why changing timbre is that difficult. 😢