netease-youdao / EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Apache License 2.0
6.63k stars, 556 forks

Training a model: how do I convert existing English data into the format EmotiVoice requires? #102

Closed songyinghao closed 6 months ago

songyinghao commented 6 months ago

My prosody data looks like this: 100028_S0000 yes#1 they#1 are#3 i#1 like#3 them#4 Y1 EH1 S1 / DH1 / EY1 ER1 / AY1 / L1 AY1 K1 / DH1 AH1 M1

EmotiVoice requires the following format: "text":["<sos/eos>"] + phones + ["<sos/eos>"],

How should I assemble the phones?
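For illustration, the concatenation in the quoted snippet could be sketched roughly as below. This is only a sketch under assumptions, not EmotiVoice's own preprocessing: the helper names `parse_prosody_line` and `to_text_field` and the `"key"` field are hypothetical, the line layout (utterance ID, then words carrying a `#N` prosody mark, then phones separated by `/`) is inferred from the single sample line above, and dropping the `/` word separators is a guess. Whether these phone symbols match the token inventory EmotiVoice expects is not established in this thread.

```python
SOS_EOS = "<sos/eos>"


def parse_prosody_line(line, keep_word_boundaries=False):
    """Split one prosody line into (utt_id, words, phones).

    Assumption: the first token is the utterance ID, word tokens carry a
    '#<level>' prosody mark, and every remaining token is a phone or the
    '/' word separator.
    """
    tokens = line.split()
    utt_id, rest = tokens[0], tokens[1:]
    words = [t for t in rest if "#" in t]
    phones = [t for t in rest if "#" not in t]
    if not keep_word_boundaries:
        # Assumption: '/' only marks word boundaries and is not itself a phone.
        phones = [p for p in phones if p != "/"]
    return utt_id, words, phones


def to_text_field(phones):
    """Wrap a phone list in the boundary token, as in the quoted snippet."""
    return [SOS_EOS] + phones + [SOS_EOS]


line = (
    "100028_S0000 yes#1 they#1 are#3 i#1 like#3 them#4 "
    "Y1 EH1 S1 / DH1 / EY1 ER1 / AY1 / L1 AY1 K1 / DH1 AH1 M1"
)
utt_id, words, phones = parse_prosody_line(line)

# "key" is a hypothetical field name used only for this example.
entry = {"key": utt_id, "text": to_text_field(phones)}
print(entry)
```

Passing `keep_word_boundaries=True` keeps the `/` tokens in the list, in case the word boundaries need to be mapped to some separator symbol later.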

syq163 commented 6 months ago

Are you asking about the data format for training or inference purposes?

songyinghao commented 6 months ago

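@syq163 Sorry, my description was unclear.

1. Scenario: training a TTS model on an English corpus.

2. Training data: the prosody annotations (phones) already exist, so they do not need to be generated automatically. [image] [image]

3. The sample training code provided by EmotiVoice: [image]

Question: regarding point 3 above, how do I convert the phones in my training data into the format EmotiVoice requires?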

Training data: 100010_S0000 it#3's#1 cheap#4 IH1 T1 S1 / S1 IY1 EY1 CH1 IY1 AH1 P1 IY1

songyinghao commented 6 months ago

Resolved: I used EmotiVoice's built-in FrontEnd to generate the data.