netease-youdao / EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Apache License 2.0

How does the API handle polyphonic characters, e.g. "还(hai2)不还(huan2)钱"? #143

Open Oceannew opened 8 months ago

Oceannew commented 8 months ago

How does the API handle polyphonic characters, e.g. "还(hai2)不还(huan2)钱"? Both occurrences of 还 come out with the same pronunciation.

{ "input": "还不还钱", "voice": "", "prompt": "", "language": "zh_us", "model": "emoti-voice", "response_format": "mp3", "speed": 1.0 }

Exported MP3: https://github.com/netease-youdao/EmotiVoice/assets/37178037/f9a19d84-9b63-4adf-9c62-e8663c8cb0a7

syq163 commented 8 months ago

It is a good question! Perhaps you could follow these steps:

  1. Generate phonetic transcriptions from the text '还不还钱' by using python frontend.py data/text. This will yield phonetic results like '<sos/eos> h ai2 sp1 b u4 sp1 h ai2 sp1 q ian2 <sos/eos>'.

  2. Adjust the phonetic results as needed, for example: '<sos/eos> h ai2 sp1 b u4 sp1 h uan2 sp1 q ian2 <sos/eos>'.

  3. Perform TTS inference using python inference_am_vocoder_joint.py --logdir prompt_tts_open_source_joint --config_folder config/joint --checkpoint g_00140000 --test_file data/text_tts.

I have provided an example of my experiment for your reference.

issues_143.tar.gz
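
The manual adjustment in step 2 could be scripted along these lines. This is a minimal sketch, not part of EmotiVoice: override_syllable is a hypothetical helper, and it assumes the phoneme string is wrapped in <sos/eos> markers with syllables separated by sp1, exactly as in the example output above. Real frontend output may contain other separator tokens, which this sketch does not handle.

```python
SEP = "sp1"  # separator token, as seen in the example output (assumption)


def override_syllable(phonemes: str, index: int, new_syllable: str) -> str:
    """Replace the index-th (0-based) syllable of a frontend phoneme string."""
    tokens = phonemes.split()
    # Leave the <sos/eos> markers at both ends untouched.
    head, body, tail = tokens[0], tokens[1:-1], tokens[-1]

    # Group the remaining tokens into syllables, splitting on the separator.
    syllables, current = [], []
    for tok in body:
        if tok == SEP:
            syllables.append(current)
            current = []
        else:
            current.append(tok)
    syllables.append(current)

    # Swap in the corrected pronunciation, e.g. "h uan2".
    syllables[index] = new_syllable.split()

    # Rebuild the string with separators between syllables.
    out = [head]
    for i, syl in enumerate(syllables):
        if i:
            out.append(SEP)
        out.extend(syl)
    out.append(tail)
    return " ".join(out)


original = "<sos/eos> h ai2 sp1 b u4 sp1 h ai2 sp1 q ian2 <sos/eos>"
fixed = override_syllable(original, 2, "h uan2")
print(fixed)
# <sos/eos> h ai2 sp1 b u4 sp1 h uan2 sp1 q ian2 <sos/eos>
```

The patched string would then replace the original phoneme sequence in the test file passed to the inference command in step 3.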

Oceannew commented 8 months ago

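Then how should I decide whether it is h ai2 or h uan2? Should I add markers to the text in the input parameter, for example "input": "还(h ai2)不还(h uan2)钱", and then modify the method in frontend.py to detect them?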
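
One way the inline-marker idea could work is sketched below. Everything here is hypothetical: the "character(pronunciation)" markup is the format proposed in the question, not something EmotiVoice is known to support, and split_annotations is an illustrative helper rather than an existing function in frontend.py. The sketch assumes ASCII parentheses and input made up only of Chinese characters, so that a character index equals a syllable index.

```python
import re

# Hypothetical markup: one character followed by "(initial final+tone)".
ANNOTATION = re.compile(r"(.)\(([^()]+)\)")


def split_annotations(text: str):
    """Return (plain_text, {char_index: pronunciation}) for annotated input."""
    plain_chars = []
    overrides = {}
    pos = 0
    for match in ANNOTATION.finditer(text):
        # Copy any unannotated characters that precede this match.
        plain_chars.extend(text[pos:match.start()])
        # Record the forced pronunciation at the annotated character's index.
        overrides[len(plain_chars)] = match.group(2).strip()
        plain_chars.append(match.group(1))
        pos = match.end()
    # Copy whatever follows the last annotation.
    plain_chars.extend(text[pos:])
    return "".join(plain_chars), overrides


plain, overrides = split_annotations("还(h ai2)不还(h uan2)钱")
print(plain)      # 还不还钱
print(overrides)  # {0: 'h ai2', 2: 'h uan2'}
```

The plain text could be passed through the existing frontend unchanged, and the index map used afterwards to overwrite the corresponding syllables in the generated phoneme string, which avoids touching the frontend's own pronunciation logic.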