How does the API handle polyphonic characters, e.g. "还 (hai2) 不还 (huan2) 钱"? #143

Oceannew commented 8 months ago

How does the API handle polyphonic characters, e.g. "还 (hai2) 不还 (huan2) 钱"? Both occurrences of 还 come out with the same pronunciation.

{ "input": "还不还钱", "voice": "", "prompt": "", "language": "zh_us", "model": "emoti-voice", "response_format": "mp3", "speed": 1.0 }

Exported MP3: https://github.com/netease-youdao/EmotiVoice/assets/37178037/f9a19d84-9b63-4adf-9c62-e8663c8cb0a7

syq163 commented 8 months ago

It is a good question! Perhaps you could follow these steps:

1. Generate phonetic transcriptions from the text '还不还钱' by running python frontend.py data/text. This yields a phoneme line such as '<sos/eos> h ai2 sp1 b u4 sp1 h ai2 sp1 q ian2 <sos/eos>'.
2. Edit the phoneme line by hand where the pronunciation is wrong, for example: '<sos/eos> h ai2 sp1 b u4 sp1 h uan2 sp1 q ian2 <sos/eos>'.
3. Run TTS inference on the edited file with python inference_am_vocoder_joint.py --logdir prompt_tts_open_source_joint --config_folder config/joint --checkpoint g_00140000 --test_file data/text_tts.
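
If you would rather script the manual edit in the second step, here is a minimal sketch. It is not part of EmotiVoice: `override_syllable` is a hypothetical helper, and it assumes the phoneme line looks like the example above, i.e. each syllable is an optional initial followed by a final that ends in a tone digit, with `<sos/eos>` and `spN` tokens as boundaries.

```python
import re

# Boundary tokens seen in the example phoneme line: <sos/eos> and sp0, sp1, ...
BOUNDARY = re.compile(r"<sos/eos>|sp\d+")


def override_syllable(phoneme_line, index, new_syllable):
    """Replace the index-th syllable (0-based) of a phoneme line.

    Assumption: a syllable ends at the first non-boundary token whose last
    character is a tone digit, e.g. "h ai2" or "q ian2".
    """
    out = []
    current = []  # tokens of the syllable currently being collected
    seen = 0      # number of complete syllables processed so far
    for tok in phoneme_line.split():
        if BOUNDARY.fullmatch(tok):
            out.extend(current)  # flush stray tokens that never got a tone
            current = []
            out.append(tok)
            continue
        current.append(tok)
        if tok[-1].isdigit():  # a toned final closes the syllable
            out.extend(new_syllable.split() if seen == index else current)
            seen += 1
            current = []
    out.extend(current)
    if not 0 <= index < seen:
        raise IndexError(f"syllable index {index} out of range (found {seen})")
    return " ".join(out)


line = "<sos/eos> h ai2 sp1 b u4 sp1 h ai2 sp1 q ian2 <sos/eos>"
fixed = override_syllable(line, 2, "h uan2")
print(fixed)
```

The result can be written to the file passed via --test_file, in whatever layout that file already uses.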

I have provided an example of my experiment for your reference.

issues_143.tar.gz

Oceannew commented 8 months ago

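Then how should I decide whether it is h ai2 or h uan2? Should I add markers to the text in the input parameter, for example "input": "还(h ai2)不还(h uan2)钱", and then modify the methods in frontend.py to detect them?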
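
To make the idea concrete, here is a rough sketch of what parsing such markers could look like. The annotation syntax is just the one proposed above, `parse_annotated` is a hypothetical helper, and nothing here is an existing EmotiVoice or frontend.py feature. It strips the markers so the plain text can go through the normal frontend, and records which character positions carry a forced pronunciation.

```python
import re

# One character, optionally followed by a pronunciation in parentheses.
# Both ASCII and full-width parentheses are accepted.
ANNOTATED_CHAR = re.compile(r"([^()（）])(?:[(（]([^()（）]+)[)）])?")


def parse_annotated(text):
    """Split annotated input into plain text and per-character overrides.

    Returns (plain_text, overrides), where overrides maps the 0-based
    character position in plain_text to the requested phoneme string.
    """
    chars = []
    overrides = {}
    for match in ANNOTATED_CHAR.finditer(text):
        char, pron = match.group(1), match.group(2)
        if pron:
            overrides[len(chars)] = pron.strip()
        chars.append(char)
    return "".join(chars), overrides


plain, overrides = parse_annotated("还(h ai2)不还(h uan2)钱")
print(plain, overrides)
```

The character position only matches the syllable position in the phoneme output when every character maps to exactly one syllable, so text with punctuation, digits, or English words would need extra bookkeeping.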
