edge-tts outputs speech word by word in Chinese

01098996 commented 9 months ago

When using edge-tts for Chinese text-to-speech conversion, I found that the output speech sounds pieced together word by word instead of flowing naturally and continuously.

This word-by-word delivery severely hurts the fluency of the speech and makes it sound very unnatural. I don't know where the problem is.

Steps to reproduce:

import subprocess

subprocess.call([
    "edge-tts",
    "--voice", "zh-CN-XiaoxiaoNeural",
    "--text", "文字再测一下",
    "--write-media", "/tmp/output.wav",
])

Play the conversion result

It can be clearly heard that the speech is output word by word, rather than continuous natural speech.

Expected behavior:

The speech conversion result should be a continuous natural output of the whole sentence, rather than pieced together word by word.

Environment:

OS: Ubuntu 20.04

Edge-tts version: edge-tts 6.1.8

Test text: 文字再测一下

Please let me know if any other details are needed to reproduce this issue. Looking forward to more natural and fluent output. Thanks!

Incorrect audio file: https://drive.google.com/file/d/1Wod4IWhD8oicEdHL8hkpPUCTT7hcG6ca/view?usp=drive_link

rany2 commented 9 months ago

Does it behave like this in Edge Browser as well?

01098996 commented 9 months ago

> Does it behave like this in Edge Browser as well?

In Edge Browser, it synthesizes Chinese text to speech fluently and naturally, without any word-by-word choppiness.

01098996 commented 9 months ago

I'm sorry, this was my mistake. My code was inserting a space between every character before passing the string to edge-tts.
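For anyone who hits the same symptom, here is a minimal sketch of a guard against stray spaces between Chinese characters. It is not part of edge-tts: `collapse_cjk_spaces` and `build_edge_tts_argv` are hypothetical helper names, and the character range assumed here (U+4E00 to U+9FFF) covers only the basic CJK Unified Ideographs block, so it may need widening for other scripts.

```python
import re

# Whitespace that sits directly between two CJK ideographs.
# Assumption: only the basic CJK Unified Ideographs block is considered.
_CJK_GAP = re.compile(r"(?<=[\u4e00-\u9fff])\s+(?=[\u4e00-\u9fff])")


def collapse_cjk_spaces(text: str) -> str:
    """Remove whitespace between adjacent CJK ideographs; keep other spacing."""
    return _CJK_GAP.sub("", text)


def build_edge_tts_argv(text: str, voice: str, out_path: str) -> list:
    """Hypothetical helper: build the CLI call from the report with cleaned text."""
    return [
        "edge-tts",
        "--voice", voice,
        "--text", collapse_cjk_spaces(text),
        "--write-media", out_path,
    ]


if __name__ == "__main__":
    raw = "文 字 再 测 一 下"
    # Printing repr() makes stray spaces visible before the text reaches the CLI.
    print(repr(raw), "->", repr(collapse_cjk_spaces(raw)))
```

Printing the `repr()` of the string right before the call is usually enough to spot this kind of problem; the regex only removes whitespace that has a Chinese character on both sides, so spacing in mixed Chinese and Latin text is left alone.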

rany2 / edge-tts

edge-tts outputs speech word by word in Chinese #142