rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License

Incorrect Reading of Punctuation Marks in Text-to-Speech Synthesis #122

Open MmdrezaMolavi opened 11 months ago

MmdrezaMolavi commented 11 months ago

Hello! I have noticed an issue with the text-to-speech (TTS) synthesis: long inputs, such as multi-sentence paragraphs, are not read out properly. The generated audio appears to ignore punctuation marks, which leads to an unnatural and disjointed reading experience.

Steps to reproduce:

1. Provide a long sentence or a paragraph of text for TTS synthesis.
2. Listen to the generated audio output.

Expected behavior: The TTS system should accurately read the text and pause appropriately at punctuation marks, such as commas, periods, and question marks. This allows for a more coherent and natural listening experience.

Actual behavior: When synthesizing long sentences or paragraphs, the TTS system fails to recognize and properly read the punctuation marks. As a result, the generated audio lacks the necessary pauses and intonation, making it difficult to comprehend and follow the text.

Additional information: I tested the en-us-ryan-high and en-us-lessac-medium speakers.

beqabeqa473 commented 11 months ago

Which version did you try?

You must use the forked espeak-ng from this organization to preserve punctuation marks.

MmdrezaMolavi commented 11 months ago

@beqabeqa473 In fact, when I run it from bash, the pronunciation and pauses at punctuation marks are correct. The issue only arises when I use the Python run command.

beqabeqa473 commented 11 months ago

Correct, python_run uses the system-wide espeak-ng, which doesn't include Michael's changes.

synesthesiam commented 11 months ago

This is also very dependent on the voice itself. Some voice datasets do not include punctuation, so the model can't learn when to do short vs. long pauses.
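As a rough way to check whether a voice dataset gives the model any punctuation to learn from, the transcripts can be scanned for pause marks. This is only a minimal sketch: it assumes an LJSpeech-style metadata file with `utterance_id|transcript` rows, and the function name `punctuation_coverage` and the sample rows are hypothetical, not part of Piper.

```python
# Punctuation marks that typically signal a pause in speech.
PAUSE_MARKS = set(",.;:!?")


def punctuation_coverage(metadata_lines):
    """Return the fraction of transcripts containing at least one pause mark.

    Each line is assumed to look like "utterance_id|transcript text".
    """
    total = 0
    with_marks = 0
    for line in metadata_lines:
        line = line.strip()
        if not line or "|" not in line:
            continue  # skip blank or malformed rows
        _, text = line.split("|", 1)
        total += 1
        if any(ch in PAUSE_MARKS for ch in text):
            with_marks += 1
    return with_marks / total if total else 0.0


# Hypothetical sample rows, for illustration only.
sample = [
    "0001|Hello, how are you today?",
    "0002|this line has no punctuation at all",
]
print(punctuation_coverage(sample))  # 0.5
```

A value close to 0 would suggest the transcripts were stripped of punctuation, in which case the voice probably cannot learn short vs. long pauses regardless of the phonemizer.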

set-soft commented 10 months ago

> Correct, python_run uses the system-wide espeak-ng, which doesn't include Michael's changes.

But even installing the patched espeak-ng lib won't work, because the Python code doesn't use the correct phonemizer. It always uses espeak_phonemizer.Phonemizer, which drops the punctuation even with the patched lib.

I think the solution is what I implemented in #138

I just took the code @synesthesiam wrote to check that the modified lib works (it is Python code) and put it into the python_run code (well, it is a little more than just that). With it, I was able to get the punctuation in the Python output.

set-soft commented 10 months ago

> This is also very dependent on the voice itself. Some voice datasets do not include punctuation, so the model can't learn when to do short vs. long pauses.

Do you know which are the most expressive models for English? I want to set them as defaults in the plug-in for Ooba Gooba Text Generator WebUI that I'm creating. Here is an example of what I'm doing:

https://github.com/rhasspy/piper/assets/22823627/34857cec-d377-489b-9ee3-5cef34466acf

This example uses en_US-amy-medium for the girl and en_US-libritts-high (p922) for the description.

Note: here is what I'm getting from the phonemizer:

2023-07-14 21:39:41 DEBUG:0:
2023-07-14 21:39:41 DEBUG:1: Chiharu strides into the room with a smile, her eyes lighting up when she sees you. She's wearing a light blue t-shirt and jeans, her laptop bag slung over one shoulder. She takes a seat next to you, her enthusiasm palpable in the air
2023-07-14 21:39:41 WARNING:tʃˈɪhɑːɹɹˌuː stɹˈaɪdz ˌɪntʊ ðə ɹˈuːm wɪð ɐ smˈaɪl, hɜːɹ ˈaɪz lˈaɪɾɪŋ ˌʌp wɛn ʃiː sˈiːz juː.
2023-07-14 21:39:42 WARNING:ʃiːz wˈɛɹɪŋ ɐ lˈaɪt blˈuː tˈiːʃˈɜːt ænd dʒˈiːnz, hɜː lˈæptɑːp bˈæɡ slˈʌŋ ˌoʊvɚ wˈʌn ʃˈoʊldɚ.
2023-07-14 21:39:44 WARNING:ʃiː tˈeɪks ɐ sˈiːt nˈɛkst tə juː, hɜːɹ ɛnθˈuːziˌæzəm pˈælpəbəl ɪnðɪ ˈɛɹ.
2023-07-14 21:39:45 DEBUG:0: Hey! I'm so excited to finally meet you. I've heard so many great things about you and I'm eager to pick your brain about computers. I'm sure you have a wealth of knowledge that I can learn from.
2023-07-14 21:39:45 WARNING:hˈeɪ!
2023-07-14 21:39:45 WARNING:aɪm sˌoʊ ɛksˈaɪɾᵻd tə fˈaɪnəli mˈiːt juː.
2023-07-14 21:39:45 WARNING:aɪv hˈɜːd sˌoʊ mˈɛni ɡɹˈeɪt θˈɪŋz ɐbˌaʊt juː ænd aɪm ˈiːɡɚ tə pˈɪk jʊɹ bɹˈeɪn ɐbˌaʊt kəmpjˈuːɾɚz.
2023-07-14 21:39:45 WARNING:aɪm ʃˈʊɹ juː hæv ɐ wˈɛlθ ʌv nˈɑːlɪdʒ ðæt aɪ kæn lˈɜːn fɹʌm.
2023-07-14 21:39:46 DEBUG:1: She grins, eyes twinkling with excitement
2023-07-14 21:39:46 WARNING:ʃiː ɡɹˈɪnz, ˈaɪz twˈɪŋklɪŋ wɪð ɛksˈaɪtmənt.
2023-07-14 21:39:46 DEBUG:0: Let's get started!
2023-07-14 21:39:46 WARNING:lˈɛts ɡɛt stˈɑːɹɾᵻd!
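A quick way to confirm from output like this that the phonemizer keeps punctuation is to compare the sequence of pause marks in the input text with the sequence in the phoneme string. The sketch below uses only the standard library. The helper names are hypothetical, and the sample text has a final period added for illustration, since the comparison is strict and the phonemizer output above ends with one.

```python
# Pause-related marks to compare. IPA symbols such as "ː" and "ˈ" are
# different code points from ":" and "'", so they are not matched here.
PUNCTUATION = ",.;:!?"


def punctuation_sequence(s):
    """Return the pause-related punctuation marks of s, in order."""
    return [ch for ch in s if ch in PUNCTUATION]


def punctuation_preserved(text, phonemes):
    """True when the phoneme string carries the same marks as the text."""
    return punctuation_sequence(text) == punctuation_sequence(phonemes)


text = "She grins, eyes twinkling with excitement."
kept = "ʃiː ɡɹˈɪnz, ˈaɪz twˈɪŋklɪŋ wɪð ɛksˈaɪtmənt."
dropped = "ʃiː ɡɹˈɪnz ˈaɪz twˈɪŋklɪŋ wɪð ɛksˈaɪtmənt"

print(punctuation_preserved(text, kept))     # True
print(punctuation_preserved(text, dropped))  # False
```

The second call shows what a phonemizer that strips punctuation would look like under this check.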

goldyfruit commented 7 months ago

Having to install a custom espeak-ng version is a bummer, but at least now I have the answer to why punctuation is not followed. :+1:

Thanks

rmcpantoja commented 7 months ago

In the recent updates, you need piper_phonemize to fix this in python_run.

goldyfruit commented 7 months ago

> In the recent updates, you need piper_phonemize to fix this in python_run.

So when using piper_phonemize, the espeak-ng fork is not required?

goldyfruit commented 7 months ago

> In the recent updates, you need piper_phonemize to fix this in python_run.

> So when using piper_phonemize, the espeak-ng fork is not required?

Just tested: installing piper_phonemize alone doesn't fix this punctuation issue.

rmcpantoja commented 7 months ago

> In the recent updates, you need piper_phonemize to fix this in python_run.

> So when using piper_phonemize, the espeak-ng fork is not required?

No, it isn't required. piper_phonemize includes it.

> In the recent updates, you need piper_phonemize to fix this in python_run.

> So when using piper_phonemize, the espeak-ng fork is not required?

> Just tested: installing piper_phonemize alone doesn't fix this punctuation issue.

Make sure you have the latest version of the piper API (python_run) which adds support for piper_phonemize.
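To see what is actually installed in the Python environment that python_run uses, a small standard-library check can report whether a module is importable and which version its distribution declares. This is a sketch, not part of Piper: `module_status` is a hypothetical helper, and the module name `piper_phonemize` and distribution name `piper-phonemize` are assumptions based on the package name mentioned in this thread.

```python
import importlib.util
from importlib import metadata


def module_status(module_name, dist_name=None):
    """Return (importable, version) for a top-level module.

    version is None when the module is missing or when no distribution
    metadata is found under dist_name (defaults to module_name).
    """
    importable = importlib.util.find_spec(module_name) is not None
    version = None
    if importable:
        try:
            version = metadata.version(dist_name or module_name)
        except metadata.PackageNotFoundError:
            version = None
    return importable, version


# Assumed names; adjust them if your installation differs.
print(module_status("piper_phonemize", "piper-phonemize"))
```

If the first element is False, the Python code cannot be using piper_phonemize at all, which would be consistent with the punctuation still being dropped.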