ibm-cloud-docs / text-to-speech

<Break> element bug/outdated? Does not output enclosed text. Different SSML version stated vs linked? #143

Closed henryjc closed 2 years ago

henryjc commented 2 years ago

I am following an example from the documentation page (linked, with a screenshot, below) for the <break> element. The text enclosed in the <break> tags is not synthesized at all. I suspect this may have something to do with differences between SSML versions 1.0 and 1.1. Or is this a peculiarity of IBM's treatment of SSML?

I also noticed, here (https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-ssml) and elsewhere, that the page states that the service is based on SSML version 1.0 but then links to the page for version 1.1 (https://www.w3.org/TR/speech-synthesis/): "The Text to Speech service bases its support on SSML Version 1.0, which was recommended by W3C on September 7, 2004. For more information about the W3C SSML recommendation, see W3C Speech Synthesis Markup Language (SSML) Version 1.0."

This is the code I used to test. The output does not include the text between the break tags, yet the pauses are of the right length:

# Assumes `tts` is an already-authenticated ibm_watson TextToSpeechV1 client.
with open('test3.wav', 'wb') as audio_file:
    audio_file.write(
        tts.synthesize('''<speak version="1.0">
                        Different sized <break strength="none">no pause</break>
                        Different sized <break strength="x-weak">x-weak pause</break>
                        Different sized <break strength="weak">weak pause</break>
                        Different sized <break strength="medium">medium pause</break>
                        Different sized <break strength="strong">strong pause</break>
                        Different sized <break strength="x-strong">x-strong pause</break>
                        Different sized <break time="1s">one-second pause</break>
                        Different sized <break time="1500ms">1500-millisecond pause</break>
                        </speak>
                        ''',
                       voice='en-US_KevinV3Voice',
                       accept='audio/wav'
                       ).get_result().content)

Removing the closing </break> tag results in a mismatched-tag error. However, other SSML documentation (Amazon, W3C, Google) shows that the <break> element does not enclose anything and is instead written as a standalone, self-closing tag. For example:

     <speak>
     Step 1, take a deep breath. <break time="200ms"/>
     Step 2, exhale.
     Step 3, take a deep breath again. <break strength="weak"/>
     Step 4, exhale.
     </speak>

Am I missing something here?
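
To make the difference concrete, here is a minimal sketch that parses an SSML string with Python's standard xml.etree.ElementTree and lists any text enclosed in <break> elements. The helper name non_empty_breaks and the two sample strings are made up for illustration, and the sketch assumes the SSML has no XML namespace declaration, as in the snippets above. It says nothing about how the IBM service itself parses input.

```python
import xml.etree.ElementTree as ET
from typing import List


def non_empty_breaks(ssml: str) -> List[str]:
    """Return the text found inside <break> elements (hypothetical helper).

    Assumes no xmlns on <speak>; with a namespace the tag name would need
    the "{namespace}break" form instead of plain "break".
    """
    root = ET.fromstring(ssml)
    found = []
    for br in root.iter("break"):
        # itertext() covers text inside the element, not the text after it.
        enclosed = "".join(br.itertext()).strip()
        if enclosed:
            found.append(enclosed)
    return found


# Form used in the documentation example: text wrapped by <break>.
enclosing = ('<speak version="1.0">Different sized '
             '<break time="1s">one-second pause</break></speak>')
# Empty-element form: the text follows a self-closing <break/>.
empty = ('<speak version="1.0">Different sized '
         '<break time="1s"/>one-second pause</speak>')

print(non_empty_breaks(enclosing))  # ['one-second pause']
print(non_empty_breaks(empty))      # []
```

In the second string the words come after the break rather than inside it, which is the form the other vendors' documentation uses.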

[Screenshot of the <break> element example on the documentation page]

Link to the documentation page in question: https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-elements#break_element

jeffpk62 commented 2 years ago

@henryjc You are correct on both counts!

I will close this issue once both changes have been made in the documentation. Thank you for taking the time to provide such excellent comments, and I apologize for the errors!

jeffpk62 commented 2 years ago

@henryjc The documentation will be fully updated next week. I am unable to publish the fixes before then. Thank you again!

henryjc commented 2 years ago

@jeffpk62 Good to hear. No rush, I was just curious. Cheers!

jeffpk62 commented 2 years ago

This is published and live. Thank you!