nateshmbhat / pyttsx3

Offline Text To Speech synthesis for python
Mozilla Public License 2.0

SSML rendering not working #287

Open colin-hue opened 1 year ago

colin-hue commented 1 year ago

The engine.say method does not render SSML with espeak, and there is no way to set the equivalent of the CLI espeak "-m" option, which tells espeak that the string contains SSML so that it is rendered properly. There is an argument for turning SSML rendering on by default: to do this, the say function in espeak.py should be altered to OR _espeak.SSML with the other flags currently used.

willwade commented 1 month ago

Note to self.


def say(self, text, ssml=False):
    self._text_to_say = text
    self._ssml = ssml

Then, where synthesis starts, we need to set the flag:


flags = _espeak.ENDPAUSE | _espeak.CHARS_UTF8
if self._ssml:
    flags |= _espeak.SSML
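
To make the effect of the OR concrete, here is a minimal standalone sketch. The constant values are copied from espeak's speak_lib.h (I believe pyttsx3's _espeak module uses the same ones), and build_synth_flags is a hypothetical helper name, not something that exists in the driver.

```python
# Flag values as defined in espeak's speak_lib.h (assumed to match
# the constants exposed by pyttsx3's _espeak module).
CHARS_UTF8 = 0x0001
SSML = 0x0010
ENDPAUSE = 0x1000


def build_synth_flags(ssml=False):
    """Return the flag word for a synthesis call (hypothetical helper)."""
    # These two flags are always set, as in the snippet above.
    flags = ENDPAUSE | CHARS_UTF8
    # Only ask espeak to interpret markup when the caller opts in.
    if ssml:
        flags |= SSML
    return flags
```

With ssml=False the existing behaviour is unchanged, so callers whose text happens to contain "<" or ">" are not affected unless they opt in.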

But this needs thought. We should have a flag that works in sapi and nsss too.

And for SSML in sapi: remember we also need to update the other methods, like save_to_file.

def say(self, text, ssml=False):
    self._proxy.setBusy(True)
    self._proxy.notify("started-utterance")
    self._speaking = True
    self._current_text = text
    # Use SVSFIsXML flag if SSML is enabled
    flags = SpeechLib.SVSFlagsAsync | (SpeechLib.SVSFIsXML if ssml else 0)
    self._tts.Speak(fromUtf8(toUtf8(text)), flags)

def save_to_file(self, text, filename, ssml=False):
    cwd = os.getcwd()
    stream = comtypes.client.CreateObject("SAPI.SpFileStream")
    stream.Open(filename, SpeechLib.SSFMCreateForWrite)
    temp_stream = self._tts.AudioOutputStream
    self._current_text = text
    self._tts.AudioOutputStream = stream
    # Use SVSFIsXML flag if SSML is enabled
    flags = SpeechLib.SVSFIsXML if ssml else 0
    self._tts.Speak(fromUtf8(toUtf8(text)), flags)
    self._tts.AudioOutputStream = temp_stream
    stream.close()
    os.chdir(cwd)
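
The two sapi methods above differ only in whether the async bit is set. A small sketch of that flag logic on its own, using the numeric values from SAPI's SpeechVoiceSpeakFlags enumeration; sapi_speak_flags is a hypothetical helper for illustration only.

```python
# Values from the SAPI SpeechVoiceSpeakFlags enumeration.
SVSFDefault = 0
SVSFlagsAsync = 1
SVSFIsXML = 8


def sapi_speak_flags(ssml=False, asynchronous=True):
    """Combine the Speak() flags for say / save_to_file (hypothetical helper)."""
    # say() speaks asynchronously; save_to_file() blocks until the file is written.
    flags = SVSFlagsAsync if asynchronous else SVSFDefault
    # Tell SAPI to parse the input as XML markup instead of plain text.
    if ssml:
        flags |= SVSFIsXML
    return flags
```

say would then pass sapi_speak_flags(ssml) and save_to_file would pass sapi_speak_flags(ssml, asynchronous=False), which keeps the two code paths from drifting apart.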

Small snag: I don't think SSML is supported in nsss. Annoying. I could be wrong. As a fallback we could strip the tags before speaking:


import re

# Utility function to remove SSML tags
def strip_ssml(text):
    # This removes basic SSML tags. Expand as needed for specific tags
    return re.sub(r"<[^>]*>", "", text)

class NSSpeechDriver(NSObject):
    # Existing methods...

    @objc.python_method
    def say(self, text, ssml=False):
        self._proxy.setBusy(True)
        self._completed = True
        self._proxy.notify("started-utterance")
        # nsss gets plain text: the SSML tags are stripped, not rendered
        self._current_text = strip_ssml(text) if ssml else text
        self._tts.startSpeakingString_(self._current_text)

    @objc.python_method
    def save_to_file(self, text, filename, ssml=False):
        self._proxy.setBusy(True)
        self._completed = True
        # nsss gets plain text: the SSML tags are stripped, not rendered
        self._current_text = strip_ssml(text) if ssml else text
        url = Foundation.NSURL.fileURLWithPath_(filename)
        self._tts.startSpeakingString_toURL_(self._current_text, url)
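
One limitation of the regex approach is that XML entities such as &amp;amp; are left in the text. A possible alternative, sketched here with the standard library's ElementTree, parses the markup and keeps only the text nodes, falling back to the regex when the input is not well-formed XML. ssml_to_plain_text is a hypothetical name, not part of pyttsx3.

```python
import re
import xml.etree.ElementTree as ET


def ssml_to_plain_text(ssml):
    """Return the text content of an SSML string (hypothetical helper)."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError:
        # Not well-formed XML: fall back to the regex used in strip_ssml.
        return re.sub(r"<[^>]*>", "", ssml)
    # itertext() yields element text and tails in document order,
    # with entities such as &amp; already decoded.
    text = "".join(root.itertext())
    # Collapse whitespace left behind where tags were removed.
    return " ".join(text.split())
```

This still throws away everything SSML expresses (pauses, emphasis, prosody), so it is only a way to avoid reading tags aloud, not a substitute for real SSML support.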

However, I think we should migrate all of the macOS code to AVSpeechSynthesizer:

https://gist.github.com/willwade/93e709147d7ce9a5f80a3c3944f8e331