willwade / tts-wrapper

TTS-Wrapper makes it easier to use text-to-speech APIs by providing a unified and easy-to-use interface.
MIT License

speak_streamed on google issue for using mp3 #25

Open Chafid opened 1 week ago

Chafid commented 1 week ago

Prerequisites

For more information, see the contributing guide.

Description

The speak_streamed function can be configured to use MP3. However, since the Google TTS engine does not support MP3 playback yet, it produces an error.

Steps to Reproduce

  1. Call speak_streamed like this: tts.speak_streamed("test google", str(output_file), audio_format='mp3')

Expected behavior:

  1. Playback will produce an error, and no sound will come out.
  2. The MP3 file will be created.

Actual behavior: MP3 is implemented only for saving to a file, not for playback.
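Playback fails because MP3 bytes are compressed frames, not raw samples, so they cannot be handed straight to an audio device. One hedged way to route the bytes before playback is to sniff their magic bytes. sniff_audio_format below is a hypothetical helper for illustration, not part of tts-wrapper:

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess the container/codec of an audio byte string from its magic bytes.

    Returns 'wav', 'mp3', or 'unknown'. This is a heuristic: 'unknown' does
    not prove the data is raw PCM.
    """
    # WAV: a RIFF chunk whose form type is WAVE
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 preceded by an ID3v2 tag
    if data[:3] == b"ID3":
        return "mp3"
    # Bare MPEG audio frame: 11-bit frame sync (all ones)
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```

A caller could decode only when this returns 'mp3' and otherwise play the bytes as PCM. Raw 16-bit PCM can occasionally begin with bytes that look like a frame sync, and other MPEG-family formats share that sync pattern, so an explicit format flag from the synthesis call is more reliable when one is available.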

willwade commented 1 week ago

I need to test this

```python
import threading
from io import BytesIO
from typing import Optional, Tuple

import numpy as np
import sounddevice as sd
from pydub import AudioSegment  # assumption: pydub (backed by ffmpeg) decodes MP3

# AbstractTTS is tts-wrapper's own base class; its import is omitted here.


class GoogleTTS(AbstractTTS):
    # Class setup and __init__ remain unchanged

    def speak_streamed(
        self,
        text: str,
        save_to_file_path: Optional[str] = None,
        audio_format: Optional[str] = "wav",
    ) -> None:
        """
        Synthesize text and play it back using sounddevice.
        Optionally save the audio to a file.

        :param text: The text to synthesize and play.
        :param save_to_file_path: Path to save the audio file (optional).
        :param audio_format: Audio format to save (e.g., 'wav', 'mp3', 'flac').
        """
        # Synthesize audio to bytes (assumed, untested: MP3 when audio_format
        # is 'mp3', headerless 16-bit PCM otherwise)
        audio_bytes = self.synth_to_bytes(text)

        if audio_format == "mp3":
            # Decode MP3 to 16-bit PCM; take rate and channels from the stream
            # instead of hardcoding them
            pcm_data, sample_rate, channels = self._convert_mp3_to_pcm(audio_bytes)
        else:
            # Use the PCM data directly for other formats
            pcm_data = audio_bytes
            sample_rate = self.audio_rate
            channels = 1  # assuming mono output

        # Play back in a separate thread so this call does not block
        threading.Thread(
            target=self._play_pcm_stream,
            args=(pcm_data, sample_rate, channels),
        ).start()

        # Optionally save the original (undecoded) bytes to a file
        if save_to_file_path:
            with open(save_to_file_path, "wb") as f:
                f.write(audio_bytes)

    @staticmethod
    def _convert_mp3_to_pcm(mp3_bytes: bytes) -> Tuple[bytes, int, int]:
        """Decode MP3 bytes; return (int16 PCM bytes, sample_rate, channels)."""
        segment = AudioSegment.from_file(BytesIO(mp3_bytes), format="mp3")
        segment = segment.set_sample_width(2)  # force 16-bit samples
        return segment.raw_data, segment.frame_rate, segment.channels

    def _play_pcm_stream(self, pcm_data: bytes, sample_rate: int, channels: int) -> None:
        """Play interleaved 16-bit PCM data using sounddevice."""
        audio_data = np.frombuffer(pcm_data, dtype=np.int16).reshape(-1, channels)
        with sd.OutputStream(
            samplerate=sample_rate,
            channels=channels,
            dtype="int16",
        ) as stream:
            stream.write(audio_data)
```
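If synth_to_bytes in fact returns a complete WAV file rather than headerless PCM (not verified for the Google engine), passing it straight to np.frombuffer would play the RIFF header as a short burst of noise, and a hardcoded sample rate could be wrong. A minimal sketch using only the standard-library wave module reads the real rate and channel count; wav_to_pcm is a hypothetical helper:

```python
import io
import wave
from typing import Tuple


def wav_to_pcm(wav_bytes: bytes) -> Tuple[bytes, int, int]:
    """Split an in-memory WAV file into (pcm_bytes, sample_rate, channels).

    Raises ValueError unless the samples are 16-bit, because the playback
    path above interprets the data as int16.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        width = wf.getsampwidth()
        if width != 2:
            raise ValueError(f"expected 16-bit samples, got {8 * width}-bit")
        sample_rate = wf.getframerate()
        channels = wf.getnchannels()
        # readframes returns the interleaved sample bytes without the header
        pcm = wf.readframes(wf.getnframes())
    return pcm, sample_rate, channels
```

The returned tuple has the same shape as what a decoder for the MP3 branch would produce, so both branches could feed one playback function.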