Open Fu-u0718 opened 7 months ago
@Fu-u0718 says: I'm interested in having conversations not only in Japanese using an avatar, but also in English. However, I found out that the VOICEVOX software used in the code does not support English. Have you created any programs that utilize Text-to-Speech services like Google or Azure for this purpose?
Hi @Fu-u0718, you can make a custom SpeechController based on whichever TTS service you like.
Here is an example for Azure:
import aiohttp
import asyncio
import io
from logging import getLogger, NullHandler
import traceback
import wave
import numpy
import sounddevice
from . import SpeechController
class VoiceClip:
    def __init__(self, text: str):
        self.text = text
        self.download_task = None
        self.audio_clip = None
class AzureSpeechController(SpeechController):
    def __init__(self, api_key: str, region: str, speaker_name: str="ja-JP-AoiNeural", speaker_gender: str="Female", lang="ja-JP", device_index: int=-1, playback_margin: float=0.1):
        self.logger = getLogger(__name__)
        self.api_key = api_key
        self.region = region
        self.speaker_name = speaker_name
        self.speaker_gender = speaker_gender
        self.lang = lang
        self.device_index = device_index
        self.playback_margin = playback_margin
        self.voice_clips = {}
        self._is_speaking = False
    async def download(self, voice: VoiceClip):
        url = f"https://{self.region}"  # complete with the Azure TTS REST endpoint for your region
        headers = {
            "X-Microsoft-OutputFormat": "riff-16khz-16bit-mono-pcm",
            "Content-Type": "application/ssml+xml",
            "Ocp-Apim-Subscription-Key": self.api_key
        }
        ssml_text = f"<speak version='1.0' xml:lang='{self.lang}'><voice xml:lang='{self.lang}' xml:gender='{self.speaker_gender}' name='{self.speaker_name}'>{voice.text}</voice></speak>"
        data = ssml_text.encode("utf-8")
        async with aiohttp.ClientSession() as session:
            # Send the SSML and keep the returned WAV bytes on the clip
            async with session.post(url, headers=headers, data=data) as response:
                if response.status == 200:
                    voice.audio_clip = await response.read()
    def prefetch(self, text: str):
        # Reuse the clip if this text was already requested
        v = self.voice_clips.get(text)
        if v:
            return v
        v = VoiceClip(text)
        v.download_task = asyncio.create_task(self.download(v))
        self.voice_clips[text] = v
        return v
    async def speak(self, text: str):
        voice = self.prefetch(text)
        if not voice.audio_clip:
            await voice.download_task
        with wave.open(io.BytesIO(voice.audio_clip), "rb") as f:
            try:
                self._is_speaking = True
                data = numpy.frombuffer(f.readframes(f.getnframes()), dtype=numpy.int16)
                framerate = f.getframerate()
                # Playback is non-blocking, so wait for the clip length plus a margin
                sounddevice.play(data, framerate, device=self.device_index, blocking=False)
                await asyncio.sleep(len(data) / framerate + self.playback_margin)
            except Exception as ex:
                self.logger.error(f"Error at speaking: {str(ex)}\n{traceback.format_exc()}")
            finally:
                self._is_speaking = False
    def is_speaking(self) -> bool:
        return self._is_speaking
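The `speak` method above starts playback with `blocking=False` and then sleeps for `len(data) / framerate + playback_margin` seconds. Here is a minimal sketch of that timing calculation using only the standard library. The helper names `make_silent_wav` and `playback_seconds` are hypothetical and are not part of AIAvatar. The silent clip just stands in for a real TTS response. Dividing the frame count by the frame rate gives the same result as `len(data) / framerate` for mono audio, and it would also stay correct for stereo.

```python
import io
import wave


def make_silent_wav(seconds: float, framerate: int = 16000) -> bytes:
    """Build an in-memory 16-bit mono WAV of silence (stand-in for a TTS response)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(framerate)
        w.writeframes(b"\x00\x00" * int(seconds * framerate))
    return buffer.getvalue()


def playback_seconds(wav_bytes: bytes, playback_margin: float = 0.1) -> float:
    """Return how long to wait for the clip to finish, including the margin."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as f:
        # Frame count divided by frame rate is the clip duration in seconds
        return f.getnframes() / f.getframerate() + playback_margin


clip = make_silent_wav(0.5)
wait = playback_seconds(clip)
```

If the end of a sentence gets cut off on your device, a slightly larger `playback_margin` may help.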
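app.avatar_controller.speech_controller = AzureSpeechController(
    api_key="YOUR_AZURE_SPEECH_KEY",  # placeholder: your own subscription key
    region="YOUR_AZURE_REGION",  # placeholder: your own resource region
    device_index=2  # Set the output device number on your PC
)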
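One thing to watch in the `download` method above: `voice.text` is put into the SSML string as-is, so a response containing `&` or `<` would produce invalid XML. Here is a minimal sketch of building the request body with escaping, using only the standard library. The function name `build_ssml` is hypothetical, and `en-US-JennyNeural` is only an example voice name, so check the voice list for your own Azure resource.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, lang: str = "ja-JP", speaker_name: str = "ja-JP-AoiNeural", speaker_gender: str = "Female") -> bytes:
    """Build the SSML request body, escaping the text and the attribute values."""
    ssml = (
        # quoteattr() escapes the value and wraps it in quotes
        f"<speak version='1.0' xml:lang={quoteattr(lang)}>"
        f"<voice xml:lang={quoteattr(lang)} xml:gender={quoteattr(speaker_gender)} name={quoteattr(speaker_name)}>"
        # escape() replaces &, < and > in the spoken text
        f"{escape(text)}"
        "</voice></speak>"
    )
    return ssml.encode("utf-8")


body = build_ssml("Tom & Jerry <3", lang="en-US", speaker_name="en-US-JennyNeural")
```

In `download`, the result could be passed as `data` in place of the hand-built `ssml_text.encode("utf-8")`.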
However, I've found that AIAvatar has an issue handling English responses from ChatGPT. I will fix it soon.
I've fixed it👍
Thank you! I learned a lot. I would also like to enjoy conversations in English. Thank you for taking time out of your busy schedule to respond!
Hi, I tried with the OpenAI speech service, but it got stuck on: [INFO] 2024-07-15 17:28:44,009 : Listening... (OpenAIWakewordListener)
Hi @mosu7, thank you for your post, but this issue is about Text-to-Speech, not the wake word listener. Please open another issue if you want to discuss it.