Open Fu-u0718 opened 7 months ago
@Fu-u0718 says: I'm interested in having conversations not only in Japanese using an avatar, but also in English. However, I found out that the VOICEVOX software used in the code does not support English. Have you created any programs that utilize Text-to-Speech services like Google or Azure for this purpose?
Hi @Fu-u0718, you can write a custom SpeechController for any TTS service you like by subclassing the base class:
aiavatar.speech.SpeechController
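As a rough sketch of the shape such a controller takes, here is a dummy one that prints text instead of synthesizing audio. It assumes a controller needs `prefetch`, `speak`, and `is_speaking`, which is what the Azure example in this thread implements. `SpeechControllerBase` and `ConsoleSpeechController` are hypothetical names, and the base class is only a stand-in so the snippet runs on its own, not the real aiavatar class.

```python
import asyncio


class SpeechControllerBase:
    """Hypothetical stand-in for aiavatar.speech.SpeechController (not the real class)."""

    def prefetch(self, text: str):
        raise NotImplementedError

    async def speak(self, text: str):
        raise NotImplementedError

    def is_speaking(self) -> bool:
        raise NotImplementedError


class ConsoleSpeechController(SpeechControllerBase):
    """Prints the text instead of playing audio; useful for checking the flow."""

    def __init__(self, seconds_per_char: float = 0.0):
        self.seconds_per_char = seconds_per_char
        self.spoken = []
        self._is_speaking = False

    def prefetch(self, text: str):
        # Nothing to download for console output
        return text

    async def speak(self, text: str):
        self._is_speaking = True
        try:
            print(f"[speak] {text}")
            self.spoken.append(text)
            # Simulate playback time proportional to the text length
            await asyncio.sleep(len(text) * self.seconds_per_char)
        finally:
            self._is_speaking = False

    def is_speaking(self) -> bool:
        return self._is_speaking
```

A real controller would replace the `print` call with a request to the TTS service and audio playback.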
Here is an example for Azure:
import aiohttp
import asyncio
import io
from logging import getLogger, NullHandler
import traceback
import wave
from xml.sax.saxutils import escape
import numpy
import sounddevice
from . import SpeechController


class VoiceClip:
    def __init__(self, text: str):
        self.text = text
        self.download_task = None
        self.audio_clip = None


class AzureSpeechController(SpeechController):
    def __init__(self, api_key: str, region: str, speaker_name: str="ja-JP-AoiNeural", speaker_gender: str="Female", lang="ja-JP", device_index: int=-1, playback_margin: float=0.1):
        self.logger = getLogger(__name__)
        self.logger.addHandler(NullHandler())
        self.api_key = api_key
        self.region = region
        self.speaker_name = speaker_name
        self.speaker_gender = speaker_gender
        self.lang = lang
        self.device_index = device_index
        self.playback_margin = playback_margin
        self.voice_clips = {}
        self._is_speaking = False

    async def download(self, voice: VoiceClip):
        url = f"https://{self.region}.tts.speech.microsoft.com/cognitiveservices/v1"
        headers = {
            "X-Microsoft-OutputFormat": "riff-16khz-16bit-mono-pcm",
            "Content-Type": "application/ssml+xml",
            "Ocp-Apim-Subscription-Key": self.api_key
        }
        # Escape &, < and > so the response text cannot break the SSML document
        ssml_text = f"<speak version='1.0' xml:lang='{self.lang}'><voice xml:lang='{self.lang}' xml:gender='{self.speaker_gender}' name='{self.speaker_name}'>{escape(voice.text)}</voice></speak>"
        data = ssml_text.encode("utf-8")
        async with aiohttp.ClientSession() as session:
            async with session.post(url, headers=headers, data=data) as response:
                if response.status == 200:
                    voice.audio_clip = await response.read()
                else:
                    self.logger.error(f"Azure TTS request failed: HTTP {response.status}")

    def prefetch(self, text: str):
        # Start the download in the background and cache the clip by its text
        v = self.voice_clips.get(text)
        if v:
            return v
        v = VoiceClip(text)
        v.download_task = asyncio.create_task(self.download(v))
        self.voice_clips[text] = v
        return v

    async def speak(self, text: str):
        voice = self.prefetch(text)
        if not voice.audio_clip:
            await voice.download_task
        if not voice.audio_clip:
            # Download failed: drop the cached entry so a later call can retry
            self.voice_clips.pop(text, None)
            return
        with wave.open(io.BytesIO(voice.audio_clip), "rb") as f:
            try:
                self._is_speaking = True
                data = numpy.frombuffer(
                    f.readframes(f.getnframes()),
                    dtype=numpy.int16
                )
                framerate = f.getframerate()
                sounddevice.play(data, framerate, device=self.device_index, blocking=False)
                # Wait for the clip duration (samples / sample rate) plus a small margin
                await asyncio.sleep(len(data) / framerate + self.playback_margin)
            except Exception as ex:
                self.logger.error(f"Error at speaking: {str(ex)}\n{traceback.format_exc()}")
            finally:
                self._is_speaking = False

    def is_speaking(self) -> bool:
        return self._is_speaking
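The prefetch/speak split in this controller lets the download for the next sentence start while the current one is still playing. Below is a minimal sketch of that caching pattern on its own, with a fake download instead of an Azure request. `Clip`, `ClipCache`, `fake_download`, and `get_audio` are hypothetical names used only for illustration. Note that `asyncio.create_task` needs a running event loop, so `prefetch` has to be called from inside a coroutine.

```python
import asyncio


class Clip:
    def __init__(self, text: str):
        self.text = text
        self.task = None
        self.audio = None


class ClipCache:
    def __init__(self):
        self.clips = {}
        self.download_count = 0

    async def fake_download(self, clip: Clip):
        # Stand-in for the HTTP request: wait briefly, then store some bytes
        await asyncio.sleep(0.01)
        self.download_count += 1
        clip.audio = clip.text.encode("utf-8")

    def prefetch(self, text: str) -> Clip:
        # Reuse the cached clip if this text was already requested
        clip = self.clips.get(text)
        if clip is None:
            clip = Clip(text)
            clip.task = asyncio.create_task(self.fake_download(clip))
            self.clips[text] = clip
        return clip

    async def get_audio(self, text: str) -> bytes:
        clip = self.prefetch(text)
        if clip.audio is None:
            # Download still in flight: wait for it to finish
            await clip.task
        return clip.audio


async def demo():
    cache = ClipCache()
    # Kick off both downloads up front; they run concurrently
    cache.prefetch("Hello.")
    cache.prefetch("How are you?")
    first = await cache.get_audio("Hello.")
    second = await cache.get_audio("How are you?")
    return first, second, cache.download_count
```

Calling `asyncio.run(demo())` triggers each download only once, even though every text is requested twice (once by `prefetch`, once by `get_audio`).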
app.avatar_controller.speech_controller = AzureSpeechController(
    AZURE_SUBSCRIPTION_KEY, AZURE_REGION,
    speaker_name="en-US-AvaNeural",
    speaker_gender="Female",
    lang="en-US",
    device_index=2  # Set the output device number on your PC
)
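To find the number to pass as `device_index`, you can list the audio devices that sounddevice sees. The helper below is a small sketch: `output_devices` is a hypothetical name, and it takes the device list as an argument so it also works on plain dicts. It assumes the entries look like the ones returned by `sounddevice.query_devices()`, with `name` and `max_output_channels` keys.

```python
def output_devices(devices):
    """Return (index, name) pairs for devices that can play audio."""
    return [
        (index, dev["name"])
        for index, dev in enumerate(devices)
        if dev["max_output_channels"] > 0
    ]


# Usage (requires the sounddevice package):
#   import sounddevice
#   for index, name in output_devices(sounddevice.query_devices()):
#       print(index, name)
```

Running `python -m sounddevice` in a terminal prints a similar device list without any extra code.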
However, I've found that AIAvatar has an issue handling English responses from ChatGPT. I will fix it soon.
I've fixed it👍 https://github.com/uezo/aiavatarkit/pull/32
Thank you! I learned a lot. I would also like to enjoy conversations in English. Thank you for taking time out of your busy schedule to respond!
Hi, I tried it with the OpenAI speech service, but it got stuck at: [INFO] 2024-07-15 17:28:44,009 : Listening... (OpenAIWakewordListener)
Hi @mosu7, thank you for your post, but this issue is about Text-to-Speech, not the wake word listener. Please open a separate issue if you want to discuss that.