Closed tn-17 closed 8 months ago
Hey there,
here's an example of how you can make this work. The core of the issue is that Python stream callbacks are called from inside libmpv, so AFAICT they can never be async. To bridge from the asyncio receive code to the blocking stream callback, we use a (non-async!) queue from the queue module. In general, Queue.put can block, but here it never will, since our queue has unlimited size.
This code streams the audio as intended. When you ask edge-tts to speak some long text and it returns multiple chunks, this code will start playing the first chunk while the second chunk is still being received.
One small change I made to the way your code uses python-mpv: I used a regular Python stream instead of the catch-all one. This makes the code a little bit simpler.
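As a side note, the asyncio-to-blocking hand-off can be exercised on its own, without mpv. In this minimal sketch (the producer/consumer names are mine, purely illustrative), the async side feeds a plain queue.Queue and a synchronous consumer blocks on get(), just like the stream callback does in the real code below:

```python
import asyncio
import queue

q = queue.Queue()  # unbounded: put() never blocks

async def async_producer():
    """Stands in for the asyncio side (e.g. receiving audio chunks)."""
    for part in (b"hello ", b"world"):
        await asyncio.sleep(0)  # pretend to await a network chunk
        q.put(part)             # non-blocking hand-off to the sync side
    q.put(b"")                  # falsy sentinel: no more data

def blocking_consumer():
    """Stands in for the synchronous callback. In the real code this runs
    on libmpv's thread; q.get() blocks until the asyncio side produces data."""
    received = []
    while (block := q.get()):
        received.append(block)
    return b"".join(received)

asyncio.run(async_producer())
result = blocking_consumer()
print(result)  # b'hello world'
```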
```python
import os
import queue
import asyncio

import edge_tts
import mpv

player = mpv.MPV()
q = queue.Queue()

@player.python_stream('edge-tts')
def reader():
    # Blocks until the asyncio side puts data; a falsy value ends the stream.
    while (block := q.get()):
        yield block

async def speak(text, voice='en-US-AvaNeural'):
    async for chunk in edge_tts.Communicate(text, voice).stream():
        if chunk["type"] == "audio":
            q.put(chunk["data"])

player.play('python://edge-tts')

async def amain() -> None:
    await speak('Test')
    await asyncio.sleep(3)
    await speak('Foobar')

asyncio.run(amain())
player.wait_for_playback()
player.terminate()
```
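One caveat worth noting (my reading of the snippet, not something stated above): reader() only returns once it pulls a falsy value from the queue, so mpv never sees end-of-file and wait_for_playback() will not return on its own unless a sentinel such as b'' or None is pushed after the final chunk. A tiny demonstration of the loop's exit condition:

```python
import queue

q = queue.Queue()

def reader():
    # Same loop shape as the mpv stream callback above:
    # yields blocks until a falsy value arrives.
    while (block := q.get()):
        yield block

q.put(b"chunk-1")
q.put(b"chunk-2")
q.put(None)  # falsy sentinel ends the generator, signalling EOF

received = list(reader())
print(received)  # [b'chunk-1', b'chunk-2']
```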
An addition to my answer above:

To make this and similar use cases easier, I added two new convenience functions to mpv.py: MPV.play_bytes(some_bytes) and MPV.play_context(). Both are on the master branch right now and will land in release v1.1.0 in the coming days.

Rewritten with MPV.play_context(), the code from my answer above can be simplified to:
```python
import os
import asyncio

import edge_tts
import mpv

player = mpv.MPV()

async def speak(text, voice='en-US-AvaNeural'):
    # play_context() starts playback and hands us a write() callable
    # for feeding bytes to mpv.
    with player.play_context() as write:
        async for chunk in edge_tts.Communicate(text, voice).stream():
            if chunk["type"] == "audio":
                write(chunk["data"])

async def amain() -> None:
    await speak('Test')
    await asyncio.sleep(3)
    await speak('Foobar')

asyncio.run(amain())
player.wait_for_playback()
player.terminate()
```
edge_tts has a stream mode where audio data is returned in byte chunks. I have read the other issues discussing the use of a custom player.register_stream_protocol to play audio from a file-like object (https://github.com/jaseg/python-mpv/issues/199). However, I don't quite understand how to implement it. This is an example of using the python_stream_catchall decorator to read bytes from the file that is being written to. I would like to directly pass chunk["data"] (the audio bytes, if that is the correct term) to the generator function, read it, and yield the result, instead of having the reader function open a read stream to read and yield the bytes.