How to directly play audio bytes?

tn-17 commented 8 months ago

edge_tts has a stream mode where audio data is returned in byte chunks. I have read the other issues discussing the use of a custom player.register_stream_protocol to play audio from a file-like object (https://github.com/jaseg/python-mpv/issues/199). However, I don't quite understand how to implement it.

The example below uses the python_stream_catchall decorator to read bytes back from the file while it is still being written. Instead of having the reader function open the file and read from it, I would like to pass chunk["data"] (the audio bytes, if that is the correct term) directly to the generator function and have it yield those bytes.

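import os
import asyncio
import tempfile

import edge_tts

# Put the scripts folder on PATH so mpv can be imported
# (mpv.py looks up the libmpv library when it is imported)
os.environ["PATH"] = (
    os.path.join(os.path.dirname(os.path.abspath(__file__)), "scripts")
    + os.pathsep
    + os.environ["PATH"]
)

from scripts import mpv

TEXT = "Hello, world!"  # placeholder text to synthesize
VOICE = "en-US-AvaNeural"  # placeholder edge-tts voice name

player = mpv.MPV()

@player.python_stream_catchall
def catchall(file_name):
    # Called for any python:// URL without a dedicated stream; file_name is
    # the part after "python://". Returns (generator function, size or None).
    def reader():
        with open(file_name, "rb") as f:
            while True:
                # f.read() returns b"" once it catches up with the writer,
                # which python-mpv treats as end of stream
                yield f.read(1024 * 1024)

    return reader, None

async def amain() -> None:
    communicate = edge_tts.Communicate(TEXT, VOICE)
    mp3_fname = ""

    try:
        with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as tmp_file:
            mp3_fname = tmp_file.name
            async for chunk in communicate.stream():
                if chunk["type"] == "audio":
                    tmp_file.write(chunk["data"])
                    # note: play() replaces the current file, so this
                    # restarts playback for every received chunk
                    player.play(f"python://{mp3_fname}")

        player.wait_for_playback()
        player.terminate()
    finally:
        if mp3_fname:
            os.unlink(mp3_fname)

asyncio.run(amain())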

jaseg commented 8 months ago

Hey there,

here's an example of how you can make this work. The core of the issue is that python stream callbacks are called from inside libmpv, so AFAICT they can never be async. To bridge from the asyncio receive code to the blocking stream callback, we use a (non-async!) queue from the queue module. Queue.put can block in general, but here it never will, since our queue has unlimited size.

This code streams the audio as intended: when you ask edge-tts to speak a long text and it returns multiple chunks, playback of the first chunk starts while the second chunk is still being received.

One small change I made to how your code uses python-mpv: I used a regular python stream instead of the catch-all, which makes the code a bit simpler.

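import queue
import asyncio

import edge_tts
import mpv

player = mpv.MPV()
q = queue.Queue()  # unbounded, so q.put() never blocks the event loop

@player.python_stream('edge-tts')
def reader():
    # Called from libmpv; blocks in q.get() until the next chunk arrives.
    # A falsy item (the None sentinel put by speak()) ends the stream.
    while (block := q.get()):
        yield block

async def speak(text, voice='en-US-AvaNeural'):
    # Start playback first so mpv consumes chunks as they arrive
    player.play('python://edge-tts')
    async for chunk in edge_tts.Communicate(text, voice).stream():
        if chunk["type"] == "audio":
            q.put(chunk["data"])
    q.put(None)  # end-of-stream sentinel for reader()

async def amain() -> None:
    await speak('Test')
    await asyncio.sleep(3)
    await speak('Foobar')
    player.wait_for_playback()
    player.terminate()

asyncio.run(amain())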
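The queue bridge described above can be tried without mpv or edge-tts. In this minimal sketch, fake_tts_stream is a hypothetical stand-in for edge_tts.Communicate(...).stream(), a plain thread stands in for libmpv pulling from the stream generator, and the None sentinel that ends the stream is an assumption of the sketch. The consumer thread starts before any data exists and simply blocks in q.get() until chunks arrive.

```python
import asyncio
import queue
import threading

EOF = None  # sentinel that tells the blocking reader the stream is over


def reader(q: queue.Queue):
    """Blocking generator, standing in for a python-mpv stream callback."""
    while (block := q.get()) is not EOF:
        yield block


async def fake_tts_stream(n_chunks: int):
    """Hypothetical stand-in for edge_tts.Communicate(...).stream()."""
    for i in range(n_chunks):
        await asyncio.sleep(0.01)  # simulate network latency between chunks
        yield {"type": "audio", "data": f"chunk{i};".encode()}


async def produce(q: queue.Queue, n_chunks: int) -> None:
    """Async side: push audio chunks into the thread-safe queue."""
    async for chunk in fake_tts_stream(n_chunks):
        if chunk["type"] == "audio":
            q.put(chunk["data"])  # unbounded queue: put() never blocks the loop
    q.put(EOF)


def run(n_chunks: int = 3) -> bytes:
    q = queue.Queue()
    received = []
    # The consumer thread plays the role of libmpv reading the stream;
    # it blocks in q.get() until the async producer delivers chunks.
    consumer = threading.Thread(target=lambda: received.extend(reader(q)))
    consumer.start()
    asyncio.run(produce(q, n_chunks))
    consumer.join()
    return b"".join(received)


if __name__ == "__main__":
    print(run())  # b'chunk0;chunk1;chunk2;'
```

Without the sentinel, the reader would sit in q.get() forever after the last chunk, so the consumer (and, in the real case, playback) would never reach end of stream.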

jaseg commented 8 months ago

An addition to my answer above:

To make this and similar use cases easier, I added two new convenience functions to mpv.py: MPV.play_bytes(some_bytes) and MPV.play_context(). Both are on the master branch right now, and will land in release v1.1.0 in the coming days.

Rewritten with MPV.play_context(), the code from my answer above can be simplified to:

import asyncio

import edge_tts
import mpv

player = mpv.MPV()

async def speak(text, voice='en-US-AvaNeural'):
    # play_context() starts playback and hands us a write() function;
    # leaving the with block marks the end of the stream
    with player.play_context() as write:
        async for chunk in edge_tts.Communicate(text, voice).stream():
            if chunk["type"] == "audio":
                write(chunk["data"])

async def amain() -> None:
    await speak('Test')
    await asyncio.sleep(3)
    await speak('Foobar')
    player.wait_for_playback()
    player.terminate()

asyncio.run(amain())
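A helper like play_context() can be built from the same queue-and-sentinel pattern. The sketch below models that idea with a hypothetical FakePlayer whose play() takes a generator function rather than a python:// URL; it is not python-mpv's actual implementation, only an illustration of how a one-shot play_bytes() stream and a queue-fed play_context() stream can behave.

```python
import queue
import threading
from contextlib import contextmanager


class FakePlayer:
    """Hypothetical stand-in for mpv.MPV that just collects the streamed bytes."""

    def __init__(self):
        self.played = []  # one bytes object per finished stream
        self._threads = []

    def play(self, gen_fun):
        # Hypothetical signature: takes a generator function, not a URL.
        # Like libmpv, it drains the stream on a separate thread.
        def consume():
            self.played.append(b"".join(gen_fun()))

        t = threading.Thread(target=consume)
        t.start()
        self._threads.append(t)

    def wait_for_playback(self):
        for t in self._threads:
            t.join()

    def play_bytes(self, data: bytes):
        # One-shot stream: yield the whole buffer once, then end.
        def reader():
            yield data

        self.play(reader)

    @contextmanager
    def play_context(self):
        q = queue.Queue()
        eof = object()  # unique sentinel that can never collide with real data

        def reader():
            while (chunk := q.get()) is not eof:
                if chunk:  # drop empty writes; python-mpv reads a yielded b"" as EOF
                    yield chunk

        self.play(reader)  # start first; reader() blocks until the first write
        try:
            yield q.put  # the caller gets a write(chunk) function
        finally:
            q.put(eof)  # always end the stream, even if the body raised


if __name__ == "__main__":
    player = FakePlayer()
    player.play_bytes(b"whole file")
    with player.play_context() as write:
        for part in (b"a", b"", b"b"):
            write(part)
    player.wait_for_playback()
    print(sorted(player.played))  # [b'ab', b'whole file']
```

Putting the sentinel in a finally clause means the reader thread is released even when the with body raises, so a failed TTS request cannot leave the stream blocked forever.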

jaseg/python-mpv, issue #269