jaseg / python-mpv

Python interface to the awesome mpv media player
https://git.jaseg.de/python-mpv.git

How to use render context API with NumPy/OpenCV? #198

Closed SuperSonicHub1 closed 2 years ago

SuperSonicHub1 commented 2 years ago

Hello!

I'm interested in using WebRTC, WebSockets and MPV to stream arbitrary video to a web browser. I'm currently working on the WebRTC to MPV part of that equation.

I'm going to be using aiortc for this process, as it has a lot of powerful abstractions for streaming audio and video. They interface with the low-level Python-FFmpeg wrapper PyAV, which itself can interface with NumPy's n-dimensional arrays. What I effectively want to do is get a big array of RGB values to render, which I can then pass to aiortc.

Currently, it seems that mpv supports only two renderers: OpenGL and software. Would the software renderer be the way to go? If so, how can I use it to get the pixels I need? Or is there some way to get OpenGL output into NumPy?

SuperSonicHub1 commented 2 years ago

My current solution is pretty jank, but doesn't seem to be all that laggy AFAIK:

from aiortc import VideoStreamTrack
from av import VideoFrame
from mpv import MPV


class MPVStreamTrack(VideoStreamTrack):
    """
    A video track that returns screenshots of an MPV instance.
    """

    def __init__(self):
        super().__init__()  # don't forget this!

        # Init player
        self.player = MPV(ytdl=True)
        self.player.play("https://youtu.be/W43aQxzjyeM?t=37")

    async def recv(self):
        # Let aiortc hand us the next presentation timestamp
        pts, time_base = await self.next_timestamp()

        # screenshot_raw() returns a PIL Image, which PyAV can wrap directly
        frame = VideoFrame.from_image(self.player.screenshot_raw())
        frame.pts = pts
        frame.time_base = time_base
        return frame
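
For completeness, here's roughly how such a track would be wired into an aiortc peer connection (just a sketch; the signaling/SDP exchange with the browser is omitted):

from aiortc import RTCPeerConnection

pc = RTCPeerConnection()
pc.addTrack(MPVStreamTrack())  # aiortc pulls frames by awaiting recv()
# ...then negotiate the SDP offer/answer with the browser as usual.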

Once I start streaming this to my browser, if I find I'm unhappy with the results, I intend to use MpvRenderContext with a "dummy" GL surface (nothing is ever drawn to my monitor's screen) and glReadPixels to hopefully get better performance. I'll still leave this open in case anyone has a better solution or I actually need to do that OpenGL scaffolding.

neinseg commented 2 years ago

Hey there,

I think I understand what you're trying to do, but I don't understand why. If it's just academic curiosity, that's fine; but if you're actually trying to achieve something, I think you should really look at another way of doing it. Consider that no matter which rendering backend you choose, in the end mpv will decode your input video into a screen buffer. Your WebRTC lib then takes that pixel buffer and re-encodes it into another compressed video stream, which you then send to the browser. Why not send the original video file instead? If that file is too large, you should look at existing software: I'm sure there is already a tool that does what you need using ffmpeg directly, without the whole mpv-and-Python stack around it. ffmpeg by itself is pretty fast at scaling and re-encoding video.

As for the actual rendering API: using OpenGL may not be the highest-performance option here, depending on your WebRTC lib. If you ask mpv to render each frame into an OpenGL buffer, but your WebRTC lib expects a regular in-memory (RAM) buffer, you will have to manually copy each and every frame from GPU memory back into main RAM. The software renderer renders straight into RAM, so it might in fact be the better option even if hardware acceleration is available. However, do not expect "good" performance by any means. Calls between libmpv and Python and back are kind of expensive, and the software render interface (screenshot_raw) is really meant as a nice toy for experiments; it will likely never be a good idea to actually try to push hundreds of megabytes per second of data through it.
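
Just to make the software/screenshot path concrete, here's a rough sketch of pulling the raw pixels into NumPy (and from there into OpenCV), assuming `player` is an mpv.MPV instance and that the screenshot-raw node command returns the same w/h/stride/format/data dict that screenshot_raw() uses internally:

import numpy as np

res = player.node_command('screenshot-raw')  # the same call screenshot_raw() makes internally
if res['format'] != 'bgr0':
    raise ValueError('unexpected pixel format: {}'.format(res['format']))

# Zero-copy view of the pixel buffer; rows may be padded, hence stride rather than w.
arr = np.frombuffer(res['data'], dtype=np.uint8).reshape(res['h'], res['stride'] // 4, 4)
bgr = arr[:, :res['w'], :3]  # crop row padding, drop the unused fourth byte

# OpenCV already expects BGR channel order, so `bgr` can go straight into cv2 calls
# (wrap it in np.ascontiguousarray() if a contiguous buffer is required).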

If you wanted to speed up screenshot_raw, the first thing would be to eliminate the splitting and re-merging of the image buffer. If you have a look at the code, you can see that libmpv returns the screenshot in one pixel format, which screenshot_raw then converts into another. I don't know anything about the internals of PIL/Pillow, but I assume that this conversion causes the entire screen buffer to be copied at least once. Ideally, you'd want to hand the raw pointer from libmpv directly to your WebRTC lib without copying anything, and make sure both use the same pixel data format in the first place.
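
If the target is PyAV rather than Pillow, something along these lines might work as a sketch (assuming the installed PyAV accepts packed 'bgr24' input in VideoFrame.from_ndarray); there's still one copy to make the array contiguous, but the frombytes/split/merge round-trip is gone:

import numpy as np
from av import VideoFrame

res = player.node_command('screenshot-raw')
bgr = (np.frombuffer(res['data'], dtype=np.uint8)
         .reshape(res['h'], res['stride'] // 4, 4)[:, :res['w'], :3])
# One contiguous copy instead of Pillow's frombytes + split + merge.
frame = VideoFrame.from_ndarray(np.ascontiguousarray(bgr), format='bgr24')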

SuperSonicHub1 commented 2 years ago

if you're actually trying to achieve something, I think you should really look at another way of doing things

I probably will do that soon. The whole point of this was to be able to watch anime (or really anything youtube-dl can fetch) from my smart TV's web browser without having to worry about setting up an HLS proxy (see CORS) or fumbling around with rendering SSA subtitles in-browser, since mpv would handle all of that for me. All I needed to do was control mpv over a WebSocket (pretty easy) and route mpv's video (a bit challenging) and audio (oh dear) over WebRTC. I was able to do all three of these things in isolation, but my stack has begun to violently collapse in on itself as I try to bring it all together, due to all of the jank (I'm using named pipes to get audio from mpv's PCM audio driver :)). It's obvious to me that making a youtube-dl web app that can handle more than simple static video/audio files is a large burden full of hoop-jumping and edge cases; otherwise, why hasn't someone done it already? GStreamer apparently supports WebRTC, so I'm likely going to look there next.

Why not send the original video file instead?

I don't have any, as I'm getting everything from video streams on the web. Browsers won't allow cross-domain communication, and most don't support HLS streams out of the box, meaning I'd need to bring in external dependencies. I'll certainly lose the energy to watch a show if I have to wait for my computer to download and transcode the episode first.

the first thing would be to eliminate the splitting and re-merging of the image buffer

Something like this?

    def screenshot_raw(self, includes='subtitles'):
        """Mapped mpv screenshot_raw command, see man mpv(1). Returns a pillow Image object."""
        from PIL import Image
        res = self.node_command('screenshot-raw', includes)
        if res['format'] != 'bgr0':
            raise ValueError('Screenshot in unknown format "{}". Currently, only bgr0 is supported.'
                    .format(res['format']))
        img = Image.frombytes('RGBA', (res['stride']//4, res['h']), res['data'])
        return img
        # Original conversion, skipped here to avoid the extra copy:
        # b,g,r,a = img.split()
        # return Image.merge('RGB', (r,g,b))

I'm curious about why you take out the alpha channel in the first place.

you'd want to hand over the raw pointer from libmpv directly to your webrtc lib without copying anything, and make sure that both use the same pixel data format in the first place

That's not going to work. WebRTC sends video exclusively over the VP8 and VP9 video codecs. I reached for screenshot_raw as PyAV, the video encoding library aiortc uses, has built-in support for Pillow.