Video/audio streaming APIs

hsdk123 commented 5 years ago

Hi, video seems to be a big part of media effects lately, and I was hoping to propose an official implementation with a simple backend.

I noticed that there was an FFmpeg backend in the works, but something more simple might be

a. Libtheora + theoraplay: https://icculus.org/theoraplay/

b. plmpeg https://github.com/phoboslab/pl_mpeg

Instead of supporting multiple different codecs, I think a single portable format would suffice.

mosra commented 5 years ago

Hi, sorry for the late reply -- regarding video, more than about particular format support / plugin implementations I'm concerned about designing the API to be performant enough. I imagine decoding each (1080p, 4K, ...) frame on a CPU and then uploading it to a GPU is not the right way to do things. Got any experience with that? :) What I'm not sure about is:

how flexible the video API needs to be (is providing a sequential way to get one frame after another enough)?
OTOH providing an overly generic API where you ask for "frame 28068" will probably be a hell to implement on the plugin side, since each frame depends on tens/hundreds(?) of previous frames
some memory reusing / double-buffering (decoding into one buffer, allowing the app to read from another), similarly to how V4L does it

Looking around, there seem to be a few things worth considering / watching:

ffmpeg can use the GPU to decode videos (and then it's needed to figure out how to expose the data to GL/Vulkan/... without a CPU rountrip)
there's a new WebGL extension for a "video texture", presented at SIGGRAPH 2019: WEBGL_video_texture -- right now "DO NOT IMPLEMENT", but worth keeping an eye on
the WebGL extension is based off OES_EGL_image_external, is there anything similar for desktop systems?
Binomial's Basis is working on video support

hsdk123 commented 5 years ago

I think performance and flexibility usually come on opposite sides of the spectrum, but I think in terms of practical flexibility, it's difficult to want more than a video editing library, so I'll base my opinions from some experiences a while back when working to build one.

In terms of providing a flexible API for developers to use, I've found that it usually boils down to

a. exposing a method to sequentially get pixel information of frames b. + the time point of the frame.

Allowing the developer to manually modify the pixel information based on that time, is usually sufficient enough, with people managing to do some crazy things, by ex. adding multiple 'filters', chaining transformations of pixels to get various types of summed effects.

I personally don't think there's much value in exposing something where you can get ex. the nth frame information. Videos are usually about sequential displaying thus practically speaking you want to get frames in context of the stream.

Now of course, if you want to build a fully developed video editor / player that's used by an end-user, you're going to want to move between different time points, but I feel that's a much more niche case, when the much larger majority usually just want to play the video on screen - this being a very usual request for cutscenes, etc.

In terms of history, a lot of game APIs in terms of video used to provide a 'video object', one that developers can change colour, position, size, etc.

Unity, etc. seem to have now abstracted that concept to the idea of a video texture, and I personally agree with this abstraction. It hides all possible GPU intensive decoding, etc. behind the implementations, and aligns with the more widespread idea of 'locking' pixels, as in 2d textures - this being a more strenuous operation that developers have come to understand involve moving information from the GPU down to the CPU.

This further allows regular texture manipulation through shaders, etc. and I feel is thus the more natural, yet flexible abstraction.

(I've only worked on mainly developer APIs, so sadly GPU decoding is out of my expertise)

mosra commented 5 years ago

want to move between different time points, but I feel that's a much more niche case, when the much larger majority usually just want to play the video on screen

Thanks for confirming that. My experience in this area is very sparse, so I don't know what's expected :)

Unity, etc. seem to have now abstracted that concept to the idea of a video texture

I think exposing something similar (wrapping the two extensions I mentioned above) would make sense. But not sure about the rest yet. For ffmpeg specifically, I found some GPU-GPU operations in https://ffmpeg.org/doxygen/trunk/hwcontext_8c.html, but still missing the bigger picture.

To sum up -- this is definitely a feature that has its place here, ranging from importing animated PNGs/GIFs to full video. However I'm afraid I won't have time to look deeper into this in the next month or two, as my schedule is already packed with other features. For the time being, you'd need to implement it on your side -- and the way things currently are done, Magnum shouldn't step in your way I think. Of course let me know if there's any limitation that makes things harder than they should be.

hsdk123 commented 5 years ago

Roger, I think I might start something simple just on Windows with libtheora - I'll post any questions / blockers if any arise. Definitely looking forward to something more pervasive on magnum though! Wouldn't mind waiting until fall.

mosra commented 5 years ago

The Basis Universal image format added in 2019.10 has video support and in the following weeks I'm planning to update the importer/converter APIs to support (at least) sequential frame import/conversion. Not quite sure yet how the API will look, but what I want at least is the following, I'll leave seeking for later:

being able to import a frame without allocating a new memory block for each (some N-buffering)
let the importer decode a frame in parallel (give it back a used frame so it can decode the next one to it)

In case of Basis the video is decoded into one of the GPU block compression formats, for general video the PixelFormat (and related GL APIs) would need to be extended to understand YUV and such -- I suppose this would depend on vendor-specific GL extensions.

hsdk123 commented 5 years ago

Nice! Looking forward to this first step!

hsdk123 commented 5 years ago

Ah, just before I forget: sequential + repeat functionality, would cover most cases! Really looking forward to this!

mosra commented 4 years ago

With texture transformation support done in 9a06b3515bc5a0960343a86acd289aa4d62b6c57, GIF support done implemented in mosra/magnum-plugins@595baf5a558187883456872b9c906f0f8429b2dd and video support that's done in BasisImporter since the very beginning, there's now a very crude possibility to play back short videos -- see the Animated GIF example.

Next step is a proper streaming API for both video and audio, so one doesn't need to have everything resident in memory.

hsdk123 commented 4 years ago

For streaming audio, I think a reference example would be SFML and sf::Music. SFML also uses open al for its backend so it might make for an easier reference.

https://www.sfml-dev.org/tutorials/2.5/audio-sounds.php https://github.com/SFML/SFML/tree/master/src/SFML/Audio

For streaming video, along the lines of SFML, there's

https://github.com/Yalir/sfeMovie (backend: ffmpeg),

and there's also theoraplay

http://hg.icculus.org/icculus/theoraplay/file/tip/test/sdltheoraplay.c

mosra / magnum

Video/audio streaming APIs #360