bloc97 opened 4 years ago
Hello, I have been wondering if it would be possible to expand the mpv video output shader stage with the following features? I know this might not be on the devs' main priority, but if I were to try to implement them, where should I start? These are ordered by immediate usefulness.
1. Expose the MPEG motion vectors as a source plane. (e.g. a new MOTION source hook instead of just LUMA and CHROMA)
2. Allow a shader to read previous or future source hooks. (e.g. accessing the LUMA of the current frame and the previous frame)
3. Allow textures to stay alive across different frames. (e.g. accessing a texture saved from previous video frames)
4. Allow writing to 3D textures. (in the form of a virtual 3D hook?)
(1) would allow the development of more complex shaders that take motion into account. (1, 2) would allow the implementation of MISR (multiple image super-resolution) algorithms for video. (1, 2, 3) would allow the implementation of temporal algorithms (e.g. Unity's TSS) and RNNs. (1, 2, 3, 4) would allow the implementation of very efficient convolutional RNNs.
Furthermore, implementing (2) alone and allowing the user or the shader to choose a target framerate (--tscale) would allow custom temporal interpolation algorithms. This can be useful for implementing frame interpolation methods such as DAIN.
mpv is already a great player thanks to its open shader stage. I believe adding these features will make mpv unique and bridge the gap between algorithms developed in academia and algorithms used in practice.
Thanks in advance.
Edit: I meant (4) writing to 3D textures; I know that mpv already supports read-only 3D textures.
Quick summary since I'm still away from home,
On Fri, 02 Oct 2020 13:13:51 -0700, bloc97 notifications@github.com wrote:
- Expose the MPEG motion vectors as a source plane. (e.g. a new MOTION source hook instead of just LUMA and CHROMA)
This is doable via AVFrame side data. I have ancient branches exposing this data to the shaders, but in my limited testing they were fairly useless because they're prediction vectors, not motion vectors. Little effort is given by encoders to give them useful semantic meaning beyond "minimizing bitrate", and their algorithms differ wildly, as does their availability in general (cf. other codecs).
- Allow a shader to read previous or future source hooks. (e.g. accessing the LUMA of the current frame and the previous frame)
This could be done for the interpolation stage hooks (e.g. OUTPUT, or a new hook we introduce) with relative ease. Doing it for any earlier stage would be a more complicated endeavor, for which the implementation probably overlaps with point 3 (which is close to being a generalization of this request).
- Allow textures to stay alive across different frames. (e.g. accessing a texture saved from previous video frames)
Doable and useful. I also had some vague ideas to introduce STORAGE blocks (textures/buffers), for storing persistent data. But storing textures persistently is also a good idea. Implementation-wise, textures which are needed across frames would have to be stored in struct ra_fbotex (or whatever).
- Allow 3D textures.
Trivial.
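To make request 2 concrete, such a shader might end up looking something like this; the LUMA_PREV binding is purely hypothetical syntax for "LUMA as it was on the previous frame":
//!HOOK LUMA
//!BIND HOOKED
//!BIND LUMA_PREV
//!DESC two-frame temporal blend (sketch)
vec4 hook() {
    // Average the current luma plane with the previous frame's
    return 0.5 * (HOOKED_texOff(0) + LUMA_PREV_texOff(0));
}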
Thanks for your quick reply! I have a few more thoughts. For 1, I believe you are right that the differing algorithms used for prediction reduce its usefulness. However, I think the macroblock predictions offer coarse and noisy, but mostly correct, motion vectors with respect to edges and salient features (which the compression algorithm is designed to preserve). For the purposes of super-resolution, denoising or frame estimation, this might be a good starting point. Pairing this coarse estimation with a better optical flow estimation implemented as a shader could be extremely useful.
3 is indeed the most useful of them all in the long run.
For 4, I believe the e-mail did not show you my edit. I was talking about saving to 3D textures when hooking a 2D hook. For example, the shader hooks LUMA, but saves to a 3D texture of size LUMA.w x LUMA.h x depth. Consequently, HOOKED_pos would have to be a vec3. Is this still trivial?
Finally, do you think a good way of allowing custom frame interpolation would be to simply expose interpolation-stage hooks? Some algorithms can only double the frame rate (e.g. 24 -> 48 fps), so a more traditional interpolation algorithm is still needed afterwards to match the display refresh rate. (That part should be handled by mpv.)
Thanks again,
For 4, I believe the e-mail did not show you my edit. I was talking about saving to 3D textures when hooking a 2D hook. For example, the shader hooks LUMA, but saves to a 3D texture of size LUMA.w x LUMA.h x depth. Consequently, HOOKED_pos would have to be a vec3. Is this still trivial?
Ah. Trivially impossible, since you can't render to 3D textures. (You could use multiple framebuffer attachments though, e.g. MRT, which would be non-trivial but possible)
For this purpose I think it would probably be better to introduce persistent storage images (which can be 3D) and use compute shaders to interact with them (more or less treating them as generic memory blocks at that point).
If I am not mistaken, textures/images and SSBOs should stay persistent if not modified from one frame to the next?
I've given this problem some thought and if the statement above is correct, I think the simplest stop-gap solution would be to allow user-defined read-write buffers (SSBOs) and textures in shader code.
Right now there are already user-defined read-only textures, and allowing read-write textures/images would let people use imageStore and imageLoad arbitrarily in shader code. The code for SSBOs is also already present in ra_buf_create.
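To sketch the idea, a read-write persistent texture in user-shader form might look something like this. Everything about STATE here, including the //!STORAGE qualifier, its size, and the assumption that it starts out zero-initialized, is illustrative syntax, not a feature mpv had at this point:
//!HOOK LUMA
//!BIND HOOKED
//!BIND STATE
//!DESC running average into persistent storage (sketch)
vec4 hook() {
    ivec2 pos = ivec2(HOOKED_pos * HOOKED_size);
    vec4 cur = HOOKED_texOff(0);
    // Read the accumulator left over from the previous frame
    // (assumes the storage image starts out zero-initialized)
    vec4 acc = imageLoad(STATE, pos);
    // Blend with the current frame and write back for the next frame
    acc = mix(cur, acc, 0.5);
    imageStore(STATE, pos, acc);
    return acc;
}
//!TEXTURE STATE
//!SIZE 1920 1080
//!FORMAT rgba16
//!STORAGE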
A side (but great) benefit of this would allow custom pipelining in shader code. Modern GPUs have a very deep pipeline and I have noticed that the shaders I wrote for Anime4K tend to stall, especially when there are many small shader stages. It would be very beneficial to combine them into a single compute shader stage using SSBOs, effectively taking advantage of the deep pipeline of GPUs.
Which parts of the code/files should I start looking at to fully implement this? It's easy to get lost in the source without knowing the parts...
@bloc97 This has been implemented for a while in libplacebo for both storage images and storage/uniform buffers.
I am working on vo_placebo, which will bring these improvements to mpv directly, but in the meantime you could test them using the plplay tool and report any issues.
It should also be more or less straightforward to port the relevant parts of the libplacebo code to mpv's custom shader parser as a stopgap measure.
That's great news, thanks for the work on libplacebo! I've not been able to test shader performance using the plplay demo, as I couldn't get it to compile on Windows, and Windows Subsystem for Linux v2 does not yet support native OpenGL/Vulkan context passthrough; it currently falls back to CPU emulation (the performance is horrible). Is it even possible to compile it on Windows, or am I doing it wrong?
It should be possible to compile it using MinGW. (Though you'll also need MinGW-compiled versions of dependencies)
@rossy is currently working on libplacebo+d3d11 so he may be able to offer insight on how to compile it.
I use MSYS2 to build it, because it has a binary package manager that makes it easy. You should be able to compile libplacebo by installing MSYS2, opening a mingw-w64 shell from the start menu and installing the following packages:
pacman -S $MINGW_PACKAGE_PREFIX-{toolchain,libplacebo,meson,ninja,python,python-mako,vulkan-headers,glfw}
After that you should be able to build libplacebo and the demos from the same mingw-w64 shell.
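From there, the build itself is the usual Meson flow; a sketch, assuming libplacebo's option for building the demos is named demos (double-check meson_options.txt):
meson setup build -Ddemos=true
ninja -C build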
pacman -S $MINGW_PACKAGE_PREFIX-{toolchain,libplacebo,meson,ninja,python,python-mako,vulkan-headers,glfw}
Works great, thanks! For future people using this to compile on MinGW/MSYS2: installing the packages git, ffmpeg and cmake will take care of the warnings. Also, if you develop with Vulkan in Visual Studio, or have installed the Vulkan SDK for game engines like Unity, you also have to edit libplacebo\src\vulkan\utils_gen.py and remove the registry paths that can point to a vk.xml version that is not the MinGW one.
registry_paths = [
'%VULKAN_SDK%/share/vulkan/registry/vk.xml',
'$VULKAN_SDK/share/vulkan/registry/vk.xml',
'$MINGW_PREFIX/share/vulkan/registry/vk.xml',
'/usr/share/vulkan/registry/vk.xml',
]
Remove everything except for '$MINGW_PREFIX/share/vulkan/registry/vk.xml'.
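After the edit, the list contains only the MinGW entry:
registry_paths = [
    '$MINGW_PREFIX/share/vulkan/registry/vk.xml',
]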
I've just tested libplacebo with plplay, and the performance is insane! I think something is bottlenecking the shaders in mpv. I can run extremely large CNNs without frame drops, with performance very close to a naive CUDA/cuBLAS implementation. Even FSRCNNX_x2_56-16-4-1.glsl runs 6x faster than realtime.
With this, I think it would be worthwhile to develop a general-purpose TensorFlow-model-to-GLSL converter so that deployment of deep learning methods can be fast and easy. I already have a prototype that I used, but it needs a lot more work to be useful in general cases. But I'm getting sidetracked here...
Anyway, great work! I can't wait for this to be introduced to mpv; it will certainly revolutionize real-time video processing, as nothing even comes close to the usability, scalability and integrability of libplacebo/mpv.
That's actually somewhat unexpected to say the least. Are you sure the shaders are actually being applied? (Unless the performance gain is due to the use of new features)
Yes, I've stacked four Anime4K_UL shaders on a 360p video for 4K upscaling and it works flawlessly. I'm not sure if all four were applied, but at least two were, as I can notice differences up to the third. In mpv, applying more than one UL shader would freeze everything and make it unusable until you killed the process. Even initializing the shaders was exceptionally fast: it used to take 4-5 seconds in mpv, and now it is instant.
I don't know why mpv was so much slower, but it might be because of something else, like synchronization? Or maybe mpv was not compiled specifically for my machine? (Would that make a difference...?)
The even better news is that my shaders are (embarrassingly) very unoptimized: they do naive convolutions one step at a time, with a lot of overlap (duplicate texture reads). With SSBOs we could implement highly efficient sparse matrix multiplications, which could speed things up 2-3x depending on the neural network.
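For a sense of what that optimization looks like, here is a generic GLSL compute sketch (plain GLSL, not mpv user-shader syntax; the bindings and the 3x3 box kernel are illustrative stand-ins for a real conv layer) that loads a tile into workgroup-shared memory once, so neighbouring invocations stop re-reading the same texels:
#version 450
layout(local_size_x = 16, local_size_y = 16) in;
layout(binding = 0) uniform sampler2D src;
layout(binding = 1, rgba16f) writeonly uniform image2D dst;

// 16x16 workgroup plus a 1-texel apron for the 3x3 kernel
shared vec3 tile[18][18];

void main() {
    ivec2 gid  = ivec2(gl_GlobalInvocationID.xy);
    ivec2 lid  = ivec2(gl_LocalInvocationID.xy);
    ivec2 base = ivec2(gl_WorkGroupID.xy) * 16 - 1; // top-left of the apron
    ivec2 maxc = textureSize(src, 0) - 1;

    // Cooperatively fill the 18x18 tile: each thread fetches at most two
    // texels, instead of every thread fetching its whole 3x3 neighbourhood
    for (int i = lid.y * 16 + lid.x; i < 18 * 18; i += 16 * 16) {
        ivec2 p = base + ivec2(i % 18, i / 18);
        tile[i / 18][i % 18] = texelFetch(src, clamp(p, ivec2(0), maxc), 0).rgb;
    }
    barrier();

    // The "conv layer": a 3x3 average read purely from shared memory
    vec3 sum = vec3(0.0);
    for (int dy = 0; dy < 3; dy++)
        for (int dx = 0; dx < 3; dx++)
            sum += tile[lid.y + dy][lid.x + dx];
    imageStore(dst, gid, vec4(sum / 9.0, 1.0));
}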
Are you sure you were using the vulkan backend when testing with mpv?
@bloc97 Increase SC_MAX_ENTRIES and all those shaders will be usable in real time with mpv too.
I see, thanks for pointing that out!
Is there a reason for SC_MAX_ENTRIES to exist as a hard-coded define? I don't think it is documented in the mpv manual.
Not really, no. A leftover relic from a simpler time, when "48 entries ought to be enough for everyone".
Edit: I suppose the real reason is to prevent the shader cache from growing too large, because mpv looks up entries in the cache by doing a full strcmp on the shader body for each entry. In libplacebo that consideration is completely removed, because it looks up entries using a hash of the body instead. Ironically, libplacebo still has a limit on the cache, to prevent the cache file from growing infinitely, but the libplacebo cache is adaptive in size and only evicts entries that haven't been used recently.
Anyway, I'll bump up the limit, since a few extra strcmp calls shouldn't be a reason to prevent your use case altogether.
That would explain the 4-5 second load time... If I understand correctly, mpv's shader cache is not modified when shaders are "removed" by the user; it is only cleared when the user cumulatively loads more than SC_MAX_ENTRIES shaders, and, assuming the new set of shaders does not itself exceed SC_MAX_ENTRIES, it then clears the cache and loads in the new shaders.
How big are the entries in the shader cache? If they are small, would setting the max limit to something absurd like 65535 be detrimental to performance?
Edit: I've read 474ee003eddd278d871e7c1c760091b739880e8f, and since the current shader library will be superseded by libplacebo, I guess I will wait for it. Further problems I encounter with libplacebo (if any) can be discussed over there.
I am working on vo_placebo which will bring these improvements to mpv directly
and
@rossy is currently working on libplacebo+d3d11
That's exciting!
Any updates on this idea? I was hoping to do some temporal filters... Temporal information is actually insane for some sorts of filters (like denoisers). I noticed an nlmeans filter has "experimental" temporal support, which seems to imply there's some support for temporal features?
Also, please consider adding a swap-buffer feature. I remember trying to do some stuff with ShaderFX, and adding a no-op BLIT stage was extremely slow; that could have been avoided by swapping the input and output buffers.
After 3 hours of experimentation, I got a basic BLIT-less temporal blur set up, but it leaves lots to be desired:
//!HOOK MAIN
//!BIND HOOKED
//!BIND PREV
//!DESC temporal-blur
// Number of history layers in the PREV storage texture
#define T_DEPTH 2
// Pad a vec3 color out to the vec4 expected by imageStore
#define toVec4(v) vec4((v), 0.0)
// Integer texel coordinate of the current pixel in layer d of PREV
#define HOOKED_posOff(d) ivec3(HOOKED_pos * HOOKED_size, d)
vec4 hook() {
    vec3 a = HOOKED_texOff(0).rgb;                  // current frame
    vec3 b = imageLoad(PREV, HOOKED_posOff(0)).rgb; // history layer 0
    vec3 c = imageLoad(PREV, HOOKED_posOff(1)).rgb; // history layer 1
    // Round-robin: overwrite the oldest layer with the current frame
    // (frame is mpv's built-in frame counter uniform)
    imageStore(PREV, HOOKED_posOff(frame % T_DEPTH), toVec4(a));
    vec3 avg = (a + b + c) / 3.0;
    return toVec4(avg);
}
//!TEXTURE PREV
//!SIZE 2048 2048 2
//!FORMAT rgba16
//!STORAGE
My issues with this are:
- The PREV texture has a hard-coded size; I want it to be sized to match the input content.
- The !STORAGE flag is not documented.
- It's screen ordering, not file ordering, meaning that if you launch mpv with --paused you will get "bad" quality. Seeking should preferably be handled better for the most "correct" output, i.e. better scene change detection is needed to handle seek boundaries.
- I think I can avoid this using !SAVE, but that is limited to 2D textures, making it harder to avoid a BLIT (maybe I'm missing something).
- There's no way to peek into the future by the looks of it, meaning you have to induce frame delay with no way to signal back to mpv that this needs to be corrected for.
Maybe I'm missing some obvious stuff, but the documentation is pretty sparse.