bloc97 opened 4 years ago
Hello, I have been wondering if it would be possible to expand the mpv video output shader stage with the following features? I know this might not be on the devs' main priority, but if I were to try to implement them, where should I start? These are ordered by immediate usefulness.
1. Expose the MPEG motion vectors as a source plane. (e.g. a new MOTION source hook instead of just LUMA and CHROMA)
2. Allow a shader to read previous or future source hooks. (e.g. accessing the LUMA of the current frame and the previous frame)
3. Allow textures to stay alive across different frames. (e.g. accessing a texture saved from previous video frames)
4. Allow writing to 3D textures. (in the form of a virtual 3D hook?)
(1) would allow the development of more complex shaders that take motion into account. (1, 2) would allow the implementation of MISR (multiple image super-resolution) algorithms for video. (1, 2, 3) would allow the implementation of temporal algorithms (e.g. Unity's TSS) and RNNs. (1, 2, 3, 4) would allow the implementation of very efficient convolutional RNNs.
Furthermore, implementing (2) alone and allowing the user or the shader to choose a target framerate (--tscale) would allow custom temporal interpolation algorithms. This can be useful for implementing frame interpolation methods such as DAIN.
mpv is already a great player thanks to its open shader stage. I believe adding these features will make mpv unique and bridge the gap between algorithms developed in academia and algorithms used in practice.
Thanks in advance.
Edit: I meant (4) writing to 3D textures; I know that mpv already supports read-only 3D textures.
Quick summary since I'm still away from home,
On Fri, 02 Oct 2020 13:13:51 -0700, bloc97 notifications@github.com wrote:
- Expose the MPEG motion vectors as a source plane. (e.g. a new MOTION source hook instead of just LUMA and CHROMA)
This is doable via AVFrame side data. I have ancient branches exposing this data to the shaders, but in my limited testing they were fairly useless because they're prediction vectors, not motion vectors. Little effort is given by encoders to give them useful semantic meaning beyond "minimizing bitrate", and their algorithms differ wildly, as does their availability in general (cf. other codecs).
- Allow a shader to read previous or future source hooks. (e.g. accessing the LUMA of the current frame and the previous frame)
This could be done for the interpolation stage hooks (e.g. OUTPUT, or a new hook we introduce) with relative ease. Doing it for any earlier stage would be a more complicated endeavor, for which the implementation probably overlaps with point 3 (which is close to being a generalization of this request).
- Allow textures to stay alive across different frames. (e.g. accessing a texture saved from previous video frames)
Doable and useful. I also had some vague ideas to introduce STORAGE blocks (textures/buffers), for storing persistent data. But storing textures persistently is also a good idea. Implementation-wise, textures which are needed across frames would have to be stored in struct ra_fbotex (or whatever).
- Allow 3D textures.
Trivial.
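To make request 2 concrete, such a shader might end up looking something like this; the LUMA_PREV binding is purely hypothetical syntax for "LUMA as it was on the previous frame":
//!HOOK LUMA
//!BIND HOOKED
//!BIND LUMA_PREV
//!DESC two-frame temporal blend (sketch)
vec4 hook() {
    // Average the current luma plane with the previous frame's
    return 0.5 * (HOOKED_texOff(0) + LUMA_PREV_texOff(0));
}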
Thanks for your quick reply! I have a few more thoughts. For 1, I believe you are right that the differing algorithms used for prediction reduce its usefulness. However, I think the macroblock predictions offer coarse and noisy, but mostly correct, motion vectors with respect to edges and salient features (which the compression algorithm is designed to preserve). For the purposes of super-resolution, denoising or frame estimation, this might be a good starting point. Pairing this coarse estimation with a better optical flow estimation implemented as a shader could be extremely useful.
3 is indeed the most useful of them all in the long run.
For 4, I believe the e-mail did not show you my edit. I was talking about saving to 3D textures when hooking a 2D hook. For example, the shader hooks LUMA, but saves to a 3D texture of size LUMA.w x LUMA.h x depth. Consequently, HOOKED_pos would have to be a vec3. Is this still trivial?
Finally, do you think a good way of allowing custom frame interpolation would be to simply expose interpolation-stage hooks? Some algorithms can only double the frame rate (e.g. 24 -> 48 fps), so a more traditional interpolation algorithm is still needed afterwards to match the display refresh rate. (That part should be handled by mpv.)
Thanks again,
For 4, I believe the e-mail did not show you my edit. I was talking about saving to 3D textures when hooking a 2D hook. For example, the shader hooks LUMA, but saves to a 3D texture of size LUMA.w x LUMA.h x depth. Consequently, HOOKED_pos would have to be a vec3. Is this still trivial?
Ah. Trivially impossible, since you can't render to 3D textures. (You could use multiple framebuffer attachments though, e.g. MRT, which would be non-trivial but possible)
For this purpose I think it would probably be better to introduce persistent storage images (which can be 3D) and use compute shaders to interact with them (more or less treating them as generic memory blocks at that point).
If I am not mistaken, textures/images and SSBOs should stay persistent if not modified from one frame to the next?
I've given this problem some thought and if the statement above is correct, I think the simplest stop-gap solution would be to allow user-defined read-write buffers (SSBOs) and textures in shader code.
Right now there are already user-defined read-only textures, and allowing read-write textures/images would let people use imageStore and imageLoad arbitrarily in shader code. The code for SSBOs is also already present in ra_buf_create.
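To sketch the idea, a read-write persistent texture in user-shader form might look something like this. Everything about STATE here, including the //!STORAGE qualifier, its size, and the assumption that it starts out zero-initialized, is illustrative syntax, not a feature mpv had at this point:
//!HOOK LUMA
//!BIND HOOKED
//!BIND STATE
//!DESC running average into persistent storage (sketch)
vec4 hook() {
    ivec2 pos = ivec2(HOOKED_pos * HOOKED_size);
    vec4 cur = HOOKED_texOff(0);
    // Read the accumulator left over from the previous frame
    // (assumes the storage image starts out zero-initialized)
    vec4 acc = imageLoad(STATE, pos);
    // Blend with the current frame and write back for the next frame
    acc = mix(cur, acc, 0.5);
    imageStore(STATE, pos, acc);
    return acc;
}
//!TEXTURE STATE
//!SIZE 1920 1080
//!FORMAT rgba16
//!STORAGE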
A side (but great) benefit of this would allow custom pipelining in shader code. Modern GPUs have a very deep pipeline and I have noticed that the shaders I wrote for Anime4K tend to stall, especially when there are many small shader stages. It would be very beneficial to combine them into a single compute shader stage using SSBOs, effectively taking advantage of the deep pipeline of GPUs.
Which parts of the code/files should I start looking at to fully implement this? It's easy to get lost in the source without knowing the parts...
@bloc97 This has been implemented for a while in libplacebo for both storage images and storage/uniform buffers.
I am working on vo_placebo, which will bring these improvements to mpv directly, but in the meantime you could test them using the plplay tool and report any issues.
It should also be more or less straightforward to port the relevant parts of the libplacebo code to mpv's custom shader parser as a stopgap measure.
That's great news, thanks for the work on libplacebo! I've not been able to test shader performance using the plplay demo, as I couldn't get it to compile on Windows, and Windows Subsystem for Linux v2 does not yet support native OpenGL/Vulkan context passthrough; it currently falls back to CPU emulation (the performance is horrible). Is it even possible to compile it on Windows, or am I doing it wrong?
It should be possible to compile it using MinGW. (Though you'll also need MinGW-compiled versions of dependencies)
@rossy is currently working on libplacebo+d3d11 so he may be able to offer insight on how to compile it.
I use MSYS2 to build it, because it has a binary package manager that makes it easy. You should be able to compile libplacebo by installing MSYS2, opening a mingw-w64 shell from the start menu and installing the following packages:
pacman -S $MINGW_PACKAGE_PREFIX-{toolchain,libplacebo,meson,ninja,python,python-mako,vulkan-headers,glfw}
After that you should be able to build libplacebo and the demos from the same mingw-w64 shell.
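From there, the build itself is the usual Meson flow; a sketch, assuming libplacebo's option for building the demos is named demos (double-check meson_options.txt):
meson setup build -Ddemos=true
ninja -C build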
pacman -S $MINGW_PACKAGE_PREFIX-{toolchain,libplacebo,meson,ninja,python,python-mako,vulkan-headers,glfw}
Works great, thanks! For future people using this to compile on MinGW/MSYS2: installing the packages git, ffmpeg and cmake will take care of the warnings. Also, if you develop with Vulkan in Visual Studio, or have installed the Vulkan SDK for game engines like Unity, you also have to edit libplacebo\src\vulkan\utils_gen.py and remove the registry paths that can point to a vk.xml version that is not the MinGW one.
registry_paths = [
'%VULKAN_SDK%/share/vulkan/registry/vk.xml',
'$VULKAN_SDK/share/vulkan/registry/vk.xml',
'$MINGW_PREFIX/share/vulkan/registry/vk.xml',
'/usr/share/vulkan/registry/vk.xml',
]
Remove everything except for '$MINGW_PREFIX/share/vulkan/registry/vk.xml'.
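After the edit, the list contains only the MinGW entry:
registry_paths = [
    '$MINGW_PREFIX/share/vulkan/registry/vk.xml',
]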
I've just tested libplacebo with plplay, and the performance is insane! I think something is bottlenecking the shaders in mpv. I can run extremely large CNNs without frame drops, with performance very close to a naive CUDA/cuBLAS implementation. Even FSRCNNX_x2_56-16-4-1.glsl runs 6x faster than realtime.
With this, I think it would be worthwhile to develop a general-purpose TensorFlow-model-to-GLSL converter so that deployment of deep learning methods can be fast and easy. I already have a prototype that I used, but it needs a lot more work to be useful in general cases. But I'm getting sidetracked here...
Anyway, great work! I can't wait for this to be introduced to mpv; it will certainly revolutionize real-time video processing, as nothing even comes close to the usability, scalability and integrability of libplacebo/mpv.
That's actually somewhat unexpected to say the least. Are you sure the shaders are actually being applied? (Unless the performance gain is due to the use of new features)
Yes, I've stacked four Anime4K_UL shaders on a 360p video for 4K upscaling and it works flawlessly. I'm not sure if all four were applied, but at least two were, as I can notice differences up to the third. In mpv, applying more than one UL shader would freeze everything and make it unusable until you killed the process. Even initializing the shaders was exceptionally fast: it used to take 4-5 seconds in mpv, and now it is instant.
I don't know why mpv was so much slower, but it might be because of something else, like synchronization? Or maybe mpv was not compiled specifically for my machine? (Would that make a difference...?)
The even better news is that my shaders are (embarrassingly) very unoptimized: they do naive convolutions one step at a time, with a lot of overlap (duplicate texture reads). With SSBOs we could implement highly efficient sparse matrix multiplications, which could speed things up 2-3x depending on the neural network.
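For a sense of what that optimization looks like, here is a generic GLSL compute sketch (plain GLSL, not mpv user-shader syntax; the bindings and the 3x3 box kernel are illustrative stand-ins for a real conv layer) that loads a tile into workgroup-shared memory once, so neighbouring invocations stop re-reading the same texels:
#version 450
layout(local_size_x = 16, local_size_y = 16) in;
layout(binding = 0) uniform sampler2D src;
layout(binding = 1, rgba16f) writeonly uniform image2D dst;

// 16x16 workgroup plus a 1-texel apron for the 3x3 kernel
shared vec3 tile[18][18];

void main() {
    ivec2 gid  = ivec2(gl_GlobalInvocationID.xy);
    ivec2 lid  = ivec2(gl_LocalInvocationID.xy);
    ivec2 base = ivec2(gl_WorkGroupID.xy) * 16 - 1; // top-left of the apron
    ivec2 maxc = textureSize(src, 0) - 1;

    // Cooperatively fill the 18x18 tile: each thread fetches at most two
    // texels, instead of every thread fetching its whole 3x3 neighbourhood
    for (int i = lid.y * 16 + lid.x; i < 18 * 18; i += 16 * 16) {
        ivec2 p = base + ivec2(i % 18, i / 18);
        tile[i / 18][i % 18] = texelFetch(src, clamp(p, ivec2(0), maxc), 0).rgb;
    }
    barrier();

    // The "conv layer": a 3x3 average read purely from shared memory
    vec3 sum = vec3(0.0);
    for (int dy = 0; dy < 3; dy++)
        for (int dx = 0; dx < 3; dx++)
            sum += tile[lid.y + dy][lid.x + dx];
    imageStore(dst, gid, vec4(sum / 9.0, 1.0));
}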
Are you sure you were using the vulkan backend when testing with mpv?
@bloc97 Increase SC_MAX_ENTRIES and all those shaders will be usable in real time with mpv too.
I see, thanks for pointing that out!
Is there a reason for SC_MAX_ENTRIES to exist as a hard-coded define? I don't think it is documented in the mpv manual.
Not really, no. A leftover relic from a simpler time, when "48 entries ought to be enough for everyone".
Edit: I suppose the real reason is to prevent the shader cache from growing too large, because mpv looks up entries in the cache by doing a full strcmp on the shader body for each entry. In libplacebo that consideration is completely removed, because it looks up entries using a hash of the body instead. Ironically, libplacebo still has a limit on the cache, to prevent the cache file from growing infinitely, but the libplacebo cache is adaptive in size and only evicts entries that haven't been used recently.
Anyway, I'll bump up the limit, since a few extra strcmp calls shouldn't be a reason to prevent your use case altogether.
That would explain the 4-5 second load time... If I understand correctly, mpv's shader cache is not modified when shaders are "removed" by the user; it is only cleared when the user cumulatively loads more than SC_MAX_ENTRIES shaders, and, assuming the new set of shaders does not itself exceed SC_MAX_ENTRIES, it then clears the cache and loads in the new shaders.
How big are the entries in the shader cache? If they are small, would setting the max limit to something absurd like 65535 be detrimental to performance?
Edit: I've read 474ee003eddd278d871e7c1c760091b739880e8f, and since the current shader library will be superseded by libplacebo, I guess I will wait for it. Further problems I encounter with libplacebo (if any) can be discussed over there.
I am working on vo_placebo which will bring these improvements to mpv directly
and
@rossy is currently working on libplacebo+d3d11
That's exciting!
Any updates on this idea? I was hoping to do some temporal filters... Temporal information is actually insane for some sorts of filters (like denoisers). I noticed an nlmeans filter has "experimental" temporal support, which seems to imply there's some support for temporal features?
Also, please consider adding a swap-buffer feature. I remember trying to do some stuff with ShaderFX, and adding a no-op BLIT stage was extremely slow; that could have been avoided by swapping the input and output buffers.
After 3 hours of experimentation, I got a basic BLIT-less temporal blur set up, but it leaves lots to be desired:
//!HOOK MAIN
//!BIND HOOKED
//!BIND PREV
//!DESC temporal-blur
// Number of history layers in the PREV storage texture
#define T_DEPTH 2
// Pad a vec3 color out to the vec4 expected by imageStore
#define toVec4(v) vec4((v), 0.0)
// Integer texel coordinate of the current pixel in layer d of PREV
#define HOOKED_posOff(d) ivec3(HOOKED_pos * HOOKED_size, d)
vec4 hook() {
    vec3 a = HOOKED_texOff(0).rgb;                  // current frame
    vec3 b = imageLoad(PREV, HOOKED_posOff(0)).rgb; // history layer 0
    vec3 c = imageLoad(PREV, HOOKED_posOff(1)).rgb; // history layer 1
    // Round-robin: overwrite the oldest layer with the current frame
    // (frame is mpv's built-in frame counter uniform)
    imageStore(PREV, HOOKED_posOff(frame % T_DEPTH), toVec4(a));
    vec3 avg = (a + b + c) / 3.0;
    return toVec4(avg);
}
//!TEXTURE PREV
//!SIZE 2048 2048 2
//!FORMAT rgba16
//!STORAGE
My issues with this are:
- The PREV texture has a hard-coded size; I want it to be sized to match the input content.
- The !STORAGE flag is not documented.
- It's screen ordering, not file ordering, meaning that if you launch mpv with --paused you will get "bad" quality. Seeking should preferably be handled better for the most "correct" output, i.e. better scene change detection is needed to handle seek boundaries.
- I think I can avoid this using !SAVE, but that is limited to 2D textures, making it harder to avoid a BLIT (maybe I'm missing something).
- There's no way to peek into the future by the looks of it, meaning you have to induce frame delay with no way to signal back to mpv that this needs to be corrected for.
Maybe I'm missing some obvious stuff, but the documentation is pretty sparse.