zachsaw / MPDN_Extensions

Media Player .Net (MPDN) Open Source Extensions

Implement RAVU as renderscript #239

Open bjin opened 7 years ago

bjin commented 7 years ago

I recently implemented a prescaler for mpv named RAVU, based on RAISR (Rapid and Accurate Image Super Resolution). After several iterations of improvement and performance tuning, I consider the current code feature complete and somewhat stable (no major changes to the shaders are planned, probably only improvements to the model weights). I also recently got a report that RAVU works fine with mpv's current work-in-progress native Direct3D 11 renderer (HLSL cross-compiled via GLSL->SPIRV->HLSL). So I now believe porting RAVU as an MPDN renderscript is a viable option.

Currently, RAVU is tuned for anime only. The linear regression method it uses is too simple to fit both anime-style pictures and live-action photos. I also tried training RAVU on real photos, but the result (validated with an independent selection of real photos) was not impressive: visually, just slightly better than EWA scalers. You can find a comparison of NNEDI3 and RAVU on anime-style pictures here. Performance-wise, however, RAVU with radius=3 is roughly 5 times faster than NNEDI3 with neurons=32; see details here.

I will explain the basics of how RAVU works, and if anyone is interested, I can go into more detail.

  1. Sample the n * n pixels in the neighborhood (n=2*r for ravu, and n=2*r-1 for ravu-lite).
  2. Use the luma channel of these pixels to calculate the local gradient (the per-pixel gradients are combined as a weighted sum, using a Gaussian kernel as the weights).
  3. Use the local gradient to calculate a discrete key (angle, strength, coherence).
  4. Query a LUT (could also be a 2D vec4/float4 array) with that key to obtain a convolution kernel.
  5. Apply the convolution kernel to the n * n pixels to get the result.
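
The five steps above can be sketched in NumPy for a single output sample. This is only an illustration, not the actual RAVU shader code: the eigen-analysis of the Gaussian-weighted gradient products follows the RAISR-style approach and is an assumption here, as are the bin counts, the thresholds in `strength_edges` and `coherence_edges`, the `sigma` value, and the names `gaussian_weights`, `local_key` and `ravu_like_sample`.

```python
import numpy as np

def gaussian_weights(n, sigma=1.5):
    """n x n Gaussian weights normalized to sum to 1 (sigma is an assumed value)."""
    ax = np.arange(n) - (n - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    w = np.outer(g, g)
    return w / w.sum()

def local_key(luma, n_angle=24, strength_edges=(0.001, 0.01),
              coherence_edges=(0.25, 0.5)):
    """Steps 2-3: turn a square luma patch into a discrete (angle, strength, coherence) key."""
    luma = np.asarray(luma, dtype=np.float64)
    gy, gx = np.gradient(luma)              # per-pixel gradients along rows / columns
    w = gaussian_weights(luma.shape[0])
    # Gaussian-weighted sums of gradient products: a 2x2 matrix [[a, b], [b, d]].
    a = float((w * gx * gx).sum())
    b = float((w * gx * gy).sum())
    d = float((w * gy * gy).sum())
    # Closed-form eigenvalues of the symmetric 2x2 matrix.
    half_trace = (a + d) / 2.0
    disc = np.sqrt(max(half_trace ** 2 - (a * d - b * b), 0.0))
    l1 = half_trace + disc
    l2 = max(half_trace - disc, 0.0)
    # Dominant gradient direction, folded into [0, pi) and quantized.
    theta = (0.5 * np.arctan2(2.0 * b, a - d)) % np.pi
    angle = int(theta / np.pi * n_angle) % n_angle
    s1, s2 = np.sqrt(l1), np.sqrt(l2)
    coh = (s1 - s2) / (s1 + s2) if s1 + s2 > 0 else 0.0
    strength = int(np.digitize(l1, strength_edges))
    coherence = int(np.digitize(coh, coherence_edges))
    return angle, strength, coherence

def ravu_like_sample(patch, lut):
    """Steps 4-5: look up a kernel by key and apply it to the n x n patch."""
    patch = np.asarray(patch, dtype=np.float64)
    kernel = lut[local_key(patch)]          # lut shape: (n_angle, 3, 3, n * n)
    return float(np.dot(kernel, patch.ravel()))
```

With a trained model, `lut` would hold the learned weights; filling it with a plain averaging kernel, as in `np.full((24, 3, 3, n * n), 1.0 / (n * n))`, is enough to exercise the data flow.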

However, there are several variants of RAVU to fit different scenarios. I think the luma and rgb variants of RAVU, and RAVU-lite, are the most interesting.

  1. The most common one is a luma-only prescaler, which upscales only the luma plane.
  2. -yuv uses the first channel (the luma channel) to calculate the gradient, and upscales all three planes.
  3. -rgb calculates the luma channel from the rgb channels to calculate the gradient, and upscales all three planes (the most universal variant, since all video is converted to RGB in mpv before upscaling).
  4. -chroma samples the luma channel separately from the source plane (with bilinear texture sampling), and upscales the chroma planes.
  5. RAVU-lite is luma-only. It obtains four convolution kernels for the four target positions in one pass. It uses an odd kernel size and introduces no pixel shift. Its quality is worse than RAVU at the same radius setting, due to the smaller kernel size.
  6. Regular RAVU can upscale the different types of planes mentioned above. It uses an interpolation method similar to Super-xBR. It uses three passes, and each pass queries only one convolution kernel. It uses an even kernel size and introduces a half-pixel shift.
  7. There are also a gather version, which uses textureGatherOffset (GatherRed in HLSL), and a compute version, which uses a compute shader (DirectCompute), for further performance improvement.
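
To make the RAVU-lite item concrete, here is a rough NumPy sketch of a one-pass 2x upscale in which each source pixel's n * n neighborhood (n=2*r-1) is multiplied by four kernels to fill a 2x2 output block. The function names, the edge padding, and the `kernels_for` callback are hypothetical; in RAVU-lite the four kernels would come from the LUT lookup described earlier, whereas the placeholder `center_kernels` below simply copies the center pixel.

```python
import numpy as np

def upscale_2x_one_pass(luma, kernels_for, r=2):
    """One-pass 2x upscale: four kernels per source pixel fill one 2x2 output block."""
    luma = np.asarray(luma, dtype=np.float64)
    n = 2 * r - 1                      # odd kernel size, so the window has a true center
    pad = n // 2
    padded = np.pad(luma, pad, mode="edge")   # edge clamping is an assumption
    h, w = luma.shape
    out = np.empty((2 * h, 2 * w))
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + n, x:x + n]
            # kernels_for(patch) returns an array of shape (4, n * n):
            # one kernel per target position in the 2x2 block.
            block = kernels_for(patch) @ patch.ravel()
            out[2 * y:2 * y + 2, 2 * x:2 * x + 2] = block.reshape(2, 2)
    return out

def center_kernels(patch):
    """Placeholder kernels: all four outputs copy the center pixel (nearest neighbor)."""
    n = patch.shape[0]
    k = np.zeros((4, n * n))
    k[:, (n * n) // 2] = 1.0
    return k
```

Because the window is centered on the source pixel, the 2x2 block sits symmetrically around it, which is one way to see why an odd kernel size needs no pixel shift.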

I also cross-compiled some sample shaders into HLSL for reference purposes. The (unrolled) coordinates are currently generated by a Python script.

ravu-lite-r3: pass1 (GLSL HLSL), combine_pass (GLSL HLSL)

ravu-r3: pass1 (GLSL HLSL), pass2 (GLSL HLSL), pass3 (GLSL HLSL), combine_pass (GLSL HLSL)

EDIT: I don't have the environment or the knowledge to develop HLSL shaders and C# scripts for MPDN, but I'm happy to help make porting easier.
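
As an illustration of what generating unrolled coordinates can look like, the following sketch enumerates the sample offsets for a given radius and emits one line of shader text per sample. It is not the actual generator script: the offset convention (from -(r-1) up to r for the even-sized window, and up to r-1 for the odd-sized one), the function names, and the format of the emitted line are all assumptions.

```python
def sample_offsets(r, lite=False):
    """Offsets of the n x n window: n = 2*r - 1 if lite, else n = 2*r."""
    lo = -(r - 1)
    hi = r - 1 if lite else r
    return [(dx, dy) for dy in range(lo, hi + 1) for dx in range(lo, hi + 1)]

def unrolled_samples(r, lite=False, tex="HOOKED"):
    """Emit one illustrative GLSL-style sampling line per offset (hypothetical format)."""
    return [
        f"float s{i} = {tex}_texOff(vec2({dx}.0, {dy}.0)).x;"
        for i, (dx, dy) in enumerate(sample_offsets(r, lite))
    ]
```

For r=3 this yields 36 samples for the even-sized window and 25 for the odd-sized one, matching n=2*r and n=2*r-1.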

Shiandow commented 7 years ago

Hi bjin,

Thanks for reaching out. The RAVU algorithm looks interesting. I'd definitely be interested in trying to get it to work with MPDN. The results on that comparison picture looked quite good.

By the way, you said you made a chroma version that looks at the interpolated luma channel. Have you tried one that looks at the full resolution luma and uses that to scale the chroma? (you'll probably need 4(?) different sets of weights to handle all cases).

I'm also intrigued by the SPIR-V cross compiling, would it in theory work for all MPV user shaders? Does it work both ways?

Unfortunately I'm in the middle of some changes to the extension framework at the moment, so it could take a while before I get round to actually implementing RAVU.

Cheers, Shiandow

bjin commented 7 years ago

> Have you tried one that looks at the full resolution luma and uses that to scale the chroma? (you'll probably need 4(?) different sets of weights to handle all cases).

The quality of the luma information (for ravu-chroma) is not that important for the current model. It only matters for calculating the discrete key (angle, strength, coherence).

We could modify the model to make the chroma channels depend directly on the luma channel, but I don't think a simple linear model would work here. The semantics of luma and chroma are quite different. However, a deep NN model could probably handle this well.

> I'm also intrigued by the SPIR-V cross compiling, would it in theory work for all MPV user shaders?

The GLSL->SPIRV part is actually quite essential: it's the reference implementation of a SPIRV compiler. The SPIRV->HLSL part, on the other hand, is experimental. In theory, though, I think it should be enough to cover the set of functionality mpv uses (targeting d3d11 or d3d12), including UBO/SSBO and compute shaders.
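
For anyone who wants to try the GLSL->SPIRV->HLSL route by hand, here is a minimal sketch that builds the two command lines using the standalone `glslangValidator` and `spirv-cross` tools. This is an assumption about tooling, not a description of how mpv performs the conversion; the helper name, the output file naming, and the default shader model are hypothetical, and mpv user shaders would first need to be expanded into complete GLSL before glslang can compile them.

```python
import subprocess

def cross_compile_commands(glsl_path, stage="frag", shader_model=50):
    """Build (not run) the GLSL -> SPIR-V and SPIR-V -> HLSL command lines."""
    spv_path = glsl_path + ".spv"     # hypothetical naming scheme
    hlsl_path = glsl_path + ".hlsl"
    # glslangValidator: -V compiles to SPIR-V, -S sets the shader stage.
    to_spirv = ["glslangValidator", "-V", "-S", stage, glsl_path, "-o", spv_path]
    # spirv-cross: --hlsl selects the HLSL backend.
    to_hlsl = ["spirv-cross", spv_path, "--hlsl",
               "--shader-model", str(shader_model), "--output", hlsl_path]
    return to_spirv, to_hlsl

def cross_compile(glsl_path, **kwargs):
    """Run both steps in order; raises CalledProcessError if a tool fails."""
    for cmd in cross_compile_commands(glsl_path, **kwargs):
        subprocess.run(cmd, check=True)
```

Calling `cross_compile("pass1.glsl")` would require both tools to be installed and on the PATH; `pass1.glsl` is a placeholder file name.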

> Does it work both ways?

I don't know. But glslang supports HLSL as well. So, could be.

Shiandow commented 7 years ago

> The quality of the luma information (for ravu-chroma) is not that important for the current model. It only matters for calculating the discrete key (angle, strength, coherence).

In my own experiments with luma-guided chroma scaling, I encountered a few cases where some features got lost when the luma was downsampled. This tended to cause some problems.

Anyway, it looks like I should look into SPIR-V some time. For now, though, it's probably best to try porting RAVU first and see if that process can be streamlined a bit.