nvpro-samples / vk_video_samples

Vulkan video samples
Apache License 2.0
237 stars 40 forks source link

HLSL support in Vulkan Video samples #60

Closed BattleAxeVR closed 5 months ago

BattleAxeVR commented 7 months ago

Hi, I've been using Vulkan Video a while and I have a major issue with it, still, to this day.

There's still no way, as far as I can tell, to sample directly-decoded 420-format planar video textures in HLSL using immutable texture samplers, without first using GLSL / glslLangvalidator to compile the Spirv bytecode.

DXC has an issue here but many of us, I would say, even MOST game engines today use some variant of HLSL, instead of GLSL.

This code sample is 100% dependent on glslLangvalidator and GLSL, and the immutable sampler simply does not work in HLSL projects.

I refer to this:

https://github.com/nvpro-samples/vk_video_samples/issues/15

Now, while it may seem like a bug in DXC is "out of scope" to fix here, they have no interest in actually fixing it.

GLSL as a dependency for Vulkan Video is a serious liability for me. What it means is that I have to first decompress the textures by running an extra fullscreen pass, to a normal RGB / packed surface, and then sample that from anywhere else, ex path tracer or HLSL-based rasterization code.

If there are any projects using Vulkan Video that don't first sample and decompress the 420 video textures to an RGB render target / surface, then sample that, I would love to see it and then I'll close this issue immediately.

But until I see that, I don't think Vulkan Video is really ready for prime time.

I don't want to have to run an extra decompression pass in GLSL then sample the output of that, which has double the size (420 -> 444/RGB). This is a problem for my game project. So, what I would love to see here, is an HLSL shader used to sample and display the directly-decoded, non-copied, planar 420 textures. Is that something you'd consider doing? I may just have a bug in my code, but I can't for the life of me see it, and honestly, I simply doubt that these immutable textures work, at all, through any HLSL-based pipeline.

As you know, HLSL is the primary shader language used by most projects. There is simply no way I'm switching back to using GLSL everywhere in my path tracing engine after spending a lot of time migrating it from GLSL -> HLSL in the first place. I shouldn't have to do that, frankly.

BattleAxeVR commented 7 months ago

@natevm Nate suggested to me, over email, that I consider using slangd instead of HLSL, which is based on HLSL syntax, which I would consider using, but only if these issues can be fixed (eventually).

So I'd like to leave this issue open until there is a viable alternative to GLSL for Vulkan Video, as an extra decompression pass to convert to packed texture formats. It's not the shader language that's the issue, per se, it's the fact that I have to run an extra pass that doubles the memory consumption for zero benefit in image quality, and makes my entire engine more complicated.

I would like to know if there's even the remotest chance that this issue would get some level of attention at Nvidia to generate SPV output from slang directly, that actually bypasses the DXC compiled bytecode issue I'm having here.

As Nate knows, we both have a similar issue with DXC not copying out the ray payload in the anyhit shaders if the thread is terminated, which forces me to use stochastic transparency on what should otherwise be noise-free transparencies. (denoising stochastic transparency isn't supported by any realtime denoisers such as NRD or DLSS-RR)

https://github.com/microsoft/DirectXShaderCompiler/issues/5158

Until I see an actual media player that uses HLSL instead of GLSL to sample Vulkan Video textures for output to screen directly, I don't believe this actually works properly. I'm not sure who's actually responsible to fix it, but this is a real problem from my perspective. Not everyone out there is simply writing single-pass media players. I can't rewrite my entire engine's path tracing code to GLSL just to satisfy a dependency like this.

It doesn't matter if it takes six months to fix this, I just want to get the ball rolling and hope it can get added to some backlog somewhere. Unless / until I see an HLSL or slang shader used in this sample, or some media player, I remain convinced that GLSL -> glslLangvalidator are an actual dependency on Vulkan Video use, period.

natevm commented 7 months ago

I haven’t extensively looked into this issue, but if GLSL works, then one holdover solution might be to use the inline SPIRV assembly feature in slang: https://shader-slang.com/slang/user-guide/a1-04-interop.html

Could you make a minimal reproducer in godbolt that demonstrates the failure in DXC? It’s not clear what the issue actually is here… like, do you expect a particular SPIRV instruction to be emitted that isn’t? What’s the difference in the SPIRV generated by glslangvalidator versus by dxc / slang?

I can talk with the slang developers about the ray payload issues and in/out HLSL syntax.

BattleAxeVR commented 7 months ago

To illustrate my problem here. This is what I get when I use HLSL to decode the video:

image

This is the HLSL shader source (latest Vulkan Video code, FFMPEG libs integrated just 2 minutes ago, and latest DXC.exe from Vulkan SDK):

Texture2D video_image : register(t0); SamplerState video_sampler : register(s0);

float4 main(float2 input_uv : TEXCOORD0) : SV_TARGET { float2 uv = input_uv; float4 colour = video_image.Sample(video_sampler, uv); return colour; }

When I use this GLSL fragment shader instead (exactly the same C++ / engine code, just different SPV bytecode, differs only in the pixel shader)

version 460

layout(set = 0, binding = 0) uniform sampler2D colour_texture;

layout(location = 0) in vec2 outUV; layout(location = 0) out vec4 fragColor;

void main() { vec2 uv = outUV; vec4 colour = texture(colour_texture, uv); fragColor = colour; }

//Compile command line, works with YCbCr immutable sampler:

//glslangValidator.exe -g -o video.frag.h -V video.frag --vn video_frag --target-env vulkan1.2 //glslangValidator.exe video.frag -o video.frag.spv --target-env vulkan1.2

See, the full colour decoded from the 420 YUV planar texture outputted by the shader:

image

BattleAxeVR commented 7 months ago

Here's the original issue I opened on DXC issues:

https://github.com/microsoft/DirectXShaderCompiler/issues/4179

It seems like it's never going to get fixed (too niche a problem?), so slang -> spv direct compilation (with your help) seems like it may be my only solution.

BattleAxeVR commented 7 months ago

To explain how this relates to the anyhit issue, here's what it looks like when I switch RTX ON:

image

As you see, the alpha channel is sampled stochastically, so it becomes noisy, and current denoisers do not handle denoising inherently noisy transparency channels. But these textures aren't noisy, they should be perfectly smooth alpha values. Any time you introduce noise into the system, it makes the image quality worse.

But that's besides the point, if I switch to sampling the vulkan video descriptors directly inside the anyhit shader, it's not only noisy (because the payload can't be used to accumulate the contribution to the radiance term), but ALSO red. So I'm forced to add an extra video decoding pass to make an extra GPU copy of the video textures, then sample those.

Slang could be a good option for just using in the anyhit shader, if I can manage to make it sample the video noise-free. I could try switching in a slang version of my video copy texture shader, that's enough to validate whether the byte code is correct. I'll try that, thanks Nate.

BattleAxeVR commented 7 months ago

Actually, Nate, can you tell me what the equivalent of this HLSL would be in Slang?

Texture2D video_image : register(t0); SamplerState video_sampler : register(s0);

float4 main(float2 input_uv : TEXCOORD0) : SV_TARGET { float2 uv = input_uv; float4 colour = video_image.Sample(video_sampler, uv); return colour; }

And what the command line would be to compile it to an SPV file? Then I can just load that file off disc and try it immediately

BattleAxeVR commented 7 months ago

Not sure why I made an extra copy of the uv coordinates, I think it was to test the UV values themselves were correct by outputting them to the colour value directly (to debug), but anyway.

natevm commented 7 months ago

Do you observe the same bug with slangc as you do with dxc? Note that you can download latest slangc from the releases section of the slang compiler github repo.

Note, we’re still working on adding multiple entry point support in slang->SPIRV. So you’ll want to try reproducing in a minimal sample with only one entry point.

BattleAxeVR commented 7 months ago

k, will try that

natevm commented 7 months ago

Slang is an extension of HLSL, so in theory, the code you have now should compile in slang without modification. In practice, slang generics are a little different from HLSL templates, and slang is a bit more strict about type safety.

BattleAxeVR commented 7 months ago

Unfortunately, it's the same problem as DXC has. To get it to compile and run, I had to add the vk::bindings to the HLSL source, which I didn't need in any other context before this (it was auto-generated/handled by DXC I believe)

image

[[vk::binding(0, 0)]] Texture2D video_image : register(t0);

[[vk::binding(0, 1)]] SamplerState video_sampler : register(s0);

float4 main(float2 input_uv : TEXCOORD0) : SV_TARGET { float4 colour = video_image.Sample(video_sampler, input_uv); return colour; }

// slangc video_ps.hlsl -profile sm_5_0 -stage fragment -entry main -target spirv -o ......\bin\common_data\shaders\Debug\video_ps.spv

// slangc video_ps.hlsl -profile sm_5_0 -stage fragment -entry main -target spirv -o ......\bin\common_data\shaders\Release\video_ps.spv

BattleAxeVR commented 7 months ago

The binding for the texture2D and the sampler should have the same index for immutable samplers, I believe, not sure if I put the correct vk::binding values in there

BattleAxeVR commented 7 months ago

I also tried sm_6_3 instead of 5 and no luck there either

slangc.exe seems to be converting to GLSL 450 first, internally, as an intermediary step.

image

natevm commented 7 months ago

Slang has two SPIRV backends. One goes through glslangvalidator, and then they have a direct-to-SPIRV path which is new, and somewhat experimental, but also more under our control when it comes to bug fixes.

To use the direct to SPIRV backend, the flag you’ll want is -emit-spirv-directly.

natevm commented 7 months ago

I’m not sure if this is a compiler error yet or a driver bug. I’m also unfamiliar with this extension. Do you see anything incorrect with the SPIRV emitted by slang when comparing to the expected SPIRV you get through GLSL?

BattleAxeVR commented 7 months ago

I ran my working GLSL pixel/fragment shader through a GLSL -> HLSL converter, which crashes when I compile it with glslLangValidator but works the same (same bug) when I compile it with DXC.

Texture2D video_texture : register(t0); SamplerState _video_texture_sampler : register(s0);

static float2 outUV; static float4 fragColor;

struct SPIRV_Cross_Input { float2 outUV : TEXCOORD0; };

struct SPIRV_Cross_Output { float4 fragColor : SV_Target0; };

void frag_main() { float4 colour = video_texture.Sample(_video_texture_sampler, outUV); fragColor = colour; }

SPIRV_Cross_Output main(SPIRV_Cross_Input stage_input) { outUV = stage_input.outUV; frag_main(); SPIRV_Cross_Output stage_output; stage_output.fragColor = fragColor; return stage_output; }

I can't for the life of me figure out what's wrong here. It must be something related to the use of the immutable sampler that's used for the ycbcr conversion step. This is normally something that must be pushed on the same binding index and write descriptor index (per the descriptions and usage in this repo).

My best guess is that it's defaulting to a default non-converting sampler instead of the ycbcr conversion sampler which is added via pnext.

BattleAxeVR commented 7 months ago

In my normal HLSL code, I reuse the same explicitly created samplers. When I try that here, it's the same issue, it either crashes, runs but doesn't show any texture value at all, or still shows just the red value (presumably the Y value in the planar format, ignoring the UV values in the other planes)

image

image

This is exactly how I do it for all my other HLSL shaders with explicit samplers. I believe there is a "default" sampler somewhere internally, that, if you don't specify anything, is what's used. And in this case, results in the red texture. It's just not "understanding" or seeing the ycbcr conversion function. So it must be something in the bindings

BattleAxeVR commented 7 months ago

I definitely reproduced it 100%, even with GLSL: if you use a separately / explicitly bound sampler, even one with the ycbcr converter added in to the pnext, it fails to activate the conversion and simply samples the first plane (resulting in a red image).

image

This isn't a big surprise, as it states this fact that one needs to use an immutable sampler for ycbcr conversion for this to work. The problem is, I can't see what difference there is between HLSL and GLSL in terms of this feature. I must be doing something wrong, code-wise.

@Themaister I saw your blog post on the subject, it uses GLSL too though. If you have any ideas I'd really appreciate it

BattleAxeVR commented 7 months ago

I noticed ffplay also has a gpu-based vulkan video decoding sample now (without a round trip to the CPU / main memory), so I presume it uses the same ycbcr immutable sampler. I presume it uses glsl + glsllangvalidator pipeline, but I will attempt to modify the ffplay source code to consume hlsl -> dxc -> spv, hlsl -> glslangvalidator -> spv, or hlsl -> slang -> spv compiled binaries.

I'll report back.

zlatinski commented 6 months ago

Hi BattleAxeVR,

I have added (WIP) filter infrastructure that supports YUV to RGB conversion of the frames. Do you think this would work to address your HLSL problem? The filter can now convert the decoder's output directly to RGBA, so the presentation will work with regular RGBA textures.

Ideally, I would like this HLSL issue addressed because converting / copying to another texture would require additional memory bandwidth. The sample conversion fetches directly from the YUV output by the decoder. Alternatively, we can change the FS to fetch and convert using the filter conversion. This would also be beneficial for implementations not supporting YCbCr sampler conversion.

BattleAxeVR commented 6 months ago

RGBA output from the framework would certainly be a good improvement for the rest of my code so I can sample the textures directly in my path tracing shaders without first making an extra copy/conversion manually, I guess I can live with the doubling of the frame texture memory requirement (420 -> 444 / RGB).

The ideal situation here is simply HLSL being able to use the conversion. I'm not even sure it is DXC's fault, as slangd has the same issue, and, I believe, so does glslLangvalidator when I feed it the same HLSL FS that uses an implicit immutable conversion sampler on the same binding index (which is per the spec). There are two ways apparently to use immutable samplers, I tried both (bound per texture and separately).

BattleAxeVR commented 6 months ago

But if your solution (if I understand correctly) is simply to add an extra fragment shader in the framework here, that achieves the same thing as my own in-engine solution, so I would say, not much benefit or point in doing that as it's not a real fix. I need every ounce of perf + bandwidth possible and making extra copies of multiple high res, high framerate, high dynamic range videos seems like an absurd waste of GPU VRAM bandwidth, as you say. The ideal is 0 copies.

BattleAxeVR commented 5 months ago

At @natevm 's suggestion, I made a 100% repro using Slangc to compile HLSL video decoding shaders instead of GLSL ones used here, in my branch:

https://github.com/BattleAxeVR/vk_video_samples

First I modded the shader module loading code to be able to load pre-compiled SPV files off disc, initially using saved versions of the GLSL versions. Which works 100% compiled offline using glslLangValidator.exe. Then I adapted the vertex shader to HLSL, and compiled it with both DXC and slangc (both latest versions), which also worked 100% fine as a drop-in replacement, with the GLSL fragment shader still used, as before.

Finally I adapted the GLSL fragment shader to HLSL, and...same thing. Only see the red channel (Y, presumably, of YUV planar 420. So it's not doing the conversion, and no matter what vulkan bindings for the sampler I picked in HLSL, same thing). I read everything I possibly could on google on the topic of immutable samplers, and anything about YUV -> RGB conversion in HLSL, and nothing. It appears to me that HLSL may simply be unable to bind immutable samplers whatsoever. Because of how they're bound (or not) to the layout, or something like that. Anyway it's beyond my understanding of how things work under the hood, and I'm at my wit's end now. I wrote a batch file to compile with slangc and saved both GLSL and HLSL files with pre-compiled SPV files that the decoder sample will consume at runtime (either in debug or release). It should show the issue right away if anyone wants to give it a try.

I really have no clue at all how to fix this, but I'm starting to believe it's not actually possible to, and making copies of every video texture may be the only way.

All the relevant shaders and batch files to compile them are here:

https://github.com/BattleAxeVR/vk_video_samples/tree/main/vk_video_decoder

Here's my HLSL pixel shader which repro's the issue in vk video sample:

https://github.com/BattleAxeVR/vk_video_samples/blob/main/vk_video_decoder/video_pixel_shader.frag.hlsl

natevm commented 5 months ago

@BattleAxeVR please open an issue on the official slang repo here: https://github.com/shader-slang/slang/issues

slang developers do not see issues on this repo, but should see it fairly quickly on the other one. It’s probably fine to just copy/paste the above.

natduca commented 5 months ago

We've got a good convo going now over in slang#3343 and internally to figure out if this is a driver issue and/or what we should do here. Thanks for drawing to attention. Stay tuned.

BattleAxeVR commented 5 months ago

The fix works!

I'm OK using this Sampler2D workaround, but it's not really HLSL anymore, is it? It's now slang code. At least, DXC can't compile Sampler2D, doesn't understand it.

But, if I can use slangc on my current raygen HLSL shaders, using an ifdef to compile it conditionally so DXC can still compile it, that's fine for now.

I need to figure out if there's a plugin or a way to integrate slangc into Visual Studio such that I can compile each file using the IDE using Ctrl-F7 or Ctrl-B, and when there are (inevitable) compile errors I can click on them in the output window and the IDE takes me to the offending line directly. I absolutely need this workflow to work to be productive.

But it's ok if, to get the video textures show up properly, I need a post-build step to recompile the SPVs using slangc. That would be fine. But ideally I'd like some VS direct integration so slangc can be first class, as DXC is for SM > 6 .hlsl files

natevm commented 5 months ago

Wrt slangc and VS, you might submit a feature request.

A lot of focus iiuc has gone into the vscode slang extension for integration with intellisense, and I find that that works fairly well.

Personally I have a cmake custom command which compiles my slang shaders, and if errors appear in the terminal, VS Code is pretty good at detecting and highlighting the corresponding lines of text in the editor.

But Falcor users are usually VS users, so perhaps there’s a solution there somewhere…

BattleAxeVR commented 5 months ago

Q) When I compile some shader with slangc, is there any automatically defined "USING_SLANG_AS_COMPILER" or some such preprocessor defined?

If so, I could simply use an ifdef inside my HLSL files to use Sampler2D when compiling offline / outside Visual Studio as a post-build step, and SamplerState / Texture2D when compiling with DXC, inside VS. That would work fine as a workflow and may help even expose other bugs (or potential for fixes), either in DXC or Slang.

Another example, as we discussed, is the anyhit shader fix in slangc, which I intend on implementing as soon as I validate this video sampling fix in my engine (tonight).

It's not in my 2024 Bingo to unlearn/discard my 20+ years of Visual Studio muscle memory to switch to VS Code for my serious C++ work, even though I've used it once or twice I'm far from comfortable enough in it yet. Maybe for my next project.

BattleAxeVR commented 5 months ago

I see slangc has a -D preprocessor macro feature, that should do it.

natevm commented 5 months ago

There is a built-in __SLANG_COMPILER__ definition that you can use too.

BattleAxeVR commented 5 months ago

Thanks! That saves me a manual -DUSE_SLANG=1 and few LOC in my shader headers for conditional compilation.

BattleAxeVR commented 5 months ago

Got a fix in DXC / HLSL now.

Thanks again for all your patience and help.

Closing...