microsoft / DirectXShaderCompiler

This repo hosts the source for the DirectX Shader Compiler which is based on LLVM/Clang.
Other
3.03k stars 677 forks source link

[SPIRV] Generated shaders by DXC causes a crash on Android #1460

Closed Cry-Mory closed 6 years ago

Cry-Mory commented 6 years ago

I am having a problem with creating graphics pipeline with DXC generated shaders that sample from textures. I have reduced the shader to a simple shader which samples the texture and output the color. You can find the HLSL shader here TextureShader.hlsl.txt:

I am generating the SPIRV shader with the following commands:

.\dxc.exe -spirv -T ps_5_0 -E TexturePS .\TextureShader.hlsl -Fc TextureShader.h_spv -Fo TextureShaderFrag.spv
.\dxc.exe -spirv -T vs_5_0 -E TextureVS .\TextureShader.hlsl -Fc TextureShader.h_spv -Fo TextureShaderVert.spv

At the same time, glslangValidator generates the SPIRV shader from the same HLSL code which works fine for graphics pipeline creation. You can find both DXC and glslangvalidator generated shader here:

DXC: TextureShaderFrag_DXC.txt GLSLangValidator:TextureShaderFrag_GLSLang.txt

I took a look in the SPIRV codes but did not find any problem with DXC code.

Would you please take a look and see what causes this issue?

antiagainst commented 6 years ago

Hi @Cry-Mory: could you provide more details on the problem you encountered? Any error message? What is the environment (GPU, driver, etc.)? Thanks.

antiagainst commented 6 years ago

By eyeballing, I didn't find anything suspicious. One thing worth mentioning maybe is the Depth value of the OpTypeImage instruction. It causes problem on some NV drivers as reported by #1455. Maybe you are running into the same issue?

Cry-Mory commented 6 years ago

Hi @antiagainst

Thanks for your reply. I am running on ARM Mali-G72 which is supporting Vulkan 1.0.65 standard.

Unfortunately, I don't know which Depth value is used by glslangValidator but DXC's depth value is 2 which means there is no indication that a depth image is used or not; which IMHO it is the right value.

Can you guide me how to test other depth values to see if that is the problem or not? If that's the problem, I can open a bug report for Mali driver.

Cry-Mory commented 6 years ago

I also have prepared a simple sample which I can share if that is helpful.

antiagainst commented 6 years ago

If in our original source code you are doing depth-cmp sampling on a image, you should use 1 as the depth value. Otherwise, use 0. So for the above example, you should use 0.

Disassembling the SPIR-V generated by DXC and manually change the value Depth field, and then reassembling will do the trick. Sorry for the manual steps.

Cry-Mory commented 6 years ago

As you can see in the SPIRV shader that I attached to the issue, DXC is using 2 as the depth value for the sampled image.

%type_2d_image = OpTypeImage %float 2D 2 0 0 1 Unknown

image

antiagainst commented 6 years ago

Yes. I was suggesting to change it to 1 or 0 accordingly to see if it fixes the problem.

Cry-Mory commented 6 years ago

Thanks to your example (https://github.com/KhronosGroup/SPIRV-Tools/blob/master/examples/cpp-interface/main.cpp), I changed the depth to 0, and the shader worked on the Android device with Mali G72.

So, is it possible to change this value to 0 for OpImage?

antiagainst commented 6 years ago

I think we already had the discussion in #1107, and you participated in it. ;-P I also explained more on https://github.com/Microsoft/DirectXShaderCompiler/issues/1455#issuecomment-408229509.

There is a fundamental difference between HLSL and SPIR-V: HLSL has the depth info on sampler types, but we want it on the image type in SPIR-V. So we need some "magic" to migrate information from sampler types to image types, by checking how they interact in the source code.

As per the current implementation, setting the correct depth hint for OpTypeImage is a non-trivial work; it basically requires us to traverse the AST one more time at the beginning to collect info regarding whether an image is used only for depth-comparison sampling or not.

If an image is statically (dynamically it may not) used for both depth-comparison & non-depth-comparsion sampling, then we cannot use 0 or 1 as depth value anyway. The issue persists.

Even if we can make sure an image does not have mixed usage, the aliasing problem makes it very hard to analyze, see https://github.com/Microsoft/DirectXShaderCompiler/issues/1107#issuecomment-377697003.

We discussed this again recently in the Khronos Group. The consensus is still to keep the current Vulkan and SPIR-V spec; the depth value is a hint (SPIR-V spec), and it should be completely ignored (Vulkan spec). Tools and drivers should respect this.

But yeah, I know the pain of stale drivers. So if you are trying to find some workaround, maybe https://github.com/Microsoft/DirectXShaderCompiler/pull/1191 is helpful. (It was reverted by https://github.com/Microsoft/DirectXShaderCompiler/pull/1261 in the master branch. So you can revert #1261 if you want.)

antiagainst commented 6 years ago

Oh, if you don't want to build the compiler yourself, feel free to send out a pull request for reverting #1261 so that you can leverage Appveyor to do the build and download afterwards. :)

Cry-Mory commented 6 years ago

Thanks a lot for your complete reply. Now, I remember our discussion.

I believe you are right and the Mali driver developer should fix this bug in their driver. I will report this to ARM.

That's great! I already have compiled DXC with reverted #1261 but it is very sad and painful that I have to use two different compilers for Android and PC at the moment.

oscarbg commented 6 years ago

I'm joining the discussion without invitation.. :-) @Cry-Mory nice to hear Cryengine is getting Android Vulkan support.. just tested this week new Cryengine 5.5 preview 5 "Vulkan in Editor" support.. not that I want to add more work, but just wanted to ask if you are testing also on Qualcomm Adreno modern devices (A5xx or A6xx) to catch early this kind of "Depth value of the OpTypeImage instruction" issues so you can report to Qualcomm also if bugs present.. I say because would be sad if soon after releasing some early Cryengine Vulkan Android preview these issues were found on Adreno also and could take some months to Qualcomm to fix.. that's all I wanted to say.. ;-)

antiagainst commented 6 years ago

Since this is not a DXC bug, I'll close this issue.

Cry-Mory commented 5 years ago

@antiagainst @ehsannas As we discussed here, mobile drivers seems to have a hard time handling depth literal for OpImageType when it is set to 2. So far, I was always reverting #1261 and Depth literal was always set to 0 for all the cases, and that was working fine with most of the devices we support. But it seems new code changes have moved the part setting the depth literal to somewhere else and that is blocking me to update our DXC compiler to the latest version.

Would it be possible for you to kindly help me with changing the code in a way, it always set depth literal to 0 for all OpImageTypes?

At the moment, I am having a problem with an attachment definition in one of our shaders. The old compiler (with reverting #1261) define the image type as follows:

%type_subpass_image = OpTypeImage %float SubpassData 0 0 0 2 Unknown

while the new one (I reverted #1261 for this one as well) defines it like this:

%type_subpass_image = OpTypeImage %float SubpassData 2 0 0 2 Unknown

ehsannas commented 5 years ago

If you want to modify your local build to always set the depth of OpTypeImage to 0, you could apply the following hack:

Change this: https://github.com/Microsoft/DirectXShaderCompiler/blob/4870297404a37269e24ddce7db3bd94a8110fff8/tools/clang/lib/SPIRV/SpirvType.cpp#L138-L143

to:


ImageType::ImageType(const NumericalType *type, spv::Dim dim, WithDepth,
                     bool arrayed, bool ms, WithSampler sampled,
                     spv::ImageFormat format)
    : SpirvType(TK_Image, getImageName(dim, arrayed)), sampledType(type),
      dimension(dim), imageDepth(ImageType::WithDepth::No), isArrayed(arrayed), isMultiSampled(ms),
      isSampled(sampled), imageFormat(format) {}

This is forcing the constructor of OpTypeImage to always use ImageType::WithDepth::No. Hope this helps.

Cry-Mory commented 5 years ago

@ehsannas Thanks a lot for your reply. The hack helped me and I have our shaders working back again on mobile devices. With this fix, now I can try relaxed precision branch.