google / shaderc

A collection of tools, libraries, and tests for Vulkan shader compilation.

glslc not setting VULKAN = 120, and not compiling to half #1177

Open alecazam opened 3 years ago

alecazam commented 3 years ago

How do you:

  1. Get glslc to set the VULKAN define to anything other than 100? I'm requesting 1.2.
  2. Get float16_t, f16vec2/3/4, and f16mat2/3/4 to be honored as types?
  3. Get functions like clamp(), sign(), and abs() to use the half overload, so I don't get a type mismatch error?

Here's the command line I compile with. I don't see any other half-related command-line args. Is no one using half with Vulkan GLSL shaders? I found the following in-depth write-up about getting HLSL to generate half, and zero literature on doing the same with GLSL.

And I don't want to use "mediump float", since that often gets mapped to float; I need something more explicit. spirv-opt flags like "--convert-relaxed-to-half" just end up adding a ton of conversions to half with little perf benefit. If Vulkan really is a mobile API, then half needs to be better supported on iOS, Android, and even some consoles.

https://therealmjp.github.io/posts/shader-fp16/

glslc --target-env=vulkan1.2 -fentry-point=main -fauto-bind-uniforms -fauto-map-locations -std=310es testShader.frag -o testShader.spv

#version 310 es

//#define VULKAN 120

//#if VULKAN >= 120
// blarg120
//#elif VULKAN > 110
// blarg110
//#else
// blarg100
//#endif

#if VULKAN >= 120
#extension VK_KHR_shader_float16_int8 : enable
#else
#extension GL_AMD_gpu_shader_half_float: enable
#extension GL_AMD_gpu_shader_half_float_fetch: enable
#endif

#if defined(GL_ES)
precision highp float;
precision highp int;
#endif

// Types defined to closer match Metal and force 16bit ALU
#define half float16_t

half saturate(half x)
{
    return clamp(x, half(0.0), half(1.0));
}

layout(location = 0) out vec4 fragColor;

void main()
{
    fragColor = vec4(saturate(half(2)));    
}

Output:
testShader.frag:30: error: 'clamp' : no matching overloaded function found
testShader.frag:30: error: 'return' : type does not match, or is not convertible to, the function's return type

alecazam commented 3 years ago

If I force the #if VULKAN >= 120 branch by replacing it with #if 1, I get:

testShader.frag:14: warning: '#extension' : extension not supported: VK_KHR_shader_float16_int8

alecazam commented 3 years ago

This at least sets the VULKAN define, but should I really be compiling with glslc or glslangValidator? It seems these compilers only support "mediump float" and not "float16_t" and related types. Is "float16_t" only usable in constant buffers? The problem is that casts fail with "mediump vec2" but succeed with an actual type like "half2(texture(tex, uv))", or, I was hoping, with "f16vec2(texture(tex, uv))". So far, the recommended course of action seems to be to convert all code from GLSL to HLSL using min16float and the corresponding types.

glslangValidator --target-env vulkan1.1 -V110 -e main testShader.frag

alecazam commented 3 years ago

Here's another version of that shader

#version 310 es

// Don't use VULKAN since glslc doesn't set it correctly

#extension GL_AMD_gpu_shader_half_float : enable
#extension GL_AMD_gpu_shader_half_float_fetch : enable
#extension GL_EXT_shader_explicit_arithmetic_types : require

#if defined(GL_ES)
precision highp float;
precision highp int;
#endif

// Types defined to closer match Metal and force 16bit ALU
//#define half float16_t

// These seem to be the only thing that can compile to anything
// but since GLSL doesn't have HLSL's typedef this preprocessor
// define doesn't correctly work for casting.
#define half mediump float
#define half4 mediump vec4

half saturate(half x)
{
    return clamp(x, 0.0, 1.0);
}

half4 saturate(half4 x)
{
    return clamp(x, 0.0, 1.0);
}

layout(location = 0) in half4 color;
layout(location = 1) out half4 fragColor;

void main()
{
    half4 ch = color;

    half4 c = saturate(ch);

    fragColor = c;  
}

After running it through all the steps (spv, half spv, reduced spv, then spirv-cross to Metal), this is the resulting shader. It's not half throughout, and not what I expected for a mobile GPU. This is a performance degradation and not really worth the work of marking fields as mediump. Here mediump is applied to the inputs, the outputs, and the functions, yet only the internals of the clamp convert "RelaxedPrecision" to half/half4.

#pragma clang diagnostic ignored "-Wmissing-prototypes"

#include <metal_stdlib>
#include <simd/simd.h>

using namespace metal;

struct fsmain_out
{
    float4 fragColor [[color(1)]];
};

struct fsmain_in
{
    float4 color [[user(locn0)]];
};

static inline __attribute__((always_inline))
float4 saturate0(thread const float4& x)
{
    return float4(clamp(half4(x), half4(half(0.0)), half4(half(1.0))));
}

fragment fsmain_out fsmain(fsmain_in in [[stage_in]])
{
    fsmain_out out = {};
    float4 ch = in.color;
    float4 param = ch;
    float4 c = saturate0(param);
    out.fragColor = float4(half4(half4(c)));
    return out;
}
alecazam commented 3 years ago

Here's the script to recreate the tests above. This uses glslangValidator, but the code for glslc is similar.

#!/bin/zsh

# like testShader.frag
shaderName=testShader.frag
entryPoint=main

# convert to spirv
glslangValidator --target-env vulkan1.1 -V110 -e $entryPoint $shaderName -o $shaderName.s0.spv

# first run optimization stages
spirv-opt --eliminate-dead-branches --merge-return --inline-entry-points-exhaustive --eliminate-dead-functions --scalar-replacement --convert-local-access-chains --eliminate-local-single-block --eliminate-local-single-store --simplify-instructions --eliminate-dead-code-aggressive --vector-dce --eliminate-dead-inserts --eliminate-dead-code-aggressive --eliminate-dead-branches --merge-blocks --eliminate-local-multi-store --if-conversion --simplify-instructions --eliminate-dead-inserts --redundancy-elimination --eliminate-dead-code-aggressive --cfg-cleanup -o $shaderName.s1.spv $shaderName.s0.spv

# convert relaxed precision to half
spirv-opt --convert-relaxed-to-half -o $shaderName.s2.spv $shaderName.s1.spv

# cleanup unneeded conversions
spirv-opt --simplify-instructions --redundancy-elimination --eliminate-dead-code-aggressive -o $shaderName.s3.spv $shaderName.s2.spv

# now convert optimized spv back to metal code, so can view the sources and see if half is used
spirv-cross $shaderName.s3.spv --msl --msl-version 10100 --msl-ios --msl-framebuffer-fetch --msl-decoration-binding --rename-entry-point $entryPoint fsmain frag --output $shaderName.metal
alecazam commented 3 years ago

Running the same example through HLSL generates an almost perfect half shader. Inputs/outputs are half4, and the saturate was converted to clamp, which is bizarre. saturate used to be a modifier on instructions, so conversion to clamp isn't ideal. Hopefully later shader compilation reduces that.

shaderName=testShader.frag.hlsl
entryPoint=TestShaderFS

glslangValidator --target-env vulkan1.1 -V110 -D --hlsl-enable-16bit-types -e $entryPoint $shaderName -o $shaderName.s0.spv
...
#define half min16float
#define half4 min16float4

struct Input 
{
    half4 color : COLOR0;
};

struct Output 
{
    half4 fragColor : COLOR0;
};

void TestShaderFS(Input input, out Output output)
{
    half4 ch = input.color;

    half4 c = half4(saturate(float4(ch)));

    output.fragColor = half4(c);    
}
#include <metal_stdlib>
#include <simd/simd.h>
using namespace metal;
struct fsmain_out
{
    half4 output_fragColor [[color(0)]];
};
struct fsmain_in
{
    half4 input_color [[user(locn0)]];
};
fragment fsmain_out fsmain(fsmain_in in [[stage_in]])
{
    fsmain_out out = {};
    half4 _61 = clamp(in.input_color, half4(float4(0.0)), half4(float4(1.0)));
    out.output_fragColor = _61;
    return out;
}
alecazam commented 3 years ago

Still an issue even with HLSL. This line fails to compile:

half4 c = saturate(ch);

Whereas this compiles, but it implies the half version of saturate isn't being applied, so all this casting is then required:

half4 c = half4(saturate(float4(ch)));

void TestShaderFS(Input input, out Output output)
{
    half4 ch = input.color;

    // the following compiles, but implies saturate isn't the half version
#if 1
    half4 c = half4(saturate(float4(ch)));
#else
    half4 c = saturate(ch);
#endif

    output.fragColor = c;   
}
greg-lunarg commented 3 years ago

Sorry for the delay. Should be getting to this shortly.

greg-lunarg commented 3 years ago

The following shader seems to compile correctly with glslangValidator.exe --target-env vulkan1.2 -o foo.frag.spv foo.frag:

// #version 310 es
#version 450 core

#extension GL_EXT_shader_explicit_arithmetic_types_float16: enable
// #extension GL_AMD_gpu_shader_half_float: enable

float16_t saturate(float16_t x)
{
        return clamp(x, float16_t(0.0), float16_t(1.0));
}

layout(location = 0) out vec4 fragColor;

void main()
{
        fragColor = vec4(saturate(float16_t(2.0)));
}

It seemed this explicit style is what you originally wanted. Is this a good point to move ahead from? Or did you want to try to move ahead with one of the other shaders and workflows above?

A few notes:

alecazam commented 3 years ago

Thanks, that gives me another avenue. Although I compile to vulkan 1.0 due to Android. I'll try replacing our #version with 450 core. I'm just a little afraid of then feeding the spirv-cross GLSL output from that to mobile (iOS and Android). Our minspec is all GLES 3 era, but that had half support.

glslc isn't setting the VULKAN constant properly (it's always 100), and doesn't expose any of the half flags that glslangValidator has. But I can probably switch out our shader gen to use glslangValidator. I used glslangValidator for all the test cases above.

You don't have any half inputs/outputs in your shader. You might try medium vec4 fragColor. The other issue was that none of the shader inputs/outputs used half, so casts ensued. I'll try your test case in a little while, but that's some quick feedback. Replace my saturate() with clamp() and see whether the half version of that is used.

I think the AMD extensions were the Vulkan 1.0-era extensions. Now 1.2 formalized VK_KHR_shader_float16_int8, but glslc and glslangValidator failed to find that extension. It looks like you had some luck with GL_EXT_shader_explicit_arithmetic_types_float16, which I wasn't aware of.

greg-lunarg commented 3 years ago

Although I compile to vulkan 1.0 due to Android.

That also works.

I'll try replacing our #version with 450 core.

I am planning to fix so that #version 310 es works.

glslc ... doesn't expose any of the half flags that glslangValidator has.

Which ones are those?

You might try medium vec4 fragColor

Presuming you mean mediump. Are you aware that the precision qualifiers such as mediump do not change the format of the variable, only the precision that is used to compute it. So mediump float means it is still 32 bit IEEE format, but the GPU can use just 16 bits of precision to compute it.
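To illustrate the point that a precision qualifier changes the arithmetic precision but not the storage format, here is a minimal Python model. It assumes the 16-bit format is IEEE 754 binary16 (Python's `struct` code `"e"`) and that a mediump result is simply rounded to that precision before being written back into a 4-byte binary32 slot. The helper names are illustrative, and real GPUs are allowed to keep more precision than this.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (16-bit precision)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def store_f32(x: float) -> bytes:
    """Encode x in the 32-bit IEEE format that a mediump float still uses."""
    return struct.pack("<f", x)


def load_f32(raw: bytes) -> float:
    """Decode a 32-bit IEEE float."""
    return struct.unpack("<f", raw)[0]


# highp: computed and stored at 32-bit precision.
highp = store_f32(0.1)

# mediump (modelled): the GPU may evaluate at 16-bit precision,
# but the variable keeps its 32-bit storage format.
mediump = store_f32(to_half(0.1))

print(len(highp), len(mediump))  # both are 4 bytes
print(load_f32(highp), load_f32(mediump))
```

Both encodings are 4 bytes; only the value held in them differs, which is why a mediump interface variable does not turn into a 16-bit input or output on its own.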

... VK_KHR_shader_float16_int8, but glslc and glslangValidator failed to find that extension

This is a Vulkan extension, not a GLSL extension.

I will also look into the problem you are seeing with the VULKAN pre-defined.

alecazam commented 3 years ago

I am planning to fix so that #version 310 es works.

Nice, I think the mobile 200/310 es platforms are the ones that benefit from fp16 the most. Some of the desktop 450 parts have issues with fp16 (see below).

Which ones are those?

My scripts are above, but it was the HLSL 16-bit flag:
glslangValidator --target-env vulkan1.1 -V110 -D --hlsl-enable-16bit-types -e $entryPoint $shaderName -o $shaderName.s0.spv

Presuming you mean mediump. Are you aware that the precision qualifiers such as mediump do not change the format of the variable, only the precision that is used to compute it.

I was under the impression that the compiler could use fp16 or fp32. For example, Nvidia hobbled the fp16 units to 1/32 of the perf of fp32 on the GTX 1080 after people started buying the cheaper cards for ML. And many of the consoles ignore fp16.

In our case, I'd probably compile the shaders twice, once with fp16 and once with fp32, to address poor half performance, or Android where fp16 support might be spotty. Also, since GLSL doesn't have typedef, we need to use our #define half float16_t, which is actually a type, as opposed to #define half mediump float, which isn't. And how would you declare a half sampler if "mediump float" is just a recommendation?

greg-lunarg commented 3 years ago

I was under the impression that the compiler could use fp16 or fp32.

Yes, but only where size doesn't matter, for instance, for function-scope variables. When a variable is part of the interface, such as a member of a buffer or an input/output, it remains 32-bit. The same is true for min16float.

Last I looked, Metal did not support such a relaxed precision feature, but it did support true 16-bit floats. That is why I wrote the pass in spirv-opt that converts relaxed-precision 32-bit float operations to true 16-bit float operations. That pass is more useful when a whole computation tree is transformed to 16-bit, as that minimizes convert ops.
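A toy model of why the pass pays off only when a whole computation tree is relaxed. It assumes a straight-line chain of ops (the real pass works on the full SPIR-V data-flow graph), that every relaxed op becomes fp16 while every other op stays fp32, and that stage inputs and outputs stay fp32 as described above. `Op` and `count_converts` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    name: str
    relaxed: bool  # True if the result is decorated RelaxedPrecision


def count_converts(chain: List[Op]) -> int:
    """Count fp16<->fp32 conversions for a straight-line chain of ops,
    assuming relaxed ops run in fp16, other ops run in fp32, and the
    stage input and output are fp32 interface variables."""
    converts = 0
    prev_half = False  # the input arrives as fp32
    for op in chain:
        if op.relaxed != prev_half:
            converts += 1  # precision changes between producer and consumer
        prev_half = op.relaxed
    if prev_half:
        converts += 1  # an fp16 result must be widened for the fp32 output
    return converts


all_relaxed = [Op("mul", True), Op("add", True), Op("clamp", True), Op("mul", True)]
mixed = [Op("mul", True), Op("add", False), Op("clamp", True), Op("mul", False)]

print(count_converts(all_relaxed))  # 2 converts for 4 fp16 ops
print(count_converts(mixed))        # 4 converts for only 2 fp16 ops
```

In the mixed chain every precision boundary costs a conversion, so the pass can easily add more work than the fp16 ALU saves.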

alecazam commented 3 years ago

Metal, like HLSL, distinguishes half from float. I know HLSL also has min16float. My understanding is that Metal has full support for Vulkan 1.2 VK_KHR_shader_float16_int8, but when I tried to include that in my shader, the extension failed to compile. It's a bit confusing, since Vulkan tacks all of this extension stuff onto existing shader languages (HLSL/GLSL) instead of defining its own.

So if we're targeting MSL 1.0 from the spirv-opt output, can we compile with 450 core, or should we stick with 310 es? And what about Vulkan 1.0 vs. Vulkan 1.2? I guess I'll just try combos until the compiler and transpiler break.

alecazam commented 3 years ago

I'm finally getting back to this. Just wanted to say that I'm actually getting really great half shader generation by overriding 310 es to 450 core and replacing mediump float/vec2/3/4 with float16_t/f16vec2/3/4. Uniforms are still all float, but I can cast around those. I have these extensions set, but maybe I need another one?

I'm also seeing a mediump sampler2D convert to texture2d<float> instead of texture2d<half>. Maybe I can declare those as half sampler2D. Nope, that didn't work.

// 16bit type support
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : enable

// to store data in constant buffers as half
#extension GL_EXT_shader_16bit_storage : enable

// allow use of half in shaders
//#if VULKAN >= 120
// this deprecates GL_AMD_gpu_shader_half_float, 1.2 extension
//#extension VK_KHR_shader_float16_int8 : enable
//#else
//#extension GL_AMD_gpu_shader_half_float: enable
//#extension GL_AMD_gpu_shader_half_float_fetch: enable
//#endif
alecazam commented 3 years ago
  1. So I tried using half/2/3/4 in my uniform buffers, and they do convert to half in the MSL.

  2. There seems to be no way to get "mediump sampler2D name" to generate a texture2d<half> in Metal. It always generates texture2d<float>, and then a ton of casts ensue. I wrapped all these calls, but would prefer that sampling returns half4 in the shaders that specify mediump. I also can't use "--relax-float-ops", since that goes too far by forcing all float32 ops to RelaxedPrecision. I can only use "--convert-relaxed-to-half".

alecazam commented 3 years ago

Also, I see this:


in highp vec4 a_position;
in highp vec2 a_uv0;

out highp vec2 v_uv0;

void main()
{
    // Position
    gl_Position = u_viewProj * a_position;
    gl_Position.xy += 2.0 * u_jitter * gl_Position.w;  <- 

get converted to this, which is incorrect codegen with --convert-relaxed-to-half. It seems the pass identifies that 2.0 fits in half and then tries to do all the math in it. The u_jitter uniform is mediump vec2, but the uniform isn't exported with mediump. It just seems to throw off the codegen and precision.

float2 _81 = float2(half4(out.gl_Position).xy + ((half2(_24.u_jitter) * half(2.0)) * half(out.gl_Position.w)));

I'm also having to chase other instances where a mat3 * half3 caused the code to convert every element of the mat3 to half instead of promoting the half3 to a float3.

When I convert to this:

gl_Position.xy += vec2(2.0 * u_jitter) * gl_Position.w;

Then I get the expected codegen in MSL which honors the precision of gl_Position.xy and gl_Position.w. This is purely using mediump, not even half is involved here.

 float2 _52 = out.gl_Position.xy + (float2(half2(half2(_24.u_jitter) * half(2.0))) * out.gl_Position.w);
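The precision loss in the jitter case can be reproduced numerically. This Python sketch mirrors the two generated MSL expressions above, modelling each `half(...)` as a round to IEEE 754 binary16 and each float op as a round to binary32; the clip-space values (x = 812.37, w = 900, jitter = 0.0004) are made up. Between 512 and 1024, consecutive fp16 values are 0.5 apart, so the fully demoted expression cannot keep the roughly 0.72 jitter offset added to 812.37.

```python
import struct


def to_half(x: float) -> float:
    """Round to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x: float) -> float:
    """Round to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def jitter_demoted(pos_x: float, w: float, jitter: float) -> float:
    # Mirrors: float2(half4(gl_Position).xy
    #                 + ((half2(u_jitter) * half(2.0)) * half(gl_Position.w)))
    j = to_half(to_half(jitter) * to_half(2.0))
    offset = to_half(j * to_half(w))
    return to_f32(to_half(to_half(pos_x) + offset))


def jitter_promoted(pos_x: float, w: float, jitter: float) -> float:
    # Mirrors: gl_Position.xy
    #          + (float2(half2(half2(u_jitter) * half(2.0))) * gl_Position.w)
    j = to_half(to_half(jitter) * to_half(2.0))  # only the mediump part is fp16
    return to_f32(to_f32(pos_x) + to_f32(j * to_f32(w)))


# Hypothetical clip-space values.
exact = 812.37 + 2.0 * 0.0004 * 900.0
print(exact,
      jitter_demoted(812.37, 900.0, 0.0004),
      jitter_promoted(812.37, 900.0, 0.0004))
```

With these inputs the demoted version snaps to a multiple of 0.5, an error on the order of the jitter itself, while the promoted version stays within about 1e-4 of the exact result.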
alecazam commented 3 years ago

To get valid codegen, I had to remove all mediump and lowp and only use half and float types. The half codegen is fine. It's the mediump codegen that does all the math in half and loses the precision, when it should promote to the higher precision when the two are combined.

greg-lunarg commented 3 years ago

If you can give me a compilable shader and command lines to compile it, I can take a look.

alecazam commented 3 years ago

I override this to 450 core since you said 310 es doesn't handle half. This code is compiled with flag "--convert-relaxed-to-half". The codegen failure has no half usage, but I added the texture sample case below that's not converting to half. It's just the Relaxed Precision use on u_jitter and the convert flag. I'm having to cast all float constants with half(2.0) vs. using 2.0h, which makes math heavy shaders a bit slower to convert.

In the shader code, I removed all mediump qualifiers from uniforms except for the samplers. Even though "mediump sampler2D" doesn't convert to half right now, it's the only way I know to specify that I want half usage there.

I also added the mat3 case below. This generates inefficient code, but I went through and replaced all cases with an explicit transformNormal function that converts the normal to float and back.
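A similar numeric sketch for the mat3 * half3 case compares demoting the whole matrix to fp16 against the transformNormal-style approach of widening the normal, multiplying in fp32, and narrowing once. Demoting a mat3 also costs 9 element conversions, versus 3 to widen the normal and 3 to narrow the result. The rotation matrix and normal are hypothetical, chosen so that two products nearly cancel, the matrix is written row-major for readability (GLSL matrices are column-major), and rounding is modelled with Python's `struct` half/float codes.

```python
import struct
from typing import Callable, List


def to_half(x: float) -> float:
    """Round to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x: float) -> float:
    """Round to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mat3_mul_vec3(m: List[List[float]], v: List[float],
                  rnd: Callable[[float], float]) -> List[float]:
    """Row-major mat3 * vec3, rounding every product and partial sum with rnd."""
    out = []
    for row in m:
        acc = rnd(row[0] * v[0])
        acc = rnd(acc + rnd(row[1] * v[1]))
        acc = rnd(acc + rnd(row[2] * v[2]))
        out.append(acc)
    return out


# Hypothetical u_modelIT: a 45-degree rotation about Z, stored as fp32.
c = 0.7071068
model_it = [[c, -c, 0.0], [c, c, 0.0], [0.0, 0.0, 1.0]]
normal = [to_half(1.0), to_half(0.999), to_half(0.0)]  # a_normal is f16vec3

# Demote: every matrix element converted to fp16, all math in fp16.
m_half = [[to_half(x) for x in row] for row in model_it]
demoted = mat3_mul_vec3(m_half, normal, to_half)

# Promote: fp32 math on the fp32 matrix, result narrowed to fp16 once.
m_f32 = [[to_f32(x) for x in row] for row in model_it]
promoted = [to_half(x) for x in mat3_mul_vec3(m_f32, normal, to_f32)]

reference = sum(a * b for a, b in zip(model_it[0], normal))
print(reference, demoted[0], promoted[0])
```

In the first component the demoted result is off by roughly 30%, because the fp16-rounded matrix entries dominate the small difference, while the promoted result only pays one final fp16 rounding.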

I'm looking at the Metal transpile from spirv-cross to verify codegen. That's the only way I know to review SPIR-V, since spv files are binary. I tried using spirv-dis on those, but the disassembly really isn't readable like DX assembly.

#version 310 es

// 16bit type support
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : enable

// to store data in constant buffers as half
#extension GL_EXT_shader_16bit_storage : enable

// glslc only sets VULKAN to 100; glslangValidator sets this correctly
#if VULKAN >= 120
// this deprecates GL_AMD_gpu_shader_half_float, 1.2 extension
#extension VK_KHR_shader_float16_int8 : enable
#else
// this was original fp16 extension
#extension GL_AMD_gpu_shader_half_float: enable
#extension GL_AMD_gpu_shader_half_float_fetch: enable
#endif


layout( binding = 2, set = DescriptorSet ) restrict readonly uniform
{
  mat4 u_viewProj;
  mediump vec2 u_jitter;
  mat3 u_modelIT;
};

// this doesn't convert to half, and all samples are float
uniform mediump sampler2D testSampler;

in vec4 a_position;
in vec2 a_uv0;
in f16vec3 a_normal;

out vec2 v_uv0;
out f16vec4 v_color;
out f16vec3 v_normal;

void main()
{
    gl_Position = u_viewProj * a_position;
    v_uv0 = a_uv0;

    // 1. This is not preserving the precision of gl_Position.xy in the resulting math. It converts xy and w to half,
    // and does the add in lower precision, replacing the high-precision gl_Position.xy value.
    gl_Position.xy += 2.0 * u_jitter * gl_Position.w;

    // this generates code with correct precision
    // gl_Position.xy += vec2(2.0 * u_jitter) * gl_Position.w;

    // 2. this inefficiently converts the entire mat3 to half to perform the math
    v_normal = mat3(u_modelIT) * a_normal;

    // 3. this doesn't compile to half either; I have to use a textureH wrapper to cast to f16vec4
    v_color = f16vec4(texture(testSampler, a_uv0));
}
alecazam commented 3 years ago

In another sample, using spirv-cross to generate MSL from SPV, I don't see this return anything but texture<float>, even though that texture is marked RelaxedPrecision in the spv file.

Following type_to_glsl deeper just leads to a bunch of raw IDs and hard-to-debug lookups, so I can't tell why this has id = 6. The half type seems to be the 12th enum, and float is the 13th.

spirv_msl.cpp, line 12328:
string CompilerMSL::image_type_glsl(const SPIRType &type, uint32_t id)
{
        // Append the pixel type
    img_type_name += "<";
    img_type_name += type_to_glsl(get<SPIRType>(img_type.type));
alecazam commented 3 years ago

Any updates or suggestions? Given my time constraints, I'm just going to have to disable half on Android, since I can't ship 450 core in a spv file there. Those aren't 450 core parts. I can at least try to use the existing half stuff on iOS, macOS, and some consoles.

I see a lot of casts and have to add many constant casts. I know that on iOS, casting between fp16 and fp32 is basically free, but on PowerVR and likely the Android platforms, my understanding is that it's not. I don't think the compiler pre-shaders know to cast once when data is set on the uniform block, or once when sampling from the texture. Following this out to DX assembly, or something else the hardware actually runs, would be more convincing, as would something that runs much faster using fp16.

I'm going to be looking at perf, but on the A12 the fp16 work hasn't made a measurable difference so far. All the arm64 parts have fp16 units, which may or may not co-issue with fp32, so the effort is worthwhile. iOS GPU capture breaks out fp16 ops in the VS/FS/CS, and before using the flag our count was 0. The following link has a good discussion of the Apple vs. PowerVR GPU architecture.

https://www.realworldtech.com/apple-custom-gpu/

greg-lunarg commented 3 years ago

Sorry, been busy, but can spend a little time on this now.

I will start by getting GL_EXT_shader_explicit_arithmetic_types_float16 to work for 310 es. From there we can figure out what is next most important.

greg-lunarg commented 3 years ago

Should have something by the end of the day.

greg-lunarg commented 3 years ago

Please grab the branch for https://github.com/KhronosGroup/glslang/pull/2612. This should allow you to use GL_EXT_shader_explicit_arithmetic_types for 310 es.

Please let me know what else you need.

alecazam commented 3 years ago

Awesome. That will let me use the code on Android. There were 3 issues in the shader that you requested above. I'd say getting texture2d<half> out of a mediump sampler in GLSL would help. I already have uniforms, attributes, and instructions using half, but this is the last missing piece.

Having fp32 precision drop when mixed with a single mediump value is concerning. The promotion should be to higher, not lower, precision. See the example with mat3 * half3, where each element of the mat is converted to half, or gl_Position.xy getting converted to fp16 because I did gl_Position.xy += jitter (a mediump uniform). Those are major codegen issues. If I have to manually validate codegen, then that's a problem for using "RelaxedPrecision" anywhere.

I'm also seeing a lot of repeated const/cast initialization. For example, if I have a const float3 = float3(half(1.0), half(2.0), half(3.0)), then that's injected everywhere instead of being set up once. Is there a way to define half constants outside of functions, or a better way? I forget whether it's static const or static_const or whatever GLSL uses. The compiler could still recognize that these only need to be done once, but the shader code is a bit harder to read.

alecazam commented 3 years ago

For some reason, I've got code failing around here only when I've got the half flag in use. Is this disassembly really supposed to be human-parseable output?

Given that there's no transpile back to source for spv, when the compiler fails and dumps out an error on line 689 of the SPIR-V disassembly instead of telling me the original source line, I suddenly have to spelunk through this. Not even the variable names are spliced in by spirv-dis. The Metal transpiles are fine: they compile correctly and are completely readable and verifiable. The spv is not.

It looks like it's taking a vec3 and trying to extract a half3 out of it. The whole shader is a vertex shader that is all fp32; the only mediump setting is on int precision.

This is on macOS (Intel); spirv-opt --version reports SPIRV-Tools v2020.3.

The workaround for now is to turn the struct into:

struct Foo
{
   highp vec3 pos0, pos1, pos2, pos3;
};

precision mediump int;
precision highp float;

struct Foo
{
    highp vec3 pos[4];
};

Foo Func()
{
    Foo data;
    for (highp uint i = 0; i < 4; ++i)
    {
        data.pos[i] = vec3(0.0); <- failing to compile this, if I comment out it's fine.
    }
    return data;
}

line 614: Result type (OpTypeVector) does not match the type that results from indexing into the composite (OpTypeVector).

       %709 = OpLabel
        %880 = OpLoad %float %873
        %879 = OpLoad %_arr_float_uint_2 %872
        %878 = OpLoad %_arr_v3float_uint_2 %870
        %877 = OpLoad %_arr_v3float_uint_4 %868
        %940 = OpCompositeExtract %v3half %877 0 <- 
       %1008 = OpFConvert %v3float %940
               OpStore %936 %1008
alecazam commented 3 years ago

Also, there's poor codegen for mediump samplers, which just convert the value to half regardless of whether it's used as a full float in the shader. The workaround is to use highp samplers, which don't do this, and then cast to half4 afterwards.

mediump sampler2D u_tex;
highp vec4 value = texture(u_tex, uv);

generates:
float4 value = float4(half4(u_tex.sample(u_texSmplr, uv)));  <- casts lose precision of result

And then on uv coordinates, I see transpiled Metal code like this where uv are converted to half even though that's not requested.


vec4 offsets = ...;
vec2 uv = ...;

weight[1].x = textureH(u_texSmplr, vec2(offsets.x, uv.y)).w;

generates:

 weight[1].x = half4(u_tex.sample(u_texSmplr, float2(half2(half(offsets.x), half(uv.y))))).w;
alecazam commented 3 years ago

I hit this while trying to retool from glslc to glslangValidator. It would be nice if glslangValidator supported #include just like glslc. This is the modern day of shaders, where source code doesn't all live in one file. WebGL had the same problem back when I worked with it. Now I have to modify all my sources with an #extension just to get #include to work, since I can't pull that in from an #include.

https://github.com/KhronosGroup/glslang/issues/1691

If the glslang CLI already has a -I option, then requiring other flags and extensions seems inconsistent.

alecazam commented 3 years ago

The latest glslc lets me use your change for 310 es, so now I can leave the #version at the tops of the files. The codegen is still questionable when the flag is on. This is Metal code where I'm trying to specify all-float uv, but the relaxed ops insist on converting to half2 and then back to float2.

         float4 offsets = float4(uv.x, offsets.y, 0.0, offsets.w);  <- did this to prevent conversion, but no longer working

        spvUnsafeArray<half4, 2> weightTaps;
        weightTaps[0].x = half4(u_tex.sample(u_texSmplr, float2(half4(offsets).xy))).w; <- why is compiler doing this ?

These are all over the codegen. A mediump sampler doesn't mean that the uv are also half, and Metal doesn't support half for uv coordinates anyway. Our code is careful to only use float ops on the uv/w coords. I may want the sampling results as half, but never the uv.

float4 diffuse = u_diffuseTex.sample(u_diffuseTexSmplr, float2(half4(in.v_uv).zw));
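Why half-precision uv coordinates are a problem can be shown with texel arithmetic. This sketch assumes a hypothetical 4096-texel-wide texture and nearest filtering: fp16 values in [0.5, 1) are 2^-11 apart, which spans two texels, so rounding the centre of texel 4001 to fp16 moves it onto texel 4002. `texel_index` is an illustrative helper, not a Metal or GLSL API.

```python
import struct


def to_half(x: float) -> float:
    """Round to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def texel_index(u: float, width: int) -> int:
    """Texel column a normalized coordinate falls in (nearest filtering, u >= 0)."""
    return int(u * width)


WIDTH = 4096                   # hypothetical texture width
u = (4001 + 0.5) / WIDTH       # centre of texel 4001, exact in fp32
u_half = to_half(u)            # what float2(half2(uv)) would hand to sample()

print(texel_index(u, WIDTH), texel_index(u_half, WIDTH))
# In [0.5, 1) consecutive fp16 values are 2**-11 apart: two texels at this width.
print(2.0 ** -11 * WIDTH)
```

Bilinear filtering fares no better, since the sub-texel weights are computed from the already-quantized coordinate.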

I'm trying disabling mediump on samplers completely, since I already have casts to half4 on all texture ops.

greg-lunarg commented 3 years ago

I am happy to help, but I am having trouble figuring out context.

It would help me if you could give me a full shader and compiler command line that replicates what you are seeing.

In general, given GL_EXT_shader_explicit_arithmetic_types, you should not need the precision qualifiers or spirv-opt --convert-relaxed-to-half, and probably should not use either of them. convert-relaxed-to-half was designed for when a shader does not use true 16-bit types and cannot be edited. I would suggest not using it at all. Since you are not using it, and Metal does not support the precision qualifiers, you probably should not use them either. You should be able to do everything you need to do with the explicit types.

alecazam commented 3 years ago

Okay, it seems like "convert-relaxed-to-half" is the source of many of the codegen issues. I was under the impression that we only got float16_t support when that was enabled, and we already had GLSL with significant use of mediump on uniforms, attributes, and samplers. But in general, that setting seemed to always convert to fp16 instead of honoring the parts that were fp32 in the shader code. Disabling that flag eliminated the erroneous "float2(half4(uv.xy))" on the texture uv lookups.

I'm building with the latest glslc, and it seems to be generating half code and handling 310 es. Will update next week when that releases with your changes.

alecazam commented 3 years ago

The original intent here wasn't to completely rewrite our shaders. We already have a lot of GLSL with mediump/lowp usage. Upon inspecting the transpiled Metal and the unreadable SPIR-V disassembly, no fp16 usage was noted. That's what led me to try "convert-relaxed-to-half". But with that set, the codegen demotes fp32 to fp16 all over the sources, which doesn't follow the GLSL spec, where precision is preserved at the highest bit depth.

So after stripping out mediump/lowp and replacing those with half (float16_t), the codegen is more reasonable. It all seems to compile with 310 es, avoiding the warnings from setting 450 core. But it's a big undertaking for a lot of shaders. I can see other teams wanting to capitalize on fp16 as well, given that it can run at 2x rate on AMD, ARM, Apple, and Nvidia GPUs. The shaders above should convey many of the codegen issues.

greg-lunarg commented 3 years ago

convert-relaxed-to-half is not incredibly sophisticated. It is designed to operate efficiently only on a shader where EVERYTHING is relaxed precision (mediump, lowp). It will convert everything that is marked relaxed to half, even if the cost of the required converts makes performance worse. If everything is relaxed, the converts that are required are minimal and it pays off. We had a user for whom this was useful. In fact, there is another spirv-opt pass (--relax-float-ops) that makes all floating point relaxed. They ran that first.
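For reference, the all-relaxed workflow described here (run --relax-float-ops first, then --convert-relaxed-to-half) could be scripted as below. This sketch only builds the spirv-opt command lines, reusing the cleanup passes from the test script earlier in the thread. The intermediate file names are hypothetical, and each stage is a separate invocation, mirroring that script.

```python
import shlex
from typing import List

# Cleanup passes taken from the test script earlier in this thread.
CLEANUP = ["--simplify-instructions", "--redundancy-elimination",
           "--eliminate-dead-code-aggressive"]


def relaxed_to_half_pipeline(src: str) -> List[List[str]]:
    """Return spirv-opt invocations, in order: mark every float op relaxed,
    convert relaxed ops to fp16, then remove redundant conversions.
    Intermediate file names are hypothetical."""
    stem = src[:-4] if src.endswith(".spv") else src
    relaxed = f"{stem}.relaxed.spv"
    half = f"{stem}.half.spv"
    clean = f"{stem}.clean.spv"
    return [
        ["spirv-opt", "--relax-float-ops", "-o", relaxed, src],
        ["spirv-opt", "--convert-relaxed-to-half", "-o", half, relaxed],
        ["spirv-opt", *CLEANUP, "-o", clean, half],
    ]


for cmd in relaxed_to_half_pipeline("testShader.frag.spv"):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to actually execute it. As noted above, this only pays off when essentially the whole shader is meant to run in fp16.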

alecazam commented 3 years ago

Just as a follow-up: there are many details to just getting half codegen. This all seems tied up in specifying a 1.1 target, using the float16_t shader/storage extensions, and having driver support for those, which actually isn't that prevalent. Adreno Vulkan drivers don't supply this on my Android devices, but maybe Android 10 is too old. Then there's Nvidia not handling inputOutput16.

This really shouldn't be this hard, and it isn't in MSL. Otherwise all ops are full float, so why even bother marking "RelaxedPrecision" on variables? Our transpiled MSL used half, but the SPV with a 1.0 target had useless OpLoad, OpFConvert, and OpStore instructions that didn't actually do anything or improve performance, yet broke drivers like Adreno's that didn't expose the float16_t shader extension.

https://github.com/KhronosGroup/SPIRV-Tools/issues/4546

This is basically the setup we use to generate the half shaders with glslc (shaderc). The other setup doesn't use the extension, remaps all these defines to full float, and compiles with --target-env=vulkan1.0.

Compile with glslc --target-env=vulkan1.1


precision mediump int;
precision highp float;

// these are HLSL types, but specific about precision
#define float2 vec2
#define float3 vec3
#define float4 vec4

#define float2x2 mat2
#define float3x3 mat3
#define float4x4 mat4

#if USE_HALF

// compile with --target-env=vulkan1.1
// tied to VK_KHR_shader_float16_int8
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : require

#define half float16_t

#define half2 f16vec2
#define half3 f16vec3
#define half4 f16vec4

#define half2x2 f16mat2
#define half3x3 f16mat3
#define half4x4 f16mat4

// for nvidia drivers
#if USE_HALF_INPUT_OUTPUT
    #define half_io  half
    #define half2_io half2
    #define half3_io half3
    #define half4_io half4
#else
    #define half_io float
    #define half2_io float2
    #define half3_io float3
    #define half4_io float4
#endif

#else

// compile with --target-env=vulkan1.0
// these should use mediump but can't since used in casts and ctors
#define half float

#define half2 float2
#define half3 float3
#define half4 float4

#define half2x2 float2x2
#define half3x3 float3x3
#define half4x4 float4x4

#define half_io  float
#define half2_io float2
#define half3_io float3
#define half4_io float4

#endif
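The two-variant setup above can be driven by a small build helper. This Python sketch only assembles glslc command lines: the fp16 variant uses --target-env=vulkan1.1 with USE_HALF=1, the fallback uses vulkan1.0 with USE_HALF=0, and USE_HALF_INPUT_OUTPUT is passed explicitly for drivers without 16-bit inputs/outputs. The function name and output file names are hypothetical; -D and --target-env are standard glslc options.

```python
from typing import Dict, List


def glslc_variants(shader: str, half_io: bool = False) -> Dict[str, List[str]]:
    """Build glslc command lines for an fp16 build (Vulkan 1.1, USE_HALF=1)
    and an fp32 fallback (Vulkan 1.0, USE_HALF=0).
    Output file names are hypothetical."""
    half = [
        "glslc", "--target-env=vulkan1.1",
        "-DUSE_HALF=1",
        # Define it explicitly rather than relying on an undefined macro
        # evaluating to 0 inside #if.
        f"-DUSE_HALF_INPUT_OUTPUT={int(half_io)}",
        shader, "-o", f"{shader}.half.spv",
    ]
    full = [
        "glslc", "--target-env=vulkan1.0",
        "-DUSE_HALF=0",
        shader, "-o", f"{shader}.float.spv",
    ]
    return {"half": half, "float": full}


for name, cmd in glslc_variants("testShader.frag").items():
    print(name, " ".join(cmd))
```

The application can then pick the fp16 SPIR-V only on devices that report the float16 shader feature (and 16-bit I/O when half_io is set), falling back to the fp32 build elsewhere.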
alecazam commented 3 years ago

Now, passing shaders compiled with shader_float16_int8 ops to Adreno Vulkan drivers produces the following failure in the driver. These run fine on Mali devices. These drivers support shader_float16_int8 but don't publish the 16bit_storage extension, which shouldn't cause this kind of problem. I can't see any specification of rounding modes anywhere in the assembly, and this particular vertex shader doesn't even use half (only some mediump usage).

AdrenoVK-0: Shader compilation failed for shaderType: 1
AdrenoVK-0: Info log: Assertion failed: false && "Unknown floating point rounding mode"