o3de / o3de-azslc

Amazon Shader Language (AZSL) Compiler
Other
23 stars 14 forks source link

RFC - unconstrained language evolution #71

Open siliconvoodoo opened 1 year ago

siliconvoodoo commented 1 year ago

Motivation

With the announcement of HLSL 2021 https://devblogs.microsoft.com/directx/announcing-hlsl-2021/ templates, operator overloading and bitfields have been introduced. We observe that it would be a substantial cost and time inertia to follow such impactful language evolutions in AZLSc. The question is: is there a way to remove the logic-heavy part of azslc (semantic analysis) to make it that azsl is transparently hlsl? That way future evolutions of HLSL as a language would naturally become immediately available, ideally simply by a package release of DXC.

Suggestion

One option that seems to me of least effort, would be to cut the edition process in 2 parts, the AZSL part that holds the resources, and the HLSL part that holds the code.

Concept prototype idea 1

If from that input:

ShaderResourceGroupSemantic slot1
{
    FrequencyId = 1;
};

ShaderResourceGroup SRG : slot1
{
    struct CB
    {
        float4 color;
    };

    ConstantBuffer<CB> m_uniforms;
};

We get that output:

struct SRG_CB
{
    float4 color;
};
ConstantBuffer <::SRG_CB> SRG_m_uniforms : register(b0, space0);

Then we can save it to inputs.hlsl and extend it with follow up file:

#include "inputs.hlsl"  // auto generated from azsl

float4 MainPS( float2 uv : TEXCOORD0) :SV_Target0
{
   // edit here
   return SRG_m_uniforms.color;  // your resource names have mutated, refer to inputs.hlsl to identify their flattened names
} 

We note that the resource variables have changed names because of the mutations undergone in the process of SRG-erasure (transpilation from AZSL to HLSL). So it requires the programmers to take consciousness of the mutation scheme, and consult the input.hlsl to know what they have to work with.

Advantages

Inconvenients

Evolution idea 1.1

The problem is that the mutation can be platform specific, and can be azslc version dependent. Also, it can be unpredictable because of name collision avoidance. e.g. SRG::m_uniforms may become SRG_m_uniforms or SRG_m_uniforms1.
Otherwise said, there is no specification guarantee on the rename scheme. To ease that issue, we can imagine an __asm__ block scheme, with what historically was called "clobber" declarations to make the link between host language and DSL.

Example of what it could look like:

ShaderResourceGroupSemantic slot1
{
    FrequencyId = 1;
    ShaderVariantFallback = 128;
};

option bool reflections = false;

ShaderResourceGroup SRG : slot1
{
    struct CB
    {
        float3 sceneBounds;
    };

    ConstantBuffer<CB> uniforms;

    float4 iblAvg;
    float4 ambient;

    Texture2DMS<float4, 8> fresnel;

    enum Composite { Spec, Diff };

    Composite Get(bool forceOff) { return !forceOff && reflections ? Spec : Diff; }
    CB Get() { return uniforms; }
    float4 Get(Composite c) { return c == Spec ? iblAvg : ambient; }
};

struct PSInput
{
    float4 position : SV_Position;
    float4 color    : COLOR0;
};

typealias CB = SRG::CB;   // this location is stable so can be referred to

__hlsl__
@{
   // declarative zone where lookup happens once from the global scope and gets cached into an alias, that becomes available for the HLSL block.
    using Get = SRG::Get;  // alias the overloadset
    using Spec = SRG::Composite::Spec;  // enumerators mutate
    using fresnel = SRG::fresnel;    // variables also mutate

    // from here, code is like a comment for AZSLc
    template< typename UV_t >
    float4 PScoreT(PSInput input, UV_t uv, int si)
    {
        CB cb = Get();
        float4 spec = fresnel.sample[si](uv);  // we lose ability to mutate --no-ms
        if (position.xyz < cb.sceneBounds)  // field names don't mutate AFAIR
            return Get(Get(false));
        else
            return 0;
    }

    // templates can't be entry points in HLSL. declare a concrete version
    float4 PSMainF2(PSInput i, float2 f : TEXCOORD0, int si : SV_SampleIndex) : SV_Target0
    { return PSMainT(i,f, si); }

}@  // we need a "raw string literal"-way of ending the block

Bear with me that the program is nonsensical. But the point is to illustrate what we lose and what we win. We win language involutivity, but we lose perfect integration with the azsl-declared resource. They need to be bridged in some way (somewhat akin to lambda capture), so that the access to the mutated symbols in the HLSL block can bind to their intended symbol.

Advantages

Inconvenients

Concept idea 2

Strongly reduce the invasiveness of AZSL specific syntax constructs. Tending instead toward a decorated HLSL. The compiler would still need to exist to do reflection and resource registers assignation in each platform way. Also would still generate option and rootconstant variable getters. Would still need to accept non-HLSL blocks such as: static samplers with in-situ states declarations, or SRG frequencies and option fallback key. But the names would be expected to be stable since no flattening or scope mutation will happen. Client-site usage (later in code), will remain naturally compatible with the declaration.

As per @santorac proposal:

ShaderResourceGroupSemantic slot1
{
    FrequencyId = 1;
};

[ShaderResourcesGroup(slot1)]
namespace
{
    struct CB
    {
        float4 color;
    };
    ConstantBuffer<CB> uniforms;
}

Using an annotation, AZSLc2 would have to recognize that attribute to register resources instead of the ShaderResourceGroup block of today.

Advantage

Inconvenient

Prototype

I (@siliconvoodoo) am forking the main repository to try this evolution here: https://github.com/SiliconStudio/o3de-azslc-evo

Findings

I see 3 pathways of implementation to the target:

image

Further

We can also decide to delete the ShaderResourceGroupSemantic syntax and integrate it to attributes as well:

[[azsl::ResourceGroupSemantic]]
namespace slot1
{
    static const int slot = 1;
    static const int frequencyId = 128;
};

[[azsl::ResourceGroup(slot1)]]
namespace SRG
{
    struct Data { float4 f; };
    ConstantBuffer<Data> glob : register(b0);
}

We'll note that those attributes are still oddly not compatible with DXC. Even with -HV 2021:

error: an attribute list cannot appear here

But it's reasonable as long as AZSLc2 swallows those attributes.

Desirable diet features

seenat

refer to https://github.com/o3de/o3de-azslc/wiki/Features#seenats This is a necessity for mathematically infallible symbol rename and migration. (the migration from SRG scopes to global, and some typealias/structs from function scopes to outter scope which was a bonus of azsl)

Maintaining this is the most costly because of its dependency to reliable lookup. Lookup depends on semantic contexts and requires understanding of scopes, type deduction, inheritance, function overloads, and overrides. Introduction of templates is hindered by the weight of updating all these mechanisms.

impacts:

srg-constants references mutations

in the original azsl source:

// ...
ShaderResourceGroup S : slot1
{
    float3 sunDir;

    float3 Get() { return sunDir; }
};

float4 psmain() : SV_Target0
{
    return float4(S::sunDir, 1);
}

accesses to sunDir get mutated to their actual materialization in a generated constant buffer, as such:

struct S_SRGConstantsStruct
{
    float3 S_sunDir;
};

ConstantBuffer<::S_SRGConstantsStruct> S_SRGConstantBuffer : register(b0, space0);

float3 S_Get()
{
    return ::S_SRGConstantBuffer.S_sunDir ;
} 

float4 psmain() :SV_Target0
{
    return float4 ( ::S_SRGConstantBuffer.S_sunDir , 1 ) ;
}

Finding the points of mutation requires the seenat system.

One way to do away with that problem is to adopt the option strategy which is to declare a static variable that is fetch from a function call. Refer to wiki features paragraph for illustration.

rootconstant mutations

azsl example source:

rootconstant bool fog;

static const float3 fogClr = float3(0.5, 0.5, 0.5);
float4 psmain(float3 clr : COLOR0, float d : DEPTH) : SV_Target0
{
    return float4(fog ? lerp(clr, fogClr, pow(1.8, d)) : clr, 1);
}

results in mutated references to:

bool GetShaderRootConst_Root_Constants_fog();

static const bool _g_Root_Constants_fog = GetShaderRootConst_Root_Constants_fog();

static const float3 fogClr = float3 ( 0.5 , 0.5 , 0.5 ) ;
float4 psmain( float3 clr :COLOR0,  float d :DEPTH) :SV_Target0
{
    return float4 ( _g_Root_Constants_fog ? lerp ( clr , fogClr , pow ( 1.8 , d ) ) : clr , 1 ) ;
} 

struct Root_Constants
{
    bool fog;
};
ConstantBuffer<::Root_Constants> rootconstantsCB : register(b0, space0);
bool GetShaderRootConst_Root_Constants_fog()
{
    return ::rootconstantsCB.fog;
}

Suggestion for solution: the variable is mutated at definition site to a self-initializing static. Therefore we could imagine that the name needn't change. That behavior seems like a conservative choice but it doesn't seem necessary, the original symbol name could probably be preserved. That would be consistent with the behavior for options. And lift us from the need of iterating the references.

--no-ms, --strip-unused-srg, --cb-body, --bindingdep

Mentioned in later paragraphs.

Packing

Feature to reflect "constantByteOffset" "constantByteSize" "typeDimensions" (document). It would be desirable to delete all the alignment computation code that serves as a support to RHI for buffer-as-byte understanding of where (at what offset) variables actually sit in the CB. This code is heavy and costs many tests, has cost long investigations of reverse engineering DXC, and multiplied by the combination of options for vulkan/dx/glsl rules.

Dependencies: The pack computer relies on the type system. Because it accesses type class (user defined or fundamental, matrix or vector) and typeinfo (array or matrix dimensions), and sizeof. The type system can't work without symbol lookup system because they can be combined (UDT members, inheritance...), and typedefed.

Alternative: rely on a sort of reflective DXC API?

Constant folding

It would save us a small amount of code, but always a nice to add to the diet. Unfortunately at this point, it's important for array dimension reflection of SRG resources. Also for [[pad_to(N)]] feature, or option range, the thread count reflector (for metal), static sampler reflection.

Difficult features

--no-ms

Any route will at least diminish the robustness of the current approach: no more possible to check that X in X.Load() is a Texture2DMS type-referring symbol since it relies on Lookup facility, and typeof facility.

Yellow route: Totally unsupported. We won't have enough grammar power to work on AST level anymore.

Alternatives:

--cb-body

This behavior switch on the CLI activates a mode where generation of constant buffer takes a different form, as such: input:

ShaderResourceGroup SRG : slot1
{
    float4 color;
    struct CB { bool fog; };
    ConstantBuffer<CB> uniforms;
};

float4 MainPS(float2 uv : TEXCOORD0) : SV_Target0
{
    return SRG::color + SRG::uniforms.fog;
}

output:

struct SRG_CB { bool fog; };

ConstantBuffer SRG_CBContainer : register(b0)
{
    CBVArrayView uniforms;
    float4 SRG_color;
};
static const RegularBuffer<::SRG_CB> SRG_uniforms = RegularBuffer<::SRG_CB> (uniforms);

float4 MainPS( float2 uv :TEXCOORD0) :SV_Target0
{
    return ::SRG_color + ::SRG_uniforms[0] . fog ;
}

We note that again, in MainPS the references to external resources are mutated to a different symbol. SRG::color becomes ::SRG_Color and more complicated, access to uniforms has be enriched with a subscript access.

Solution: It seems like we could once again go with the option strategy, declare a static accessor variable with an initializer calling a generated getter function fetching the member in the generated ConstantBuffer corresponding member. As a matter of fact, since these srg-constant variables becomes immediately visible to the outter scope, they may even be declared as is, with the same original name. It will cause declaration vs access order problems though since their declaration site will be migrated all together into one location. We already have this problem for srg-constants. The static variable with initializer seem like a simple enough counter strategy for that.

The [0] subscript can maybe be solved in the exact same way, let the references refer to a generated static variable, and initialize it with a fetcher function, the body of that fetcher function will possess the [0] subscript. This way we free the reference sites accross the program from need of awareness of this specificity.

--bindingdep

This system reflects the "participantsConstants" (in JSON) by "dependentFunctions" (entry points). (documentation in features page). It relies on the reference tracker to iterate on appearances through the program of external resources. Same core system than --strip-unsued-srgs as described in this picture. The code for the feature can be found here.

We can either forgo of that facility, if we re-evaluate its necessity in sensitive platforms like vulkan or metal. Or we will need to find an alternative, using DXC internal API, maybe by analyzing remaining resources post optimization. Though I seem to recall optimization was not a factor, even on the contrary we needed to know the variables that should be there by contract, irrespective of potential dead-variable optimizations. Maybe leveraging clang (or the clang in DXC) if push comes to shove.

--strip-unused-srgs

Reference doc: https://github.com/o3de/o3de-azslc/wiki/Features#strip-unused-srgs This relies on the homonym visitor system to visit the seenats of each resource inside an SRG. It's the same problem as --bindingdep since it relies on the same system.

Any route: will rid us of ability to iterate over seenats since the point of the evolution RFC, is to remove the complexity involved in the seenat system.

Alternatives: Drop that feature? It seems undesirable in raytracing contexts. Maybe we can hack an artificial resource-toucher to force DXC to not optimize out resources (or propose a flag)? Or if we do the clang explorer for --bindingdep it will be factorized for that feature too.

santorac commented 1 year ago

I think this is a step in the right direction, to remove the need for complex automatic name mangling, at the cost of some need for developer awareness. I do have some questions though...

First, I don't quite understand what you are saying about collision avoidance. Under the current automatic system, yes there is some need for collision avoidance, because the user might have their own variable already named SRG_m_uniforms. But if we remove the automatic mangling and we are expecting the developer to be aware of the implicit names, then I think it would okay to also expect the user to avoid having some other symbol with the same name. If DXC compilation fails, the developer should just go rename their variable.

Second, even though we do expect shader authors to be aware of what's going on with SRG field names, I am a little concerned about discoverability for less familiar users who are just reading shader code. If they find the symbol SRG_m_uniforms and do a find-in-files for that name, they won't be able to discover the declaration site. I can think of a couple ways to mitigate this, I'll put my suggestions separately in comments below.

santorac commented 1 year ago

Suggestion 1: By convention, the author will manually define accessor functions for all SRGs.

ShaderResourceGroupSemantic slot1
{
    FrequencyId = 1;
};

ShaderResourceGroup SRG : slot1
{
    struct CB
    {
        float4 color;
    };

    ConstantBuffer<CB> m_uniforms;
};

float SRG_GetColor()
{
    // Note, automatic code-gen will transform SRG's m_uniforms to this
    return SRG_m_uniforms.color;
}

The the rest of the code can just use the accessor function, so if someone reading the code searches for "SRG_GetColor" they will arrive at the SRG definition.

#include "inputs.hlsl"

float4 MainPS( float2 uv :TEXCOORD0) :SV_Target0
{
   return SRG_GetColor();
}
santorac commented 1 year ago

Suggestion 2: Don't do any automatic renaming. The developer will write standard resource declarations, without register spaces, and all that AZSLc will do is figure out the registers and alignments. There can be sentinels in comments for marking the start and end of the SRG.

ShaderResourceGroupSemantic slot1
{
    FrequencyId = 1;
};

//{{ShaderResourcesGroup : slot1 }}
struct SRG_CB
{
    float4 color;
};
ConstantBuffer<SRG_CB> SRG_uniforms;
//{{End ShaderResourcesGroup}}
santorac commented 1 year ago

Suggestion 3: Similar to suggestion 2, but we use a namespace to wrap the SRG and just use an attribute or comment sentinel to mark it as an SRG

ShaderResourceGroupSemantic slot1
{
    FrequencyId = 1;
};

[ShaderResourcesGroup(slot1)]
namespace MaterialSrg
{
    struct CB
    {
        float4 color;
    };
    ConstantBuffer<CB> uniforms;
}
siliconvoodoo commented 1 year ago

if we remove the automatic mangling and we are expecting the developer to be aware of the implicit names, [..] it's ok to also expect the user to deal with collisions

Agreed. Also I like suggestion 2, I added it to the RFC. And will continue to refine it later.

siliconvoodoo commented 1 year ago

Edit history:

siliconvoodoo commented 1 year ago

Report on how it's going.

The prototype (https://github.com/SiliconStudio/o3de-azslc-evo) is following the red development curve of the RFC. The horizontal part (additive coding) being done: the support of namespaces in the front end. Yesterday I managed to get the system to support reopening of namespaces (akin to partial SRG). And the emission to respect the original pattern (multiple namespaces). I still have an assert when using nested namesapces. It takes effort to preserve the original program structure in the current emission system. Because it is based on looping over the "ordered symbol apparition list" and recreate the code. (Let's call that recreation: "reformatting AST expansion").

On a dev curve like the yellow curve we would have a minimalist emission system based on token reproduction, and possibly alteration and insertion (to add implicit struct, and option getters). But the emission would be more of a copy (like we do now for function bodies), than a reformatting AST expansion. Anyway that route would cost a lot more effort, to reproduce all the logic about the --cb-body option, the srgConstant struct construction, pound-line directives, the extended type operators.

Also any route of this RFC aims at doing without the ref tracker, but the option getters have a system reliant on the reference tracker that is causing option references to mutate to a call to the getter function. That will have to be probably un-abstracted, if we lose the ref tracker (seenat) requiring the programmer to use the getters right from the original source. Srg constants also undergo the same sort of reference mutation (to mention the implicit generated struct and access its member) Same for cb-body option since it mutates references that don't use [0] subscript to get the subscript. (and quite cleverly since it has a recursive detector to find its presence. look for FindRuleInAstThatIs0AwayWrtPointerDistance in the source). cb-body IIRC is meant for support of provo so it is also a question if that platform can be dropped maybe the whole support in azslc can to.

Example:

ShaderResourceGroupSemantic slot1
{
    FrequencyId = 1;
    ShaderVariantFallback = 128;
};

option bool reflections = false;

[[SRG("slot1")]]
namespace MaterialSRG
{
    struct CB
    {
        float3 sceneBounds;
    };

    ConstantBuffer<CB> uniforms;

    float4 fogClr;
    float4 skyClr;
    float3 sunDir;

    Texture2DMS<float4, 8> fresnel;

    enum Composite { Spec, Diff };

    Composite Get() { return Spec; }
    CB Get(float m) { return uniforms; }

    // crash for now:    
    // namespace SubNS
    // {
    //     float4 GetIBL(float3 dir){ return saturate(dot(dir,sunDir)) * skyClr; }
    // }
};

namespace MaterialSRG   // reopening an srg-namespace should cause a natural "partial" behavior
{
    TextureCube IBL;
    sampler s;
}

float4 PSMain2(float2 uv : TEXCOORD0) : SV_Target0
{
    return float4(MaterialSRG::Get(0).sceneBounds, 1) * MaterialSRG::IBL.Sample(MaterialSRG::s, float3(uv,0));
}

results in:

bool GetShaderVariantKey_reflections();

#if defined(reflections_OPTION_DEF)
    static const bool reflections = reflections_OPTION_DEF ;
#else
    static const bool reflections = GetShaderVariantKey_reflections();
#endif

namespace MaterialSRG
{
    struct CB
    {
        float3 sceneBounds;
    };

    enum Composite
    {
        Spec,
        Diff,
    };

    ::MaterialSRG::Composite Get()
    {
        return Spec ;
    }

    ::MaterialSRG::CB Get( float m)
    {
        return uniforms ;
    }

}
namespace MaterialSRG
{
    Texture2DMS<float4, 8> fresnel : register(t0, space0);

    TextureCube IBL : register(t1, space0);

    SamplerState s : register(s0, space0);

    struct _srgInfoSymbol__srgConstantsStruct
    {
    };

    ConstantBuffer<::MaterialSRG::_srgInfoSymbol__srgConstantsStruct> _srgInfoSymbol__srgConstantBuffer : register(b0, space0);

    ConstantBuffer <::MaterialSRG::CB> uniforms : register(b1, space0);
}

float4 PSMain2( float2 uv : TEXCOORD0) : SV_Target0
{
    return float4 ( MaterialSRG :: Get ( 0 ) . sceneBounds, 1 ) * MaterialSRG :: IBL . Sample ( MaterialSRG :: s, float3 ( uv, 0 ) ) ;
}

bool GetShaderVariantKey_reflections()
{
    uint shaderKey = (::MaterialSRG::m_SHADER_VARIANT_KEY_NAME_[0].x >> 0) & 1;
    return (bool) shaderKey;
}

I got the SRGInfo to register at the last apparition of an SRG namespace and emit at its end. This way it can depend on types defined above, like struct CB. Unfortunately it aggregates view and CB types and emit them all late. So the two Get() functions above now can't compile since they depend on symbol declared after. This is not solvable, unless we break the srginfo emission into a each-srg-member emission independently, and we run a pass of the symbol-dependency-reorder system to attempt to solve accesses to srg-constants which get migrated to a struct anyway. (that system is deactivated now as its on the diet list). A similar problem also exist in version 1 though, it's covered by the document about partial: https://www.o3de.org/docs/atom-guide/dev-guide/shaders/azsl/#partial-shaderresourcegroup-definitions

On the result example above we can also see the _srgInfoSymbol__srgConstantsStruct struct is empty, because it gets constructed during a call to SetupScopeMigrations (link to code) which is also on the diet list. I'll see if I can hack a version of that function that doesn't rely on scope migrations, but just setups custom behaviors.

siliconvoodoo commented 1 year ago

I believe that in the light of all that investigation data, the cost of making the evolution project is in the order of magnitude of running after and integrating N=2 to 3 HLSL language evolution features in azslc1.x At least it seems that making templates work in azslc1.x will be cheaper than this RFC direction. How many N are we investing for might be the question here.

That said, even in the case of a conservation of the azslc1.x lineup, I am seduced by some elements of this RFC. For example, I think we should pursue the C.Santora's syntax (concept idea 2) with the namespaces, and implement a unification of the emission method of option, rootconstant & srg-constants (generalize the option way). Advantages: 🆒 render azsl hlsl-compatible (attributes excepted) 🆒 diminish radically the dependency on flawlessness of the seenat system. (waves potential problems of the lack of 2 pass scanning as described in this documentation addition) 💯 will give opportunity to get rid of the partial declaration/usage dilemma limitation 🆓 Remove the need for symbol scope migration 🆓 Very probably frees us from the topological solver ReorderBySymbolDependency 🆒 Improve touch&feel by DXC reporting errors related to symbols that will have the same name than in the original azsl source name 🆒 namespace support in the front-end can be retroported to azslc1.x from the work in the prototype

That would bring the azslc1.x lineup that much closer to what's necessary for allowing free language evolution, in a more iterative fashion than the sudden fork necessitated by red/yellow/green paths to azslc2. Though without attaining it yet. As such, this could be a softer plan, than all-at-once, to render to goal and readvise after the first step: image Update to v2 could use the normal branch & PR process. But I suggest the jump to "evo" being too massive, the v2 be conserved as legacy, and the new evo be continued on a forked repo.

santorac commented 1 year ago

I don't fully understand what you mean about the "lineage timeline", but overall it sounds like you are continuing to move in a good direction. Looking forward to discussing more during our call next week.

Something else to remember, not sure how applicable it is: Keep thinking about whether there are additional restrictions we can define that would further simplify the implementation. For example, if having functions defined inside the SRG namespace is problematic, I think it's okay to just say we don't support that. If nested namespaces don't work right, that's not so terrible. We can publish a list of caveats and known issues, if necessary to reduce the complexity of implementation.