llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Implement the `WaveActiveMax` HLSL Function #99170

Open farzonl opened 3 months ago

farzonl commented 3 months ago

DirectX

| DXIL Opcode | DXIL OpName | Shader Model | Shader Stages |
| --- | --- | --- | --- |
| 119 | WaveActiveOp | 6.0 | ('library', 'compute', 'amplification', 'mesh', 'pixel', 'vertex', 'hull', 'domain', 'geometry', 'raygeneration', 'intersection', 'anyhit', 'closesthit', 'miss', 'callable', 'node') |

SPIR-V

OpGroupNonUniformFMax:

Description:

A floating-point maximum group operation of all Value operands contributed by active invocations in the group.

Result Type must be a scalar or vector of floating-point type.

Execution is a Scope that identifies the group of invocations affected by this command. It must be Subgroup.

The identity I for Operation is -INF. If Operation is ClusteredReduce, ClusterSize must be present.

The type of Value must be the same as Result Type. The method used to perform the group operation on the contributed Value(s) from active invocations is implementation defined. From the set of Value(s) provided by active invocations within a subgroup, if for any two Values one of them is a NaN, the other is chosen. If all Value(s) that are used by the current invocation are NaN, then the result is an undefined value.
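The NaN rules above are easy to misread, so here is a minimal CPU-side sketch of them in C++. It is only a reference model for reasoning about expected test results, not how any driver or LLVM lowering implements the reduction. The names `LaneValues`, `fmaxPreferNumber`, and `modelGroupFMax` are hypothetical, and returning `std::nullopt` for the all-NaN case is a modelling choice made here because the spec leaves that result undefined.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical model types; not part of LLVM, DXC, or SPIR-V tooling.
// One entry per lane; std::nullopt marks an inactive lane.
using LaneValues = std::vector<std::optional<float>>;

// Pairwise rule from the description: if one operand is NaN,
// the other operand is chosen.
inline float fmaxPreferNumber(float a, float b) {
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  return a > b ? a : b;
}

// Reduce over the active lanes, starting from the identity -INF.
// Returns std::nullopt when every contributed Value is NaN, because
// the result is undefined in that case and this model refuses to guess.
inline std::optional<float> modelGroupFMax(const LaneValues &lanes) {
  assert(!lanes.empty() && "a subgroup has at least one lane");
  float acc = -std::numeric_limits<float>::infinity();
  bool sawActive = false;
  bool sawNumber = false;
  for (const auto &lane : lanes) {
    if (!lane) continue;  // inactive lanes contribute nothing
    sawActive = true;
    if (!std::isnan(*lane)) sawNumber = true;
    acc = fmaxPreferNumber(acc, *lane);
  }
  if (sawActive && !sawNumber) return std::nullopt;
  return acc;
}
```

With lanes `{1.0, inactive, 3.5, -2.0}` the model yields 3.5, and a single NaN among ordinary numbers is ignored.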

ClusterSize is the size of cluster to use. ClusterSize must be a scalar of integer type, whose Signedness operand is 0. ClusterSize must come from a constant instruction. Behavior is undefined unless ClusterSize is at least 1 and a power of 2. If ClusterSize is greater than the size of the group, executing this instruction results in undefined behavior.
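The ClusterSize constraints can be sketched as a small validity check plus a clustered reduction. This is an illustrative C++ model under stated assumptions: all lanes are active, no value is NaN, and clusters are taken to be consecutive runs of `clusterSize` lanes. The helper names `isValidClusterSize` and `modelClusteredFMax` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True when clusterSize is at least 1, a power of two, and no larger
// than the group. Anything else is undefined behavior per the text above.
inline bool isValidClusterSize(uint32_t clusterSize, uint32_t groupSize) {
  const bool powerOfTwo =
      clusterSize != 0 && (clusterSize & (clusterSize - 1)) == 0;
  return powerOfTwo && clusterSize <= groupSize;
}

// Assumption: every lane is active, values are not NaN, and cluster k
// covers lanes [k * clusterSize, (k + 1) * clusterSize).
// Each lane receives the maximum of its own cluster.
inline std::vector<float> modelClusteredFMax(const std::vector<float> &values,
                                             uint32_t clusterSize) {
  assert(isValidClusterSize(clusterSize,
                            static_cast<uint32_t>(values.size())));
  std::vector<float> result(values.size());
  for (std::size_t base = 0; base < values.size(); base += clusterSize) {
    const std::size_t end =
        std::min<std::size_t>(base + clusterSize, values.size());
    const float clusterMax =
        *std::max_element(values.begin() + base, values.begin() + end);
    std::fill(result.begin() + base, result.begin() + end, clusterMax);
  }
  return result;
}
```

For eight lanes holding `1, 5, 2, 3, 9, 0, 4, 4` and a cluster size of 4, the first four lanes receive 5 and the last four receive 9.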

Capability:
GroupNonUniformArithmetic, GroupNonUniformClustered, GroupNonUniformPartitionedNV

Missing before version 1.3.

| Word Count | Opcode | Results | Operands |
| --- | --- | --- | --- |
| 6 + variable | 358 | `<id>` Result Type, Result `<id>` | Scope `<id>` Execution, Group Operation Operation, `<id>` Value, Optional `<id>` ClusterSize |
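To make the "6 + variable" word count concrete, the following C++ sketch packs the instruction into 32-bit words. It relies on the general SPIR-V rule that the first word of an instruction holds the word count in its high 16 bits and the opcode in its low 16 bits. The function `encodeGroupNonUniformFMax` and the `<id>` values passed to it are hypothetical; a real backend would go through its own instruction builders.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

constexpr uint32_t kOpGroupNonUniformFMax = 358;

// Literal values of the SPIR-V Group Operation enumerant used here.
enum class GroupOperation : uint32_t {
  Reduce = 0,
  InclusiveScan = 1,
  ExclusiveScan = 2,
  ClusteredReduce = 3,
};

// Hypothetical encoder. executionScopeId is the <id> of a constant
// holding the Subgroup scope, not the scope value itself.
inline std::vector<uint32_t> encodeGroupNonUniformFMax(
    uint32_t resultTypeId, uint32_t resultId, uint32_t executionScopeId,
    GroupOperation operation, uint32_t valueId,
    std::optional<uint32_t> clusterSizeId = std::nullopt) {
  // ClusterSize must be present when Operation is ClusteredReduce.
  assert(operation != GroupOperation::ClusteredReduce ||
         clusterSizeId.has_value());
  std::vector<uint32_t> words = {0, resultTypeId, resultId, executionScopeId,
                                 static_cast<uint32_t>(operation), valueId};
  if (clusterSizeId) words.push_back(*clusterSizeId);
  // First word: word count in the high half, opcode in the low half.
  words[0] = (static_cast<uint32_t>(words.size()) << 16) |
             kOpGroupNonUniformFMax;
  return words;
}
```

A plain `Reduce` therefore occupies six words, and a `ClusteredReduce` occupies seven because of the extra ClusterSize operand.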

Test Case(s)

Example 1

//dxc WaveActiveMax_test.hlsl -T lib_6_8 -enable-16bit-types -O0

export float4 fn(float4 p1) {
    return WaveActiveMax(p1);
}

Example 2

//dxc WaveActiveMax_1_test.hlsl -T lib_6_8 -enable-16bit-types -O0

export uint4 fn(uint4 p1) {
    return WaveActiveMax(p1);
}

Example 3

//dxc WaveActiveMax_2_test.hlsl -T lib_6_8 -enable-16bit-types -O0

export int4 fn(int4 p1) {
    return WaveActiveMax(p1);
}
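The three test cases differ only in element type, which matters because the same bit pattern orders differently as `uint` and as `int`. The C++ sketch below models a four-component reduction on the CPU, assuming the intrinsic applies component-wise to vector arguments. `modelWaveActiveMax4` is a hypothetical helper for working out expected values, not an LLVM or DXC API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: vector arguments are reduced one component at a time.
// activeLanes holds the 4-component value of each active lane.
template <typename T>
std::array<T, 4> modelWaveActiveMax4(
    const std::vector<std::array<T, 4>> &activeLanes) {
  assert(!activeLanes.empty() && "at least one lane must be active");
  std::array<T, 4> result = activeLanes.front();
  for (const auto &lane : activeLanes) {
    for (std::size_t c = 0; c < 4; ++c) {
      if (lane[c] > result[c]) result[c] = lane[c];
    }
  }
  return result;
}
```

Feeding the bit pattern `0xFFFFFFFF` in the first component gives a maximum of `0xFFFFFFFF` for `uint32_t`, whereas the same bits read as `int32_t` are -1 and lose to a lane holding 0.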

HLSL:

Returns the maximum value of the expression across all active lanes in the current wave and replicates it back to all active lanes.

Syntax

<type> WaveActiveMax(
   <type> expr
);

Parameters

*expr*
The expression to evaluate.

Return value

The maximum value.

Remarks

The order of operations is undefined.

This function is supported from shader model 6.0 in all shader stages.
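The "active lanes" wording can be illustrated with a small CPU-side model in C++: inactive lanes neither contribute a value nor receive a result, and every active lane gets the same maximum. The `Lane` struct and `modelWaveActiveMax` function are hypothetical names for this sketch. For integers the result does not depend on the order in which lanes are visited, which is consistent with the order of operations being left undefined.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of one lane: whether it is executing the call,
// and the value of its expression.
struct Lane {
  bool active;
  int value;
};

// Returns one entry per lane. Active lanes all hold the same maximum;
// inactive lanes hold std::nullopt because they never execute the call.
inline std::vector<std::optional<int>> modelWaveActiveMax(
    const std::vector<Lane> &wave) {
  assert(!wave.empty() && "a wave has at least one lane");
  std::optional<int> best;
  for (const Lane &lane : wave) {
    if (lane.active && (!best || lane.value > *best)) best = lane.value;
  }
  std::vector<std::optional<int>> results(wave.size());
  for (std::size_t i = 0; i < wave.size(); ++i) {
    if (wave[i].active) results[i] = best;
  }
  return results;
}
```

In a wave where the lane holding 100 is inactive and the active lanes hold 3, 8, and -1, every active lane receives 8.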

Examples

float3 maxPos = WaveActiveMax( myPoint.position );
BoundingBox.max = max( maxPos, BoundingBox.max );

See also

[Overview of Shader Model 6](https://github.com/MicrosoftDocs/win32/blob/docs/desktop-src//direct3dhlsl/hlsl-shader-model-6-0-features-for-direct3d-12.md)
[Shader Model 6](https://github.com/MicrosoftDocs/win32/blob/docs/desktop-src//direct3dhlsl/shader-model-6-0.md)

davidcook-msft commented 2 months ago

`__attribute__((convergent))`

Refer to #103299