llvm / wg-hlsl

HLSL Working Group documentation and task tracking.
Other
11 stars 13 forks source link

[SPIR-V] Add proposal for I/O builtin in Clang #97

Open Keenuts opened 3 weeks ago

Keenuts commented 3 weeks ago

Initial proposal to implement Input/Output built ins with both semantic & inline SPIR-V

Keenuts commented 2 weeks ago

Thanks, I have published another draft PR and updated this PR to avoid changing the spec. This should be closer to what DXIL is doing, but will change a bit how the few semantics we already have are implemented (since we now have a generic intrinsic).

New draft PR is https://github.com/llvm/llvm-project/pull/116393

Keenuts commented 1 week ago

@tex3d what were your concerns about this design?

s-perron commented 1 week ago

@tex3d what were your concerns about this design?

Some of his question were about how well it could be optimized. The two main cases are:

  1. If the input variable is never used, will the call to the load intrinsic be removed?
  2. If the output variable is never actually written it, could the store intrinsic be removed?
Keenuts commented 1 week ago

@tex3d what were your concerns about this design?

Some of his question were about how well it could be optimized. The two main cases are:

  1. If the input variable is never used, will the call to the load intrinsic be removed?
  2. If the output variable is never actually written it, could the store intrinsic be removed?

For 1 and 2, I'd assume if we optimize everything else, and just leave a load/store, driver should be able to optimize those away.

However, form the Vulkan spec, an Output BuiltIn starts with an undefined value. So in this design, if no value is written by the user, a default value will be written to the output in the dtor. I'd say that's OK, since we replace an undefined behavior with another behavior.

tex3d commented 4 days ago

@tex3d what were your concerns about this design?

Some of his question were about how well it could be optimized. The two main cases are:

  1. If the input variable is never used, will the call to the load intrinsic be removed?
  2. If the output variable is never actually written it, could the store intrinsic be removed?

For 1 and 2, I'd assume if we optimize everything else, and just leave a load/store, driver should be able to optimize those away.

However, form the Vulkan spec, an Output BuiltIn starts with an undefined value. So in this design, if no value is written by the user, a default value will be written to the output in the dtor. I'd say that's OK, since we replace an undefined behavior with another behavior.

The never-use case is the simplest to resolve with special handling, but I think these should be guaranteed to be eliminated before SPIR-V, otherwise they could be illegal, since you're adding everything declared globally to any entry compiled from the file (you could have a built-in that isn't valid in another entry point type).

I'm still not sure how you reliably move accesses to the correct control-flow locations without a special pass or something. If you are going to use a custom pass for these anyway, why bother with the constructor/destructor mechanism in the first place, since that will just obfuscate the real access locations which would be much easier and more reliable to translate directly from loads/stores of the original global variable?

Adding a special constructor/destructor to initialize/store a global variable when an attribute is present feels like a weird hack from the language semantics perspective already.

IMHO, this attribute should be deprecated in 202x and replaced with something cleaner for 202y.

Keenuts commented 4 days ago

I think input and output are different enough to warrant separate examples and descriptions of behavior. I'm not sure why we would be storing to an input in a destructor or loading from an output in a constructor.

Agree, I'll modify the proposal to remove dtor for input as those are not useful. For ctor for output, see part below about storing undef.

I am a bit concerned how we specify that variables only read from or written to in control flow preserve these intended load/store locations.

Ok I think I got it. The concern is: what if a lane conditionally stores to a built-in. Since store to the builtin is done at the end, you worry the undefined value in the output will be changed in all paths, since the built-in store is now unconditional.

This specific issue can be solved by having a ctor on output builtins.

Keenuts commented 4 days ago

@tex3d answered to your concerns, and changed the spec to share more details around those specifics handling of inputs/outputs. Let me know if I missed something!

Keenuts commented 3 days ago

@tex3d so Steven found a case which hinted as something incomplete: the PointSize BuiltIn. Unlike this others, this one is assumed to have the value 1.0 if not stored to. But question was: would a load/store pair be OK.

The answer is actually not that clear. In SPIR-V, those yield the same result:

But what's interesting is:

Additional weird detail:

printf("%f, %x", point_size, asuint(point_size)); // Prints "0, 0".
point_size = point_size + 10.f; // Yields point_size = NaN.

So this hints in the direction that the actual store could be assumed to have a hidden side-effect, and thus I shall not emit a load/store pair if the user hasn't written to the builtin. So this means if a branch stores a value, and another doesn't, I cannot simple store in both.

Need to think how to solve this.

Keenuts commented 2 days ago

Hello all! So for the reasons outlined above (builtin store has an hidden side-effect), I changed the whole design to implement this as regular global variables, and remove the ctor/dtor/load/store builtins.