Provide or Endorse a CPU C++ compatibility header library

microsoft / hlsl-specs

HLSL Specifications

MIT License

Provide or Endorse a CPU C++ compatibility header library #36

Closed devshgraphicsprogramming closed 1 year ago

devshgraphicsprogramming commented 1 year ago

Is your feature request related to a problem? Please describe.

I like to share my code between the CPU and the GPU. This means that I need C++ implementations of floatN and matrixNxM to compile even the simplest "utility header HLSL" code as C++ for use by the CPU.

Describe the solution you'd like

Instead of eye-balling what I should implement, or hunting for a ready made and compatible solution (as I am doing now), DXC or HLSL itself should either maintain or endorse a C++ header-only library which provides implementations of basic HLSL types (scalars, vectors, matrices) and non-stage and non-resource specific builtin functions (fma, pack/unpack, float conversions, transcendentals and other math functions, limited wave intrinsics).
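To make the request concrete, here is a minimal sketch of the kind of header being asked for. The `hlsl_compat` namespace and everything in it are hypothetical names chosen for illustration, not part of DXC or any existing library. It covers only `float3` with `dot`, `length` and `normalize`, and ignores swizzles, SIMD, and buffer layout rules.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical compatibility namespace; not an existing library.
namespace hlsl_compat {

struct float3 {
    float x, y, z;
};

// Component-wise operators, matching how HLSL treats vector operands.
inline float3 operator+(float3 a, float3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline float3 operator*(float3 a, float3 b) { return {a.x * b.x, a.y * b.y, a.z * b.z}; }
inline float3 operator*(float3 a, float s)  { return {a.x * s, a.y * s, a.z * s}; }

// Stand-ins for the HLSL intrinsics of the same name.
inline float dot(float3 a, float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline float length(float3 v) { return std::sqrt(dot(v, v)); }

inline float3 normalize(float3 v) {
    const float len = length(v);
    assert(len > 0.0f);  // host-side guard only; HLSL has no equivalent check
    return v * (1.0f / len);
}

} // namespace hlsl_compat

// "Utility header" style code whose body is meant to read the same in HLSL.
inline float cos_angle(hlsl_compat::float3 a, hlsl_compat::float3 b) {
    using namespace hlsl_compat;
    return dot(normalize(a), normalize(b));
}
```

The body of `cos_angle` is the part that would ideally be shared verbatim; the namespace above it is what a maintained or endorsed library would have to supply.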

Describe alternatives you've considered

I'm doing my own research and trying to find something that works and can be extended, but it would be nice if we settled on a consensus and saved others the time. So far I've found:

https://github.com/microsoft/DirectXMath

~~Pros: By Microsoft~~ Cons: This is an old DirectX SDK C-API header library that has nothing to do with an HLSL-like interface

https://github.com/redorav/hlslpp

Pros: has AVX and FMA, and float8 Cons: not feature complete (e.g. doubles), unsure how well it works for HLSL2021 https://github.com/redorav/hlslpp/issues/66

GLM

Pros: Most widely known and well used Cons: floats and matrices not defined the same way as HLSL2021, [also only uses SSE2 or AVX and only for float4 and matrix4](https://glm.g-truc.net/0.9.1/api/a00248.html#_details)

Additional context

Ideally the library should implement float4 and others with SSE4.2 and NEON as much as possible, and maybe even provide a float8 with AVX and FMA optimizations.

Also a benchmark suite representative of typical workloads so we could focus effort on optimizing (providing raw intrinsic implementations) where it matters most.

devshgraphicsprogramming commented 1 year ago

After a ton of googling, it seems that it's only a GLM vs hlslpp choice.

devshgraphicsprogramming commented 1 year ago

> After a ton of googling, it seems that it's only a GLM vs hlslpp choice.

found this too by accident: https://github.com/dangmoody/HLML

llvm-beanz commented 1 year ago

A C++ compatibility header isn't really possible for HLSL. The HLSL language has constructs that can't be represented in plain C++. Some of them (like the HLSL vector types) can almost be represented with Clang extensions, but that isn't standard, portable C++.

HLSL deviates from C++ in some pretty fundamental ways that a header can't capture. For example HLSL's converting lvalue-casts, and parameter passing semantics, are very different from anything C++ provides. At best, you could produce something that kinda looks like C++, and might behave similarly but also might have hidden and difficult to diagnose differences.

The meta problem you describe of wanting to share code between the CPU and GPU is very much central to the current evolution of HLSL. One of the biggest motivators for HLSL 2021 and the C++ features we're planning to bring in for future versions is to empower sharing code between C++ CPU code and HLSL GPU code. To get there we need to introduce a lot of fundamental shifts in HLSL so that we reduce or eliminate the places where HLSL and C++ rules interpret the same code radically differently.

One other possible solution to your problem is something I've been thinking about for our work with Clang. I would like to build a portable CPU HLSL runtime to use for testing and development.

DXC isn't capable of generating CPU code. DXIL can be translated to run on a CPU (Warp does this), but it isn't really a portable runtime.

As we implement HLSL support in Clang this likely becomes an easier problem since Clang is inherently capable of targeting CPUs. We don't have any immediate plans to build CPU-targeting support for HLSL, so that's mostly just a happy dream in my mind.

While I understand the value of this request, I don't think it is a language feature for HLSL that we can address through this specs process, so I'm closing this issue.

devshgraphicsprogramming commented 1 year ago

> One of the biggest motivators for HLSL 2021 and the C++ features we're planning to bring in for future versions is to empower sharing code between C++ CPU code and HLSL GPU code.

But you won't be able to hammer out all the differences, if you don't even provide C++ "data compatible" types.

> DXC isn't capable of generating CPU code. DXIL can be translated to run on a CPU (Warp does this), but it isn't really a portable runtime.
>
> As we implement HLSL support in Clang this likely becomes an easier problem since Clang is inherently capable of targeting CPUs. We don't have any immediate plans to build CPU-targeting support for HLSL, so that's mostly just a happy dream in my mind.

DXIL emulation, cross compilation or compiling actual HLSL to x86 or ARM is explicitly not what I'm asking for.

Consider the following use case: I have some structs with static factory methods. Let's say lights, frustums, etc.

Why would I write them out twice, or use some macros/aliases to get access to HLSL basic types and mathematical functions, if I want to "construct" these both in, let's say, a compute shader and in C++ to feed as structured buffers?

So far the only "correct" library seems to be GLM, as HLSL++ messes up the type sizes badly (they're neither structured buffer layout nor constant buffer layout) and HLML is just GLM but rewritten because of a personal dislike for templates.
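A minimal sketch of that use case follows. `PointLight`, its fields, and the host-side `float3` stand-in are hypothetical names for illustration. The guard assumes the HLSL compiler predefines `__HLSL_VERSION` (DXC does), and only the C++ side has been reasoned through here; the struct body is written to look like HLSL 2021 but has not been checked against DXC.

```cpp
// Assumption: the HLSL compiler predefines __HLSL_VERSION, so its absence is
// taken to mean "this file is being compiled as C++".
#ifndef __HLSL_VERSION
#include <cassert>
// Hypothetical host-side stand-in for the HLSL built-in type.
struct float3 { float x, y, z; };
inline float3 operator*(float3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }
#endif

// Everything below is meant to be the single shared definition.
struct PointLight {
    float3 position;
    float  radius;
    float3 intensity;
    float  falloff;

    // Static factory method, written once and used on both sides.
    static PointLight create(float3 center, float range, float3 color, float power) {
        PointLight light;
        light.position  = center;
        light.radius    = range;
        light.intensity = color * power;
        light.falloff   = 1.0f / (range * range);
        return light;
    }
};

#ifndef __HLSL_VERSION
// Host-side check that the struct is eight tightly packed floats, which is
// what a structured buffer of PointLight would be expected to contain.
static_assert(sizeof(PointLight) == 32, "unexpected padding in PointLight");
#endif
```

On the CPU, `PointLight::create(...)` fills the array that gets uploaded; in a compute shader the same call would construct the value in place. The stand-in types are exactly the part that each project currently has to write or pick a library for.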

llvm-beanz commented 1 year ago

> But you won't be able to hammer out all the differences, if you don't even provide C++ "data compatible" types.

Sure, but in many ways that's a language change for C++, not HLSL. Coming back to HLSL vector types, HLSL vector swizzles aren't valid in C++, and they won't be unless C++ adopts new syntax.
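As a rough illustration of the swizzle point, the sketch below shows the closest thing portable C++ offers without compiler extensions: member functions. The `float2`/`float4` types and the `xy()`, `wz()`, `set_xy()` names are hypothetical. Note that the call syntax already differs from HLSL's `v.xy`, so code using it is not source-compatible.

```cpp
#include <cassert>

// Hypothetical stand-ins for the HLSL vector types.
struct float2 { float x, y; };

struct float4 {
    float x, y, z, w;

    // HLSL: v.xy and v.wz. Portable C++ needs a call: v.xy(), v.wz().
    float2 xy() const { return {x, y}; }
    float2 wz() const { return {w, z}; }

    // HLSL: v.xy = value. A read-only accessor cannot express the
    // assignment form, so a separate setter is needed.
    void set_xy(float2 value) { x = value.x; y = value.y; }
};
```

Libraries get closer to the HLSL spelling with anonymous unions or proxy objects, but those rely on behavior outside standard C++ or add hidden conversions.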

> DXIL emulation, cross compilation or compiling actual HLSL to x86 or ARM is explicitly not what I'm asking for.

Apologies I misunderstood. IIUC, you're asking to be able to define data structures in a header (without a lot of macro nastiness), and compile that header for C++ and HLSL, so that you can use the same data structure in your C++ code when you package up and send data to your HLSL GPU code. Is that a correct statement of the request?

> Consider the following use case: I have some structs with static factory methods. Let's say lights, frustums, etc.
>
> Why would I write them out twice, or use some macros/aliases to get access to HLSL basic types and mathematical functions, if I want to "construct" these both in, let's say, a compute shader and in C++ to feed as structured buffers?
>
> So far the only "correct" library seems to be GLM, as HLSL++ messes up the type sizes badly (they're neither structured buffer layout nor constant buffer layout) and HLML is just GLM but rewritten because of a personal dislike for templates.

I think there are two problems here:

(1) Not to sound bureaucratic, but the reason I closed this issue is that this isn't the place to design and build a C++ header/library. This space is for designing the HLSL language, not a C++ header/library.

(2) HLSL is not C++. It cannot be defined as a superset or a subset of C++. It is subtly different in ways that make it extremely difficult to share even trivial bits of code and have it behave the same way in both languages. As a good example see this code that shows how HLSL parameter passing doesn't do what you'd expect.

Extremely carefully written HLSL can be compiled as C++ and vice versa, but it is very difficult to vend a header/library that provides appropriate insulation and guard rails to allow users to write code that works the same in HLSL and C++.

Our goal is to change HLSL to make this possible, and that is stated in our design statement for HLSL 202x. Features like adding references and constructors, fixing const-qualification and overload resolution, and removing HLSL's odd initializer lists all take incremental steps toward making it possible to safely and confidently share data structures between C++ and HLSL.

devshgraphicsprogramming commented 1 year ago

> Apologies I misunderstood. IIUC, you're asking to be able to define data structures in a header (without a lot of macro nastiness), and compile that header for C++ and HLSL, so that you can use the same data structure in your C++ code when you package up and send data to your HLSL GPU code. Is that a correct statement of the request?

Yes, but also the HLSL "non stage specific builtin functions" like mul(), sin, cos, etc. for the types.

It's actually not as trivial as just defining your own struct with appropriate anonymous unions.

> (1) Not to sound bureaucratic, but the reason I closed this issue is that this isn't the place to design and build a C++ header/library. This space is for designing the HLSL language, not a C++ header/library.

It should be bundled with DXC or whatever the future compiler will be, in my humble opinion.

> (2) HLSL is not C++. It cannot be defined as a superset or a subset of C++. It is subtly different in ways that make it extremely difficult to share even trivial bits of code and have it behave the same way in both languages. As a good example see this code that shows how HLSL parameter passing doesn't do what you'd expect.

Variable shadowing is never a good idea ;)

Can we focus on cases that would compile with -Werror and don't use inout (or would substitute any out with a non-const reference)?

llvm-beanz commented 1 year ago

> It should be bundled with DXC or whatever the future compiler will be, in my humble opinion.

Okay, but that's a toolchain feature, not a language feature.

> Variable shadowing is never a good idea ;)

This isn't shadowing, it is aliasing, and it gets much more complicated when you have complex data structures comprised of other data structures. As soon as you have by-address behaviors (which HLSL does not), the ability for memory to overlap becomes possible. Aliasing happens a lot, and often intentionally.

Another example where things are very different in HLSL is that HLSL arrays are passed by value, not pointer. So if you pass an array into a function in HLSL, it isn't strictly array->pointer decay...
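The C++ half of that difference can be sketched as follows. The function names are hypothetical. `std::array` is used only to emulate by-value array passing on the C++ side; it is not something HLSL provides.

```cpp
#include <array>
#include <cassert>

// C++: an array parameter is adjusted to `int*`, so the write lands in the
// caller's array.
inline void zero_first_decayed(int values[4]) { values[0] = 0; }

// std::array is copied at the call, which is closer to what HLSL does for a
// parameter declared as `int values[4]`: the caller's array is untouched.
inline int zero_first_by_value(std::array<int, 4> values) {
    values[0] = 0;
    return values[0] + values[1];  // uses the local copy only
}
```

The same parameter spelling, `int values[4]`, therefore means two different things depending on which compiler reads the shared header.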

> Can we focus on cases that would compile with -Werror and don't use inout (or would substitute any out with a non-const reference)?

inout and out are not representable as C++ references. You get a close approximation in some cases, but it can be subtly different, so they have to go off the table. Many of the differences between HLSL and C++ are implicit behaviors that do not trigger warnings, so -Werror doesn't really factor in here.

Stepping back. Your request here is a toolchain request not a language request, so it isn't going to be addressed here.

You could file this as a toolchain request against DXC. I strongly suspect that, while we might accept a PR, we won't prioritize the work over the other work we have going on right now. If the work isn't going to bubble up on our priority list, we would close the bug rather than letting it languish.

My general feeling is that until HLSL becomes more compatible with C++ as a language, the differences between C++ and HLSL would lead to more subtle bugs and misuses causing a significant support burden. For that reason I think this feature can't be considered until HLSL has evolved a bit more.

devshgraphicsprogramming commented 1 year ago

> This isn't shadowing, it is aliasing, and it gets much more complicated when you have complex data structures comprised of other data structures. As soon as you have by-address behaviors (which HLSL does not), the ability for memory to overlap becomes possible. Aliasing happens a lot, and often intentionally.

I'm pretty sure that I'd get a warning about X being shadowed between a static and a local variable identifier.

But yeah, I get the point: by-address brings about the whole restrict and -fno-strict-aliasing & reinterpret_cast conundrum that you guys will need to address in HLSL.

(please, for the love of God, don't follow C++ here; std::bit_cast running a constexpr memcpy is an abomination)

llvm-beanz commented 1 year ago

The issue isn't the shadowing, that was just an artifact of me writing silly code quickly. Renaming the shadowed declaration can illustrate (see: https://godbolt.org/z/MKqdh463j). When you pass a global variable to an inout parameter, the address of the global isn't passed into the function. The global's value is copied into the function, and updated after the function returns. This causes some potentially unexpected results (note: both stores store the value 2).
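The copy-in/copy-out behavior described above can be emulated in C++ to show how it diverges from a plain reference. This is an illustrative sketch with hypothetical names, not the code behind the godbolt link: the same function body observes a different value of the global depending on how the argument is passed.

```cpp
#include <cassert>

static int g_value = 1;

// Writes through the parameter, then reads the global.
inline int write_then_read_global(int& x) {
    x = 2;
    return g_value;
}

// Plain C++ reference: x aliases g_value, so the read sees the new value.
inline int call_with_reference() {
    g_value = 1;
    return write_then_read_global(g_value);
}

// Emulation of an HLSL inout parameter as described: the value is copied
// into a temporary, the function runs, and the result is copied back.
inline int call_with_copy_in_out() {
    g_value = 1;
    int tmp = g_value;                       // copy-in
    int seen = write_then_read_global(tmp);  // x aliases tmp, not g_value
    g_value = tmp;                           // copy-out after the call
    return seen;
}
```

With the reference the function reads 2 from the global; with copy-in/copy-out it reads 1, and the global only becomes 2 once the call has returned. Neither version triggers a compiler warning.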

Effectively all parameters in HLSL are passed as restrict & to temporary values that are propagated back to their underlying lvalues after the function is executed. This gets even more crazy when you look at how casting works in HLSL, because you can implicitly cast an lvalue to a different type when passing into a function and have the reverse cast be performed on function exit (see: https://godbolt.org/z/bx9G6oxo7). This code obviously wouldn't be valid in C++.
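The converting lvalue cast can be emulated the same way. In the sketch below (hypothetical names, again not the linked godbolt code) a `float` variable is handed to a function that takes an `int` by reference. C++ rejects the direct call `add_one(f)` because an `int&` cannot bind to a `float`, so the conversions HLSL performs implicitly are written out by hand.

```cpp
#include <cassert>

inline void add_one(int& x) { x = x + 1; }

// Emulates passing a float lvalue to an `inout int` parameter: convert on
// entry, run the function on the temporary, convert back on exit.
inline void call_inout_int_with_float(float& f) {
    int tmp = static_cast<int>(f);  // cast on entry (truncates toward zero)
    add_one(tmp);
    f = static_cast<float>(tmp);    // reverse cast on exit
}
```

Starting from `2.75f`, the caller ends up with `3.0f` rather than `3.75f`: the fractional part is lost in the round trip, which is the kind of hidden behavior being described.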

Generally speaking I don't think the goal is to bring all of C++ into HLSL, but eliminating the ways in which HLSL is different from C++ where there isn't a good reason to be different is a good start. I also think there are a lot of hidden pitfalls in HLSL due to complicated implicit behaviors which we should work to make explicit.

devshgraphicsprogramming commented 11 months ago

We kinda bit the bullet and found out that after requiring Vulkan 1.3 and looking at all the desktop GPUs that still get driver updates, we can rely on the following always being present....

Scalar Layout

SSBO and UBO Storage:

64bit
32bit
16bit
8bit

Actual arithmetic:

8bit integer (if only HLSL actually supported it, or had references so we could make our own uint8_t class)
16bit integer
32bit integer
64bit integer

What this means

If you enable/force all of the above in Vulkan and mandate matrices always have row-major layout, then one can effectively use GLM as the "HLSL C++ library", although we've had to do a bit of polyfilling to get the matrix to behave nicely.

https://github.com/Devsh-Graphics-Programming/Nabla/blob/41be49ce6c9d4bbab52abfab488e1027e71c4e2e/include/nbl/builtin/hlsl/cpp_compat/matrix.hlsl
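Why scalar layout matters for this can be sketched with plain structs. The `float3` and `Particle` types below are hypothetical and do not come from GLM or Nabla. Under scalar block layout a vector only needs the alignment of its component type, which is what an ordinary C++ struct of floats already has, so the host-side offsets can be checked at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side stand-in: three tightly packed floats.
struct float3 { float x, y, z; };

struct Particle {
    float         mass;      // offset 0
    float3        position;  // offset 4 with scalar layout; the 16-byte
                             // vec3 alignment of std430 would move it to 16
    float3        velocity;  // offset 16
    std::uint32_t flags;     // offset 28
};

static_assert(sizeof(float3) == 12, "float3 must not be padded to 16 bytes");
static_assert(offsetof(Particle, position) == 4, "position follows mass directly");
static_assert(offsetof(Particle, velocity) == 16, "velocity follows position directly");
static_assert(offsetof(Particle, flags) == 28, "flags follows velocity directly");
static_assert(sizeof(Particle) == 32, "no tail padding expected");
```

If the shader side is compiled with scalar layout enabled, an array of `Particle` written by the CPU should be readable as-is. Without it, a 16-byte aligned vector library or manual padding is needed to make the two sides agree.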