Open jeremyong-az opened 3 years ago
(discussed offline that this is likely better implemented as a user-space feature, and an open question is whether this fits in the DirectXShaderCompiler
project or somewhere else)
Given that there isn't a DXIL manipulation library available with a permissive license, I can't see a separate user-space implementation of the feature becoming available any time soon. The closest library we have today is https://github.com/HansKristian-Work/dxil-spirv, which is LGPL and therefore unlikely to be approved for use in something like Unreal Engine.
Edit: Actually I take it back, @baldurk saves the day, as always: https://github.com/baldurk/renderdoc/tree/dc6bc12da77a799d49653e5ec87823c4f599916a/renderdoc/driver/shaders/dxil
Edit2: I take back the takeback :) Apparently, Baldur does not recommend using that framework.
FYI I don't really recommend that anyone else use my code for modifying DXIL. I wrote it because I was forced to with no other realistic choice for some debugging workflows, but I don't really trust it to be rock solid and I couldn't in good conscience suggest that anyone else build anything on it without severe "here be dragons" warnings. There's too high a risk of encountering some construct or pattern that isn't handled, or generating one that is invalid, without a reliable way to predict, test, or check for it. DXIL is such a hopeless and unusable format for shader interchange that I wouldn't really trust any external code/library to be robust and reliable except for this specific fork of LLVM.
To clarify my comment, I think even if it is a "user-space feature", that user-space facility would need to be provided as part of the dxc interface since the implementation is tightly coupled to the IR
We have a benchmark in our benchmark suite that provides a small insight into this when compiling a very trivial closest-hit shader.
In SPIR-V we can patch the SPIR-V directly; compiling 4096 different rchit shaders takes ~7 ms.
Note: this code path uses our own patcher rather than specialization constants in the driver because of what we're trying to measure. We're patching to actively avoid spec constants (in other scenarios we would absolutely use them).
In DirectX we need to re-invoke the compiler with a new define every time, which takes ~13 seconds for those 4096 rchit shaders.
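To illustrate why direct SPIR-V patching is so cheap, here is a minimal sketch of one way such a patcher could work. It is not the patcher used in the benchmark above (that code isn't shown), and the function name `patch_spec_constant` is made up for this example. It assumes a little-endian module and a 32-bit scalar constant, and it only overwrites the default value of the `OpSpecConstant` decorated with a given `SpecId`, leaving the rest of the module untouched.

```python
import struct

SPIRV_MAGIC = 0x07230203
OP_DECORATE = 71        # OpDecorate <target id> <decoration> <literals...>
OP_SPEC_CONSTANT = 50   # OpSpecConstant <result type> <result id> <value...>
DECORATION_SPEC_ID = 1  # Decoration SpecId


def patch_spec_constant(module: bytes, spec_id: int, value: int) -> bytes:
    """Return a copy of `module` whose spec constant `spec_id` defaults to `value`.

    Hypothetical helper: handles 32-bit scalar constants only.
    """
    if len(module) % 4 or len(module) < 20:
        raise ValueError("not a SPIR-V module")
    words = list(struct.unpack(f"<{len(module) // 4}I", module))
    if words[0] != SPIRV_MAGIC:
        raise ValueError("bad magic number (or a big-endian module)")

    decorated_ids = set()
    i = 5  # skip the five-word header
    while i < len(words):
        # First word of every instruction: word count (high 16 bits), opcode (low 16 bits).
        word_count, opcode = words[i] >> 16, words[i] & 0xFFFF
        if word_count == 0 or i + word_count > len(words):
            raise ValueError("malformed instruction stream")
        if (opcode == OP_DECORATE and word_count == 4
                and words[i + 2] == DECORATION_SPEC_ID
                and words[i + 3] == spec_id):
            # Remember which result id carries the requested SpecId.
            decorated_ids.add(words[i + 1])
        elif (opcode == OP_SPEC_CONSTANT and word_count >= 4
                and words[i + 2] in decorated_ids):
            if word_count != 4:
                raise ValueError("only 32-bit scalar constants are handled")
            words[i + 3] = value & 0xFFFFFFFF  # overwrite the default literal
            return struct.pack(f"<{len(words)}I", *words)
        i += word_count
    raise KeyError(f"no OpSpecConstant with SpecId {spec_id}")
```

A single pass is enough because SPIR-V's module layout places decorations before constant declarations. Producing N variants is then a loop of `patch_spec_constant(blob, 0, n)` calls over one compiled module, with no compiler invocation per variant.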
The shader itself is extremely simple (the benchmark doesn't focus on compilation speed, rather it tries to test coherency gathering).
#ifndef SHADER_VARIANT
[[vk::constant_id(0)]] const uint shader_variant = -1;
#define SHADER_VARIANT shader_variant
#endif
struct Payload {
    uint color; // single DWORD
};
struct Attribute {
    uint unused;
};
[shader("closesthit")]
void main(inout Payload payload, in Attribute attribs) {
    payload.color = SHADER_VARIANT;
}
For a simple shader like this - a small 1800x speedup is nothing to sneeze at.
We have another test that does 16384 variants; there DX12 takes ~50 seconds to compile the permutations, while our SPIR-V patching code takes around 26 ms.
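As a quick sanity check on the figures quoted above (using only the approximate timings given in this thread, so the results are equally approximate), the ratios and per-variant costs work out as follows:

```python
def speedup(slow_seconds, fast_seconds):
    """Ratio of the slow path's wall time to the fast path's."""
    return slow_seconds / fast_seconds


# 4096 variants: ~13 s recompiling vs ~7 ms patching SPIR-V.
small = speedup(13.0, 7e-3)
# 16384 variants: ~50 s recompiling vs ~26 ms patching SPIR-V.
large = speedup(50.0, 26e-3)

# Per-variant cost in the 4096-variant test.
recompile_ms_per_variant = 13.0 / 4096 * 1e3
patch_us_per_variant = 7e-3 / 4096 * 1e6

print(f"speedup: {small:.0f}x (4096 variants), {large:.0f}x (16384 variants)")
print(f"per variant: {recompile_ms_per_variant:.2f} ms recompile "
      f"vs {patch_us_per_variant:.2f} us patch")
```

Both tests land in the same ballpark, roughly 1,860x and 1,920x, which works out to about 3 ms per variant when recompiling versus a couple of microseconds when patching.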
This would be extremely useful for http://github.com/godotengine/godot. We rely heavily on an ubershader + specialization constants model to keep the shader cache small and to permute shader variants on the fly. It works fantastically in Vulkan, but porting Godot to Direct3D without something like this is quite difficult.
This is important because, unlike AAA titles, Godot aims to be an easy-to-use and inclusive game engine. The idea is that everything works out of the box as well as possible. Specialization constants in Vulkan avoid large shader compilation times for most shader permutations, which also ensures that users with very poor hardware can make games (without having to wait a long time for the permutations to compile).
I've developed a method that gets reasonably close to native SCs, though it is less convenient than they would be: https://twitter.com/RandomPedroJ/status/1532725156623286272
This would be extremely useful for us at Firaxis. Creating shader variants via pound defines is growing to be painful for both compile time and shader wrangling.
I think the ideal form for this would be a last-minute userspace DXIL patcher rather than something applied during pipeline creation as in Vulkan, since pipelines seem to have been under evolutionary pressure to make their construction faster and simpler. Furthermore, as the industry currently uses pound defines, patching the DXIL seems like a closer stepping stone. That said, I would not mind the implementation matching Vulkan's if there were also trustworthy flags to influence PSO construction-time optimization.
But I see the route to a userspace DXIL patcher being faster than a field added to PSO construction. Relying on 3rd party patchers seems dangerous as @baldurk outlined and assistance from DXC to create reliable DXIL segments to ease patching in spec constants would be higher quality.
Now that the announcement that D3D will move to SPIR-V in SM 7.0 has been made, I was wondering if it's now more realistic to expect support for SCs.
We had an opaquely named label "Theme:Next Major" that I've just renamed to "Theme:SM 7" which we're using to track candidate features that will be easier to enable in SM 7 when SPIR-V is our interchange format.
Currently, DXIL bytecode is emitted by the compiler for every shader permutation in a process typically managed by the user. These permutations are generally controlled via pound-defs specified as arguments to the shader compiler itself.
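To make the cost of this workflow concrete, here is a small sketch that enumerates the compiler invocations a define-driven permutation set requires. The helper `permutation_commands`, the file name `rchit.hlsl`, and the output naming scheme are all hypothetical; only the `-T`, `-D`, and `-Fo` options are taken from dxc itself. The sketch builds the argument lists and does not run the compiler.

```python
import itertools
from pathlib import PurePath


def permutation_commands(source, defines, profile="lib_6_3"):
    """Build one dxc argv per combination of define values.

    `defines` maps a macro name to the iterable of values it can take.
    """
    names = sorted(defines)
    stem = PurePath(source).stem
    commands = []
    combos = itertools.product(*(defines[name] for name in names))
    for index, combo in enumerate(combos):
        argv = ["dxc", "-T", profile]
        for name, value in zip(names, combo):
            argv += ["-D", f"{name}={value}"]
        # Each permutation gets its own bytecode file (hypothetical naming).
        argv += ["-Fo", f"{stem}_{index}.dxil", source]
        commands.append(argv)
    return commands


commands = permutation_commands("rchit.hlsl", {"SHADER_VARIANT": range(4096)})
print(len(commands), "compiler invocations")
print(" ".join(commands[0]))
```

Every entry is a full compiler run plus a separate bytecode blob to store, and the count multiplies with each additional define.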
A preferable model, however, would be something similar to Vulkan's "specialization constants", which would allow us to generate the bytecode once and configure the last-mile compilation in the driver with additional PSO state. This would dramatically reduce storage costs for bytecode/PDBs, as well as compilation time.
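For reference, Vulkan passes this extra state as a `VkSpecializationInfo`: a list of map entries (constant ID, offset, size) plus a raw data blob the entries point into. The sketch below models that layout in Python to show how little per-variant data is involved. The names `MapEntry` and `build_specialization` are invented for this example, it assumes every constant is a 32-bit integer, and no equivalent D3D12 structure exists today.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class MapEntry:
    """Mirrors the fields of VkSpecializationMapEntry."""
    constant_id: int  # matches the constant_id / SpecId in the shader
    offset: int       # byte offset of the value inside the data blob
    size: int         # byte size of the value


def build_specialization(values):
    """Pack {constant_id: uint32 value} into (map entries, data blob)."""
    entries = []
    data = bytearray()
    for constant_id, value in sorted(values.items()):
        entries.append(MapEntry(constant_id, len(data), 4))
        data += struct.pack("<I", value & 0xFFFFFFFF)
    return entries, bytes(data)


entries, data = build_specialization({0: 42})
print(entries, len(data), "bytes of per-variant state")
```

Under these assumptions a variant costs a few bytes of pipeline state instead of a separate bytecode blob and PDB.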
I'm sure there are many complexities in getting a feature like this over the finish line, but I wanted to mention it in case it could get some traction, or to understand the roadblocks if it's untenable (or if I'm misunderstanding the advantages the approach could confer).