jkwak-work closed this issue 1 month ago.
This is a subtask of #4582.
These are the results for SPIRV compilation (averaged over 100 compilations per step):
slangc: ..\..\build\Release\bin\slangc.exe
target: spirv
samples: 100
compiled buffers.slang.
compiled common.slang.
compiled enumerations.slang.
compiled environment.slang.
compiled hit.slang.
compiled material.slang.
compiled mdl.slang.
compiled runtime.slang.
compiled shading.slang.
compiled types.slang.
compiled closesthit (module).
compiled anyhit (module).
compiled shadow (module).
compiled closesthit (mono).
compiled anyhit (mono).
compiled shadow (mono).
results:
module precompilation 1.553s
module whole compilation 0.220s
closesthit 0.122s
anyhit 0.049s
shadow 0.049s
monolithic compilation 0.608s
closesthit 0.250s
anyhit 0.179s
shadow 0.179s
speed up factor 2.77x
closesthit 2.06x
anyhit 3.67x
shadow 3.62x
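The speed-up factors are ratios of monolithic time to module-based time. Note that the "module whole compilation" figure is the sum of the three per-entry times (0.122 + 0.049 + 0.049 = 0.220s), so the overall factor is a ratio of sums rather than an average of the per-entry factors. The following is a minimal Python sketch using the rounded values printed above; because the inputs are rounded, the ratios can differ from the printed factors in the last digit (for example, 0.608 / 0.220 is about 2.76, versus the printed 2.77x).

```python
# Rounded per-entry averages (seconds) from the SPIRV run above.
module = {"closesthit": 0.122, "anyhit": 0.049, "shadow": 0.049}
mono = {"closesthit": 0.250, "anyhit": 0.179, "shadow": 0.179}


def speedup(baseline: float, candidate: float) -> float:
    """How many times faster `candidate` is than `baseline` (>1 means faster)."""
    return baseline / candidate


def report(baseline: dict[str, float], candidate: dict[str, float]) -> dict[str, float]:
    """Per-entry factors plus an overall factor computed as a ratio of sums."""
    factors = {name: speedup(baseline[name], candidate[name]) for name in candidate}
    factors["total"] = speedup(sum(baseline.values()), sum(candidate.values()))
    return factors


if __name__ == "__main__":
    for name, factor in report(mono, module).items():
        print(f"{name:<10} {factor:.2f}x")
```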
Please also see this script for how the benchmark is run.
Judging by the script, the timing number presented is based on `spCompile`.
But what does it include? And what doesn't it include?
This seems to be the source of the confusion.
We also want to know how long DXC takes to do the same work as the monolithic case.
> But what does it include? And what doesn't it include?
I will check this out in more detail shortly.
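To make the "what does it include" question concrete: timing a whole process invocation captures different work than timing a single in-process API call. The sketch below averages wall-clock time over N runs of an external command. It is not the benchmark script referenced above, and the placeholder command (`python -c pass`) is a hypothetical stand-in for a real `slangc` or `dxc` invocation.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd: list[str], samples: int) -> float:
    """Average wall-clock seconds over `samples` runs of `cmd`.

    This measures the whole process: start-up, file I/O, parsing and code
    generation. A timer wrapped around one in-process API call would exclude
    process start-up and anything done before or after that call.
    """
    if samples < 1:
        raise ValueError("samples must be at least 1")
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


if __name__ == "__main__":
    # Placeholder command; substitute a real compiler invocation.
    avg = time_command([sys.executable, "-c", "pass"], samples=5)
    print(f"average: {avg:.3f}s")
```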
> We also want to know how long DXC takes to do the same work as the monolithic case.
Here are the benchmarks which include DXC compilation to SPIRV.
slangc: ..\..\build\Release\bin\slangc.exe
target: spirv
samples: 16
[...]
results:
module precompilation 1.556s
module whole compilation 0.219s
closesthit 0.119s
anyhit 0.050s
shadow 0.050s
monolithic compilation 0.611s
closesthit 0.252s
anyhit 0.180s
shadow 0.179s
speed up (vs mono) 2.80x
closesthit 2.12x
anyhit 3.62x
shadow 3.58x
dxc compilation of hlsl 0.385s
closesthit 0.129s
anyhit 0.128s
shadow 0.128s
speed up (vs dxc) 1.76x
closesthit 1.09x
anyhit 2.57x
shadow 2.56x
One caveat: for DXC, this uses the original MDL-generated HLSL files, which contain slightly more source code because of the included macros (some of which could not be resolved and have therefore been removed). I will also include a benchmark using Slang-generated HLSL files.
The same benchmarks, with DXIL as the target:
slangc: ..\..\build\Release\bin\slangc.exe
target: dxil
samples: 64
[...]
results:
module precompilation 1.772s
module whole compilation 0.231s
closesthit 0.127s
anyhit 0.051s
shadow 0.053s
monolithic compilation 0.638s
closesthit 0.263s
anyhit 0.186s
shadow 0.189s
speed up (vs mono) 2.76x
closesthit 2.07x
anyhit 3.62x
shadow 3.56x
dxc for original hlsl 0.217s
closesthit 0.073s
anyhit 0.072s
shadow 0.072s
speed up (vs dxc) 0.94x
closesthit 0.57x
anyhit 1.41x
shadow 1.36x
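The following is a more comprehensive set of results covering all targets. The largest differentiating factor between module-based and monolithic compilation seems to be that module linking does not go through semantic checking (the Semantic Checking column is 0.000s for every entry in the module-based tables).

Additional notes:

- Slang SPIRV (direct): the `-emit-spirv-directly` option is used.
- Slang SPIRV via GLSL: the `-emit-spirv-via-glsl` flag is provided.
- DXC to DXIL: the original HLSL is compiled with the `-T lib_6_1 -Vd` flags.
- DXC to SPIRV: the `-spirv` and `-fspv-target-env=vulkan1.3` flags are given.

Target: DXIL

Module precompilation (1.592s)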

Module whole compilation (0.232s):

| Entry | Total | Link & Optimize | Semantic Checking | DXC Downstream |
|---|---|---|---|---|
| Closest Hit | 0.128s | 0.069s (54.5%) | 0.000s (0.0%) | 0.049s (38.2%) |
| Any Hit | 0.052s | 0.033s (64.7%) | 0.000s (0.0%) | 0.013s (25.9%) |
| Shadow | 0.053s | 0.034s (65.0%) | 0.000s (0.0%) | 0.013s (25.5%) |

Monolithic compilation (0.639s):

| Entry | Total | Link & Optimize | Semantic Checking | DXC Downstream |
|---|---|---|---|---|
| Closest Hit | 0.262s | 0.071s (27.1%) | 0.126s (48.2%) | 0.048s (18.5%) |
| Any Hit | 0.189s | 0.034s (18.0%) | 0.130s (68.7%) | 0.014s (7.2%) |
| Shadow | 0.188s | 0.034s (18.1%) | 0.129s (68.5%) | 0.014s (7.3%) |

DXC for original HLSL (0.233s):

| Entry | Total |
|---|---|
| Closest Hit | 0.078s |
| Any Hit | 0.077s |
| Shadow | 0.077s |

Speed-up factors:

| Entry | vs. Monolithic | vs. DXC |
|---|---|---|
| Total | 2.755x | 1.004x |
| Closest Hit | 2.051x | 0.612x |
| Any Hit | 3.660x | 1.499x |
| Shadow | 3.575x | 1.470x |

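
A quick back-of-the-envelope check of the semantic-checking explanation against the DXIL tables: compute what fraction of each entry's gap between monolithic and module-based totals is covered by the monolithic Semantic Checking time. Using the rounded table values, it comes out at roughly 94% to 96% for all three entries; the remainder is spread across the other (small) stage differences.

```python
# Rounded per-entry values (seconds) from the DXIL tables above.
MODULE_TOTAL = {"Closest Hit": 0.128, "Any Hit": 0.052, "Shadow": 0.053}
MONO_TOTAL = {"Closest Hit": 0.262, "Any Hit": 0.189, "Shadow": 0.188}
MONO_SEMANTIC = {"Closest Hit": 0.126, "Any Hit": 0.130, "Shadow": 0.129}


def gap_explained_by_semantic_checking(entry: str) -> float:
    """Fraction of (monolithic total - module total) covered by semantic checking."""
    gap = MONO_TOTAL[entry] - MODULE_TOTAL[entry]
    return MONO_SEMANTIC[entry] / gap


if __name__ == "__main__":
    for entry in MODULE_TOTAL:
        share = gap_explained_by_semantic_checking(entry)
        print(f"{entry:<12} {share:.0%} of the gap")
```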

Target: SPIRV (`-emit-spirv-directly`)

Module precompilation (1.575s)

Module whole compilation (0.223s):

| Entry | Total | Link & Optimize | Semantic Checking | Emit Spirv | Spirv-Opt |
|---|---|---|---|---|---|
| Closest Hit | 0.123s | 0.087s (70.3%) | 0.000s (0.0%) | 0.009s (7.4%) | 0.023s (18.4%) |
| Any Hit | 0.050s | 0.039s (78.8%) | 0.000s (0.1%) | 0.003s (5.6%) | 0.005s (10.8%) |
| Shadow | 0.050s | 0.040s (79.4%) | 0.000s (0.0%) | 0.003s (5.5%) | 0.005s (10.5%) |

Monolithic compilation (0.617s):

| Entry | Total | Link & Optimize | Semantic Checking | Emit Spirv | Spirv-Opt |
|---|---|---|---|---|---|
| Closest Hit | 0.254s | 0.083s (32.9%) | 0.130s (51.3%) | 0.008s (3.2%) | 0.022s (8.8%) |
| Any Hit | 0.184s | 0.038s (20.7%) | 0.129s (70.2%) | 0.002s (1.3%) | 0.005s (2.9%) |
| Shadow | 0.179s | 0.037s (20.8%) | 0.126s (70.3%) | 0.002s (1.2%) | 0.005s (2.9%) |

DXC for original HLSL (0.401s):

| Entry | Total |
|---|---|
| Closest Hit | 0.132s |
| Any Hit | 0.135s |
| Shadow | 0.134s |

Speed-up factors:

| Entry | vs. Monolithic | vs. DXC |
|---|---|---|
| Total | 2.768x | 1.800x |
| Closest Hit | 2.061x | 1.073x |
| Any Hit | 3.688x | 2.706x |
| Shadow | 3.598x | 2.696x |


Target: SPIRV via GLSL (`-emit-spirv-via-glsl`)

Module precompilation (1.861s)

Module whole compilation (0.364s):

| Entry | Total | Link & Optimize | Semantic Checking | Glslang |
|---|---|---|---|---|
| Closest Hit | 0.176s | 0.083s (47.1%) | 0.000s (0.0%) | 0.082s (46.9%) |
| Any Hit | 0.095s | 0.038s (39.7%) | 0.000s (0.0%) | 0.053s (56.4%) |
| Shadow | 0.093s | 0.037s (39.9%) | 0.000s (0.0%) | 0.053s (56.5%) |

Monolithic compilation (0.771s):

| Entry | Total | Link & Optimize | Semantic Checking | Glslang |
|---|---|---|---|---|
| Closest Hit | 0.309s | 0.082s (26.7%) | 0.128s (41.5%) | 0.083s (26.9%) |
| Any Hit | 0.228s | 0.036s (16.0%) | 0.128s (56.2%) | 0.053s (23.5%) |
| Shadow | 0.235s | 0.038s (16.1%) | 0.131s (56.0%) | 0.055s (23.6%) |

DXC for original HLSL (0.409s):

| Entry | Total |
|---|---|
| Closest Hit | 0.140s |
| Any Hit | 0.134s |
| Shadow | 0.135s |

Speed-up factors:

| Entry | vs. Monolithic | vs. DXC |
|---|---|---|
| Total | 2.120x | 1.126x |
| Closest Hit | 1.753x | 0.798x |
| Any Hit | 2.409x | 1.413x |
| Shadow | 2.518x | 1.453x |

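
For reference, the following is a hedged sketch of how the configurations in the notes above might be assembled into command lines. Only the target-selection flags come from the notes; the file names, the use of `-target spirv` and `-o` for `slangc`, `-Fo` for `dxc`, and reusing the `lib_6_1` profile for the DXC SPIRV build are assumptions for illustration.

```python
from pathlib import Path


def slangc_spirv_cmd(slangc: Path, source: Path, output: Path, via_glsl: bool) -> list[str]:
    """Build a slangc command for one of the two SPIRV code paths."""
    # The emit flag comes from the notes; -target and -o are assumed here.
    emit_flag = "-emit-spirv-via-glsl" if via_glsl else "-emit-spirv-directly"
    return [str(slangc), str(source), "-target", "spirv", emit_flag, "-o", str(output)]


def dxc_cmd(dxc: Path, source: Path, output: Path, spirv: bool, profile: str = "lib_6_1") -> list[str]:
    """Build a dxc command for the original MDL-generated HLSL."""
    cmd = [str(dxc), "-T", profile]
    if spirv:
        # SPIRV flags from the notes; reusing the lib_6_1 profile is an assumption.
        cmd += ["-spirv", "-fspv-target-env=vulkan1.3"]
    else:
        # DXIL flags from the notes: -Vd disables DXIL validation.
        cmd += ["-Vd"]
    return cmd + [str(source), "-Fo", str(output)]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(slangc_spirv_cmd(Path("slangc"), Path("hit.slang"), Path("hit.spv"), via_glsl=False))
    print(dxc_cmd(Path("dxc"), Path("hit.hlsl"), Path("hit.dxil"), spirv=False))
```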
New benchmarks here, now using much larger MDL shaders (specifically `OmniSurface_BrushedMetal`). Compile times cross the one-second mark. At this scale, the Link & Optimize stage takes the majority of compile time in module-based compilation.

Module precompilation (5.971s)

Module whole compilation (2.027s):

| Entry | Total | Link & Optimize | Semantic Checking | DXC Downstream |
|---|---|---|---|---|
| Closest Hit | 1.368s | 0.748s (54.7%) | 0.000s (0.0%) | 0.588s (43.0%) |
| Any Hit | 0.335s | 0.241s (71.8%) | 0.000s (0.0%) | 0.082s (24.5%) |
| Shadow | 0.324s | 0.232s (71.5%) | 0.000s (0.0%) | 0.080s (24.6%) |

Monolithic compilation (2.723s):

| Entry | Total | Link & Optimize | Semantic Checking | DXC Downstream |
|---|---|---|---|---|
| Closest Hit | 1.535s | 0.683s (44.5%) | 0.247s (16.1%) | 0.566s (36.9%) |
| Any Hit | 0.600s | 0.242s (40.4%) | 0.256s (42.6%) | 0.082s (13.6%) |
| Shadow | 0.588s | 0.238s (40.4%) | 0.251s (42.6%) | 0.080s (13.6%) |

DXC for original HLSL (1.898s):

| Entry | Total |
|---|---|
| Closest Hit | 0.632s |
| Any Hit | 0.638s |
| Shadow | 0.627s |

Speed-up factors:

| Entry | vs. Monolithic | vs. DXC |
|---|---|---|
| Total | 1.343x | 0.936x |
| Closest Hit | 1.122x | 0.462x |
| Any Hit | 1.788x | 1.903x |
| Shadow | 1.814x | 1.934x |

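
The "majority" observation can be checked per entry against the large-shader table set immediately above. The sketch below picks the stage with the largest measured time for each entry. With the rounded table values, Link & Optimize dominates every module-based entry, while in the monolithic Any Hit and Shadow entries Semantic Checking is slightly larger than Link & Optimize.

```python
# Per-stage times (seconds) from the first large-shader table set above.
MODULE_STAGES = {
    "Closest Hit": {"Link & Optimize": 0.748, "Semantic Checking": 0.000, "DXC Downstream": 0.588},
    "Any Hit": {"Link & Optimize": 0.241, "Semantic Checking": 0.000, "DXC Downstream": 0.082},
    "Shadow": {"Link & Optimize": 0.232, "Semantic Checking": 0.000, "DXC Downstream": 0.080},
}
MONO_STAGES = {
    "Closest Hit": {"Link & Optimize": 0.683, "Semantic Checking": 0.247, "DXC Downstream": 0.566},
    "Any Hit": {"Link & Optimize": 0.242, "Semantic Checking": 0.256, "DXC Downstream": 0.082},
    "Shadow": {"Link & Optimize": 0.238, "Semantic Checking": 0.251, "DXC Downstream": 0.080},
}


def dominant_stage(stages: dict[str, float]) -> str:
    """Name of the stage with the largest measured time."""
    return max(stages, key=lambda name: stages[name])


if __name__ == "__main__":
    for label, table in (("module", MODULE_STAGES), ("mono", MONO_STAGES)):
        for entry, stages in table.items():
            print(f"{label:<7} {entry:<12} {dominant_stage(stages)}")
```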

Module precompilation (6.043s)

Module whole compilation (1.988s):

| Entry | Total | Link & Optimize | Semantic Checking | DXC Downstream |
|---|---|---|---|---|
| Closest Hit | 1.325s | 0.720s (54.3%) | 0.000s (0.0%) | 0.574s (43.3%) |
| Any Hit | 0.332s | 0.238s (71.7%) | 0.000s (0.0%) | 0.081s (24.5%) |
| Shadow | 0.331s | 0.237s (71.4%) | 0.000s (0.0%) | 0.082s (24.8%) |

Monolithic compilation (2.814s):

| Entry | Total | Link & Optimize | Semantic Checking | DXC Downstream |
|---|---|---|---|---|
| Closest Hit | 1.623s | 0.745s (45.9%) | 0.255s (15.7%) | 0.584s (36.0%) |
| Any Hit | 0.595s | 0.239s (40.1%) | 0.255s (42.8%) | 0.082s (13.8%) |
| Shadow | 0.596s | 0.240s (40.3%) | 0.255s (42.8%) | 0.081s (13.7%) |

DXC for original HLSL (2.117s):

| Entry | Total |
|---|---|
| Closest Hit | 0.708s |
| Any Hit | 0.707s |
| Shadow | 0.702s |

Speed-up factors:

| Entry | vs. Monolithic | vs. DXC |
|---|---|---|
| Total | 1.415x | 1.065x |
| Closest Hit | 1.225x | 0.535x |
| Any Hit | 1.791x | 2.128x |
| Shadow | 1.798x | 2.120x |


Module precompilation (6.451s)

Module whole compilation (2.208s):

| Entry | Total | Link & Optimize | Semantic Checking | Emit Spirv | Spirv-Opt |
|---|---|---|---|---|---|
| Closest Hit | 1.487s | 0.863s (58.0%) | 0.000s (0.0%) | 0.192s (12.9%) | 0.424s (28.5%) |
| Any Hit | 0.360s | 0.249s (69.2%) | 0.000s (0.0%) | 0.038s (10.4%) | 0.070s (19.3%) |
| Shadow | 0.361s | 0.250s (69.4%) | 0.000s (0.0%) | 0.038s (10.4%) | 0.069s (19.1%) |

Monolithic compilation (2.998s):

| Entry | Total | Link & Optimize | Semantic Checking | Emit Spirv | Spirv-Opt |
|---|---|---|---|---|---|
| Closest Hit | 1.743s | 0.863s (49.5%) | 0.247s (14.2%) | 0.194s (11.1%) | 0.424s (24.3%) |
| Any Hit | 0.626s | 0.258s (41.2%) | 0.248s (39.5%) | 0.039s (6.2%) | 0.071s (11.3%) |
| Shadow | 0.628s | 0.258s (41.0%) | 0.250s (39.8%) | 0.039s (6.2%) | 0.071s (11.3%) |

DXC for original HLSL (7.804s):

| Entry | Total |
|---|---|
| Closest Hit | 2.611s |
| Any Hit | 2.610s |
| Shadow | 2.583s |

Speed-up factors:

| Entry | vs. Monolithic | vs. DXC |
|---|---|---|
| Total | 1.358x | 3.534x |
| Closest Hit | 1.172x | 1.756x |
| Any Hit | 1.738x | 7.241x |
| Shadow | 1.742x | 7.163x |


Module precompilation (6.363s)

Module whole compilation (2.205s):

| Entry | Total | Link & Optimize | Semantic Checking | Glslang |
|---|---|---|---|---|
| Closest Hit | 1.438s | 0.800s (55.6%) | 0.000s (0.0%) | 0.610s (42.4%) |
| Any Hit | 0.385s | 0.254s (66.0%) | 0.000s (0.0%) | 0.121s (31.4%) |
| Shadow | 0.382s | 0.253s (66.1%) | 0.000s (0.0%) | 0.119s (31.3%) |

Monolithic compilation (2.929s):

| Entry | Total | Link & Optimize | Semantic Checking | Glslang |
|---|---|---|---|---|
| Closest Hit | 1.664s | 0.786s (47.3%) | 0.245s (14.7%) | 0.598s (35.9%) |
| Any Hit | 0.628s | 0.251s (39.9%) | 0.243s (38.7%) | 0.117s (18.7%) |
| Shadow | 0.637s | 0.253s (39.8%) | 0.248s (38.9%) | 0.119s (18.7%) |

DXC for original HLSL (7.786s):

| Entry | Total |
|---|---|
| Closest Hit | 2.584s |
| Any Hit | 2.582s |
| Shadow | 2.620s |

Speed-up factors:

| Entry | vs. Monolithic | vs. DXC |
|---|---|---|
| Total | 1.328x | 3.530x |
| Closest Hit | 1.157x | 1.797x |
| Any Hit | 1.630x | 6.704x |
| Shadow | 1.668x | 6.858x |

Closing for now, until further benchmarks are requested.
The time savings need to be measured between monolithic compilation and slang-module-based compilation.
The idea is for at least one slang-module to be a reusable library (for example, the MDL team’s experiments defined a “libbsdf” DXIL library), and for the other slang-module to represent the material permutation, which is recompiled every time. Reusing the library module and only recompiling and linking the material-permutation module should be faster. Later in the project, the difference will be measured again when using precompiled embedded DXIL/SPIRV.
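One way to frame the savings is a break-even count: after how many compilations does a one-time precompilation cost pay for itself? The sketch below assumes the measured "module precompilation" is paid once and reused, and that each later compilation costs the "module whole compilation" time. It ignores the cost of recompiling the permutation module, which these benchmarks do not isolate. Under those assumptions, the small-shader SPIRV numbers (1.553s, 0.220s, 0.608s) break even at the 5th compilation, and the large-shader DXIL numbers (5.971s, 2.027s, 2.723s) at the 9th.

```python
import math


def break_even_runs(precompile_s: float, module_s: float, mono_s: float) -> int:
    """Smallest N with precompile_s + N * module_s < N * mono_s."""
    saving_per_run = mono_s - module_s
    if saving_per_run <= 0:
        raise ValueError("module-based compilation is not faster per run")
    # N must be strictly greater than precompile_s / saving_per_run.
    return math.floor(precompile_s / saving_per_run) + 1


if __name__ == "__main__":
    print(break_even_runs(1.553, 0.220, 0.608))  # small shaders, SPIRV
    print(break_even_runs(5.971, 2.027, 2.723))  # large shaders, DXIL
```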