shader-slang / slang

Making it easier to work with shaders
MIT License

[MDL:3/5] Benchmark compilation times #4661

Closed jkwak-work closed 1 month ago

jkwak-work commented 2 months ago

The time saved needs to be measured between monolithic compilation and slang-module based compilation.

The idea is for there to be at least one slang-module that’s a reusable library, for example, MDL team’s experiments defined a “libbsdf” DXIL library, and for the other slang-module to represent the material permutation, which is recompiled every time. Re-using the library module and only recompiling and linking the base module should be faster. Later in the project, the difference will again be measured when utilizing precompiled embedded DXIL/SPIRV.

jkwak-work commented 2 months ago

This is a subtask from #4582.

venkataram-nv commented 2 months ago

These are the results for SPIRV compilation (averaged over 100 compilations per step):

slangc: ..\..\build\Release\bin\slangc.exe
target: spirv
samples: 100
¡ compiled buffers.slang.
¡ compiled common.slang.
¡ compiled enumerations.slang.
¡ compiled environment.slang.
¡ compiled hit.slang.
¡ compiled material.slang.
¡ compiled mdl.slang.
¡ compiled runtime.slang.
¡ compiled shading.slang.
¡ compiled types.slang.
¡ compiled closeshit (module).
¡ compiled anyhit (module).
¡ compiled shadow (module).
¡ compiled closesthit (mono).
¡ compiled anyhit (mono).
¡ compiled shadow (mono).
results:
    module precompilation    1.553s
    module whole compilation 0.220s
        closeshit 0.122s
        anyhit    0.049s
        shadow    0.049s
    monolithic compilation   0.608s
        closeshit 0.250s
        anyhit    0.179s
        shadow    0.179s
    speed up factor          2.77x
        closeshit 2.06x
        anyhit    3.67x
        shadow    3.62x
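
For readers without access to the script, the per-step averaging can be sketched roughly as follows. This is a minimal illustration, not the actual benchmark script: `average_seconds` is a hypothetical helper, and the stand-in workload below replaces the real compiler invocation.

```python
import time

def average_seconds(action, samples):
    """Run `action` `samples` times and return the mean wall-clock time in seconds."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    total = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        action()
        total += time.perf_counter() - start
    return total / samples

# Stand-in workload (assumption): a real run would invoke the compiler here.
calls = []
avg = average_seconds(lambda: calls.append(sum(range(10_000))), samples=16)
print(f"average over {len(calls)} samples: {avg:.6f}s")
```

In a real measurement the callable would wrap one compilation step (for example a subprocess call to the slangc binary listed above), so that each reported figure is the mean over the configured number of samples.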
venkataram-nv commented 2 months ago

Also please see this script to see how this benchmark is done.

jkwak-work commented 2 months ago

Judging by the script, the timing number presented is based on spCompile. But what does it include? And what doesn't it include? This seems to be the source of the confusion.

We also want to know how much DXC takes to do the same as the monolithic case.

venkataram-nv commented 2 months ago

> But what does it include? And what doesn't it include?

I will check this out in more detail shortly.

> We also want to know how much DXC takes to do the same as the monolithic case.

Here are the benchmarks which include DXC compilation to SPIRV.

slangc: ..\..\build\Release\bin\slangc.exe
target: spirv
samples: 16

[...]

results:

    module precompilation    1.556s

    module whole compilation 0.219s
        closeshit 0.119s
        anyhit    0.050s
        shadow    0.050s

    monolithic compilation   0.611s
        closeshit 0.252s
        anyhit    0.180s
        shadow    0.179s

    speed up (vs mono)       2.80x
        closeshit 2.12x
        anyhit    3.62x
        shadow    3.58x

    dxc compilation of hlsl  0.385s
        closeshit 0.129s
        anyhit    0.128s
        shadow    0.128s

    speed up (vs dxc)        1.76x
        closeshit 1.09x
        anyhit    2.57x
        shadow    2.56x

One caveat is that for DXC this is using the original MDL generated HLSL files, which contain a little more source code because of the macros included (some of which cannot be resolved, so they have been removed). I will include a benchmark using slang generated HLSL files as well.
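
As a sanity check, the speed-up factors appear to be the baseline time divided by the module-based time. The sketch below recomputes them from the rounded SPIRV figures above; the `speedup` helper and dictionary keys are illustrative names, and small differences from the reported factors are expected because the log was presumably computed from unrounded timings.

```python
def speedup(baseline_seconds, module_seconds):
    """Speed-up of module-based compilation relative to a baseline."""
    return baseline_seconds / module_seconds

# Rounded timings copied from the SPIRV results above (seconds).
module = {"total": 0.219, "closesthit": 0.119, "anyhit": 0.050, "shadow": 0.050}
mono = {"total": 0.611, "closesthit": 0.252, "anyhit": 0.180, "shadow": 0.179}
dxc = {"total": 0.385, "closesthit": 0.129, "anyhit": 0.128, "shadow": 0.128}

vs_mono = {name: speedup(mono[name], module[name]) for name in module}
vs_dxc = {name: speedup(dxc[name], module[name]) for name in module}

for name in module:
    print(f"{name:<11} vs mono {vs_mono[name]:.2f}x   vs dxc {vs_dxc[name]:.2f}x")
```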

venkataram-nv commented 2 months ago

Same benchmarks, but with the target now being DXIL:

slangc:  ..\..\build\Release\bin\slangc.exe
target:  dxil
samples: 64

[...]

results:

    module precompilation    1.772s

    module whole compilation 0.231s
        closeshit 0.127s
        anyhit    0.051s
        shadow    0.053s

    monolithic compilation   0.638s
        closeshit 0.263s
        anyhit    0.186s
        shadow    0.189s

    speed up (vs mono)       2.76x
        closeshit 2.07x
        anyhit    3.62x
        shadow    3.56x

    dxc for original hlsl    0.217s
        closeshit 0.073s
        anyhit    0.072s
        shadow    0.072s

    speed up (vs dxc)        0.94x
        closeshit 0.57x
        anyhit    1.41x
        shadow    1.36x
venkataram-nv commented 2 months ago

The following is a more comprehensive set of results covering all the targets. The largest differentiating factor between module-based and monolithic compilation seems to be that module linking does not go through semantic checking.

Additional notes:

DXIL Compilation

Module precompilation (1.592s)

Module whole compilation (0.232s)

| Entry | Total | Link & Optimize | Semantic Checking | DXC Downstream |
| --- | --- | --- | --- | --- |
| Closest Hit | 0.128s | 0.069s (54.5%) | 0.000s (0.0%) | 0.049s (38.2%) |
| Any Hit | 0.052s | 0.033s (64.7%) | 0.000s (0.0%) | 0.013s (25.9%) |
| Shadow | 0.053s | 0.034s (65.0%) | 0.000s (0.0%) | 0.013s (25.5%) |

Monolithic compilation (0.639s)

| Entry | Total | Link & Optimize | Semantic Checking | DXC Downstream |
| --- | --- | --- | --- | --- |
| Closest Hit | 0.262s | 0.071s (27.1%) | 0.126s (48.2%) | 0.048s (18.5%) |
| Any Hit | 0.189s | 0.034s (18.0%) | 0.130s (68.7%) | 0.014s (7.2%) |
| Shadow | 0.188s | 0.034s (18.1%) | 0.129s (68.5%) | 0.014s (7.3%) |

DXC for original HLSL (0.233s)

| Entry | Total |
| --- | --- |
| Closest Hit | 0.078s |
| Any Hit | 0.077s |
| Shadow | 0.077s |

Speed up factors

| Entry | vs. Monolithic | vs. DXC |
| --- | --- | --- |
| Total | 2.755x | 1.004x |
| Closest Hit | 2.051x | 0.612x |
| Any Hit | 3.660x | 1.499x |
| Shadow | 3.575x | 1.470x |

SPIRV Compilation (Direct)

Module precompilation (1.575s)

Module whole compilation (0.223s)

| Entry | Total | Link & Optimize | Semantic Checking | Emit Spirv | Spirv-Opt |
| --- | --- | --- | --- | --- | --- |
| Closest Hit | 0.123s | 0.087s (70.3%) | 0.000s (0.0%) | 0.009s (7.4%) | 0.023s (18.4%) |
| Any Hit | 0.050s | 0.039s (78.8%) | 0.000s (0.1%) | 0.003s (5.6%) | 0.005s (10.8%) |
| Shadow | 0.050s | 0.040s (79.4%) | 0.000s (0.0%) | 0.003s (5.5%) | 0.005s (10.5%) |

Monolithic compilation (0.617s)

| Entry | Total | Link & Optimize | Semantic Checking | Emit Spirv | Spirv-Opt |
| --- | --- | --- | --- | --- | --- |
| Closest Hit | 0.254s | 0.083s (32.9%) | 0.130s (51.3%) | 0.008s (3.2%) | 0.022s (8.8%) |
| Any Hit | 0.184s | 0.038s (20.7%) | 0.129s (70.2%) | 0.002s (1.3%) | 0.005s (2.9%) |
| Shadow | 0.179s | 0.037s (20.8%) | 0.126s (70.3%) | 0.002s (1.2%) | 0.005s (2.9%) |

DXC for original HLSL (0.401s)

| Entry | Total |
| --- | --- |
| Closest Hit | 0.132s |
| Any Hit | 0.135s |
| Shadow | 0.134s |

Speed up factors

| Entry | vs. Monolithic | vs. DXC |
| --- | --- | --- |
| Total | 2.768x | 1.800x |
| Closest Hit | 2.061x | 1.073x |
| Any Hit | 3.688x | 2.706x |
| Shadow | 3.598x | 2.696x |

SPIRV Compilation (via GLSL)

Module precompilation (1.861s)

Module whole compilation (0.364s)

| Entry | Total | Link & Optimize | Semantic Checking | Glslang |
| --- | --- | --- | --- | --- |
| Closest Hit | 0.176s | 0.083s (47.1%) | 0.000s (0.0%) | 0.082s (46.9%) |
| Any Hit | 0.095s | 0.038s (39.7%) | 0.000s (0.0%) | 0.053s (56.4%) |
| Shadow | 0.093s | 0.037s (39.9%) | 0.000s (0.0%) | 0.053s (56.5%) |

Monolithic compilation (0.771s)

| Entry | Total | Link & Optimize | Semantic Checking | Glslang |
| --- | --- | --- | --- | --- |
| Closest Hit | 0.309s | 0.082s (26.7%) | 0.128s (41.5%) | 0.083s (26.9%) |
| Any Hit | 0.228s | 0.036s (16.0%) | 0.128s (56.2%) | 0.053s (23.5%) |
| Shadow | 0.235s | 0.038s (16.1%) | 0.131s (56.0%) | 0.055s (23.6%) |

DXC for original HLSL (0.409s)

| Entry | Total |
| --- | --- |
| Closest Hit | 0.140s |
| Any Hit | 0.134s |
| Shadow | 0.135s |

Speed up factors

| Entry | vs. Monolithic | vs. DXC |
| --- | --- | --- |
| Total | 2.120x | 1.126x |
| Closest Hit | 1.753x | 0.798x |
| Any Hit | 2.409x | 1.413x |
| Shadow | 2.518x | 1.453x |
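
To make the per-stage percentages concrete, here is a small sketch that recomputes each stage's share of the total for the DXIL Closest Hit entry, using the rounded figures reported above. The `stage_shares` helper and the stage keys are illustrative names, and the results only approximately match the reported percentages because the inputs are rounded to milliseconds.

```python
def stage_shares(total_seconds, stages):
    """Return each stage's share of the total as a percentage, plus the unattributed remainder."""
    shares = {name: 100.0 * seconds / total_seconds for name, seconds in stages.items()}
    shares["other"] = 100.0 - sum(shares.values())
    return shares

# DXIL, Closest Hit entry (seconds, rounded values from the results above).
module_shares = stage_shares(
    0.128, {"link_and_optimize": 0.069, "semantic_checking": 0.000, "dxc_downstream": 0.049}
)
mono_shares = stage_shares(
    0.262, {"link_and_optimize": 0.071, "semantic_checking": 0.126, "dxc_downstream": 0.048}
)

for label, shares in (("module", module_shares), ("monolithic", mono_shares)):
    print(label, {name: f"{pct:.1f}%" for name, pct in shares.items()})
```

Link & Optimize and the downstream compiler take nearly the same absolute time in both modes, so almost all of the difference comes from the semantic checking stage, which is consistent with the observation that module linking skips it.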
venkataram-nv commented 1 month ago

New benchmarks here, now using much larger MDL shaders (specifically OmniSurface_BrushedMetal). Compile times cross the one second mark. At this point the Link & Optimize stage takes the majority of compile time.

DXIL

Module precompilation (5.971s)

Module whole compilation (2.027s)

| Entry | Total | Link & Optimize | Semantic Checking | DXC Downstream |
| --- | --- | --- | --- | --- |
| Closest Hit | 1.368s | 0.748s (54.7%) | 0.000s (0.0%) | 0.588s (43.0%) |
| Any Hit | 0.335s | 0.241s (71.8%) | 0.000s (0.0%) | 0.082s (24.5%) |
| Shadow | 0.324s | 0.232s (71.5%) | 0.000s (0.0%) | 0.080s (24.6%) |

Monolithic compilation (2.723s)

| Entry | Total | Link & Optimize | Semantic Checking | DXC Downstream |
| --- | --- | --- | --- | --- |
| Closest Hit | 1.535s | 0.683s (44.5%) | 0.247s (16.1%) | 0.566s (36.9%) |
| Any Hit | 0.600s | 0.242s (40.4%) | 0.256s (42.6%) | 0.082s (13.6%) |
| Shadow | 0.588s | 0.238s (40.4%) | 0.251s (42.6%) | 0.080s (13.6%) |

DXC for original HLSL (1.898s)

| Entry | Total |
| --- | --- |
| Closest Hit | 0.632s |
| Any Hit | 0.638s |
| Shadow | 0.627s |

Speed up factors

| Entry | vs. Monolithic | vs. DXC |
| --- | --- | --- |
| Total | 1.343x | 0.936x |
| Closest Hit | 1.122x | 0.462x |
| Any Hit | 1.788x | 1.903x |
| Shadow | 1.814x | 1.934x |

DXIL (with validation)

Module precompilation (6.043s)

Module whole compilation (1.988s)

| Entry | Total | Link & Optimize | Semantic Checking | DXC Downstream |
| --- | --- | --- | --- | --- |
| Closest Hit | 1.325s | 0.720s (54.3%) | 0.000s (0.0%) | 0.574s (43.3%) |
| Any Hit | 0.332s | 0.238s (71.7%) | 0.000s (0.0%) | 0.081s (24.5%) |
| Shadow | 0.331s | 0.237s (71.4%) | 0.000s (0.0%) | 0.082s (24.8%) |

Monolithic compilation (2.814s)

| Entry | Total | Link & Optimize | Semantic Checking | DXC Downstream |
| --- | --- | --- | --- | --- |
| Closest Hit | 1.623s | 0.745s (45.9%) | 0.255s (15.7%) | 0.584s (36.0%) |
| Any Hit | 0.595s | 0.239s (40.1%) | 0.255s (42.8%) | 0.082s (13.8%) |
| Shadow | 0.596s | 0.240s (40.3%) | 0.255s (42.8%) | 0.081s (13.7%) |

DXC for original HLSL (2.117s)

| Entry | Total |
| --- | --- |
| Closest Hit | 0.708s |
| Any Hit | 0.707s |
| Shadow | 0.702s |

Speed up factors

| Entry | vs. Monolithic | vs. DXC |
| --- | --- | --- |
| Total | 1.415x | 1.065x |
| Closest Hit | 1.225x | 0.535x |
| Any Hit | 1.791x | 2.128x |
| Shadow | 1.798x | 2.120x |

SPIRV (Directly)

Module precompilation (6.451s)

Module whole compilation (2.208s)

| Entry | Total | Link & Optimize | Semantic Checking | Emit Spirv | Spirv-Opt |
| --- | --- | --- | --- | --- | --- |
| Closest Hit | 1.487s | 0.863s (58.0%) | 0.000s (0.0%) | 0.192s (12.9%) | 0.424s (28.5%) |
| Any Hit | 0.360s | 0.249s (69.2%) | 0.000s (0.0%) | 0.038s (10.4%) | 0.070s (19.3%) |
| Shadow | 0.361s | 0.250s (69.4%) | 0.000s (0.0%) | 0.038s (10.4%) | 0.069s (19.1%) |

Monolithic compilation (2.998s)

| Entry | Total | Link & Optimize | Semantic Checking | Emit Spirv | Spirv-Opt |
| --- | --- | --- | --- | --- | --- |
| Closest Hit | 1.743s | 0.863s (49.5%) | 0.247s (14.2%) | 0.194s (11.1%) | 0.424s (24.3%) |
| Any Hit | 0.626s | 0.258s (41.2%) | 0.248s (39.5%) | 0.039s (6.2%) | 0.071s (11.3%) |
| Shadow | 0.628s | 0.258s (41.0%) | 0.250s (39.8%) | 0.039s (6.2%) | 0.071s (11.3%) |

DXC for original HLSL (7.804s)

| Entry | Total |
| --- | --- |
| Closest Hit | 2.611s |
| Any Hit | 2.610s |
| Shadow | 2.583s |

Speed up factors

| Entry | vs. Monolithic | vs. DXC |
| --- | --- | --- |
| Total | 1.358x | 3.534x |
| Closest Hit | 1.172x | 1.756x |
| Any Hit | 1.738x | 7.241x |
| Shadow | 1.742x | 7.163x |

SPIRV (via GLSL)

Module precompilation (6.363s)

Module whole compilation (2.205s)

| Entry | Total | Link & Optimize | Semantic Checking | Glslang |
| --- | --- | --- | --- | --- |
| Closest Hit | 1.438s | 0.800s (55.6%) | 0.000s (0.0%) | 0.610s (42.4%) |
| Any Hit | 0.385s | 0.254s (66.0%) | 0.000s (0.0%) | 0.121s (31.4%) |
| Shadow | 0.382s | 0.253s (66.1%) | 0.000s (0.0%) | 0.119s (31.3%) |

Monolithic compilation (2.929s)

| Entry | Total | Link & Optimize | Semantic Checking | Glslang |
| --- | --- | --- | --- | --- |
| Closest Hit | 1.664s | 0.786s (47.3%) | 0.245s (14.7%) | 0.598s (35.9%) |
| Any Hit | 0.628s | 0.251s (39.9%) | 0.243s (38.7%) | 0.117s (18.7%) |
| Shadow | 0.637s | 0.253s (39.8%) | 0.248s (38.9%) | 0.119s (18.7%) |

DXC for original HLSL (7.786s)

| Entry | Total |
| --- | --- |
| Closest Hit | 2.584s |
| Any Hit | 2.582s |
| Shadow | 2.620s |

Speed up factors

| Entry | vs. Monolithic | vs. DXC |
| --- | --- | --- |
| Total | 1.328x | 3.530x |
| Closest Hit | 1.157x | 1.797x |
| Any Hit | 1.630x | 6.704x |
| Shadow | 1.668x | 6.858x |
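
One way to read these totals is to ask how many recompilations it takes for the one-time module precompilation to pay for itself. The sketch below is a rough estimate under a strong assumption that is not stated in the results: the precompilation cost is paid once and every later recompilation costs only the "module whole compilation" total. In practice the material module changes with each permutation, so part of the precompilation would be repeated. `break_even_compiles` is a hypothetical helper.

```python
import math

def break_even_compiles(precompile_s, module_s, mono_s):
    """Number of recompilations after which precompiling modules is cheaper overall.

    Returns None when module-based compilation saves no time per recompilation.
    """
    saving_per_compile = mono_s - module_s
    if saving_per_compile <= 0:
        return None
    return math.ceil(precompile_s / saving_per_compile)

# Totals (seconds) copied from the large-shader results above.
dxil = break_even_compiles(precompile_s=5.971, module_s=2.027, mono_s=2.723)
spirv_direct = break_even_compiles(precompile_s=6.451, module_s=2.208, mono_s=2.998)

print("DXIL break-even recompilations:", dxil)
print("SPIRV (direct) break-even recompilations:", spirv_direct)
```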
venkataram-nv commented 1 month ago

Closing for now; this can be reopened if further benchmarks are requested.