shader-slang / slang

MDL module compilation with Slang #4582

Open jkwak-work opened 1 month ago

jkwak-work commented 1 month ago

A design document for this task was written by @cheneym2.

This issue tracks the progress of the task. In short, we want to compile MDL with Slang and see how much compile time can be saved with Slang's "module feature".

The goal is to measure the compile-time difference. We would also like to see the runtime difference, but improving runtime performance is out of scope for this task, even if the results show a slower runtime.

The proposed approach consists of five subtasks.

  1. Convert complex shaders from HLSL to Slang
  2. Utilize existing slang-module capabilities in compilation
  3. Benchmark compilation times
  4. Write a sample renderer test case that utilizes the above complex shader modules in a simple rendering application, which verifies correctness and runtime rendering performance
  5. Add support for the embedded precompiled slang-module feature, measuring both compilation and runtime performance differences to the base slang-module case.

For step 1, we need to figure out how to generate HLSL from the MDL system and assess the complexity of the shaders. If the complexity is not as high as we expected, we may need to reevaluate the approach.

During step 1, we may find a few bugs that Slang needs to iron out. If the HLSL generated from MDL doesn't work directly with Slang, we will need to create new issues and address them properly rather than adding hacks and workarounds.

For step 4, we may be able to reuse an internal unit testing framework that the MDL team might be using.

During step 4, we will also need to evaluate the possibility of adopting the MDL testing framework in our CI, as we do for Falcor.

jkwak-work commented 1 month ago

I am setting the estimate to 6 weeks (30 days). Each subtask will get roughly 1 week. @venkataram-nv should feel free to adjust as needed.

If this is not good for task tracking, we can create a new issue for each subtask. But GitHub doesn't have a way to treat issues as subtasks, and that may add confusion.

jkwak-work commented 1 month ago

Venki found that there is an open-source version of the MDL SDK on GitHub.

Here is what I found after building and running the examples.

I noticed that there are two examples that demonstrate "modules". I am not sure what kind of module they refer to, but they are worth checking out.

When I built the "RUN_TESTS" project from MDL.sln, it ran the unit tests. It ran 56 unit tests, which took about 1 minute 23 seconds on my machine.

The install instructions were not easy to follow, so I would like to leave a few notes about them.

  1. I had to use an exact commit ID for vcpkg repo, "fe1e9f5".
  2. I had to add a line "set(VCPKG_PLATFORM_TOOLSET v143)" to a file, "triplets/x64-windows-static.cmake", in vcpkg repo.
  3. I set an environment variable, set VCPKG_ROOT="C:\path\to\vcpkg", and I also added the same path to the Path variable.
  4. I installed boost and openimageio with the following commands,
    vcpkg install --triplet=x64-windows-static boost-any
    vcpkg install --triplet=x64-windows-static boost-uuid
    vcpkg install --triplet=x64-windows-static openimageio[gif,openjpeg,tools,webp]
  5. I installed the latest Python, 3.12.4, from python.org
  6. I installed LLVM 12.0.1 from llvm.org
  7. I downloaded DXC from its release page

When I got to the point where I can run cmake-gui, I also had to deal with errors.

  1. When I clicked on "Configure", I got an error about CMAKE_TOOLCHAIN_FILE. I had to set it to be "C:\path\to\vcpkg\scripts/buildsystems/vcpkg.cmake" inside of the cmake-gui UI.
  2. When I clicked on "Configure" again, it gave me errors again. I had to uncheck the following check boxes,
    MDL_BUILD_DOCUMENTATION
    MDL_ENABLE_CUDA_EXAMPLES
    MDL_ENABLE_OPENGL_EXAMPLES
    MDL_ENABLE_VULKAN_EXAMPLES
    MDL_ENABLE_QT_EXAMPLES
  3. I had to set DXC_DIR to the directory where the downloaded DXC files were unzipped.
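
For anyone who prefers the command line to cmake-gui, the settings above can be collected into a single configure invocation. This is only a minimal sketch, not the SDK's documented procedure: the paths are placeholders, the source and build directories are hypothetical, and it assumes nothing beyond the standard `cmake -S`, `-B`, and `-D` options.

```python
from pathlib import PureWindowsPath

# Placeholder locations (assumptions); substitute the real paths on your machine.
VCPKG_ROOT = PureWindowsPath(r"C:\path\to\vcpkg")
DXC_DIR = PureWindowsPath(r"C:\path\to\dxc")

# Options that had to be unchecked in cmake-gui, as listed in the notes above.
DISABLED_OPTIONS = [
    "MDL_BUILD_DOCUMENTATION",
    "MDL_ENABLE_CUDA_EXAMPLES",
    "MDL_ENABLE_OPENGL_EXAMPLES",
    "MDL_ENABLE_VULKAN_EXAMPLES",
    "MDL_ENABLE_QT_EXAMPLES",
]


def cmake_configure_args(source_dir: str, build_dir: str) -> list[str]:
    """Build the argument list for a command-line CMake configure step."""
    toolchain = VCPKG_ROOT / "scripts" / "buildsystems" / "vcpkg.cmake"
    args = [
        "cmake",
        "-S", source_dir,
        "-B", build_dir,
        # Forward slashes avoid backslash-escaping issues inside CMake.
        f"-DCMAKE_TOOLCHAIN_FILE={toolchain.as_posix()}",
        f"-DDXC_DIR={DXC_DIR.as_posix()}",
    ]
    args += [f"-D{name}=OFF" for name in DISABLED_OPTIONS]
    return args


if __name__ == "__main__":
    # Hypothetical source and build directories.
    print(" ".join(cmake_configure_args(".", "build/vs2022")))
```

Keeping the disabled options in one list makes it easy to re-enable an example set later without hunting through the GUI.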

jkwak-work commented 1 month ago

I am uploading a screenshot from "examples/mdl_sdk/dxr/Release/dxr.exe" as a quick reference. (Screenshot: mdl_dxr_01)

jkwak-work commented 1 month ago

Here is a unit test result as another quick reference. I am not sure why one of the tests is failing.

16:59:35:652    1>------ Build started: Project: RUN_TESTS, Configuration: Release x64 ------
16:59:36:337    1>1>
16:59:36:734    1>Test project D:/sbf/git/mdl/MDL-SDK/build/vs2022
16:59:36:908    1>      Start  1: prod-test-base-test
16:59:36:979    1> 1/56 Test  #1: prod-test-base-test ............................   Passed    0.03 sec
16:59:36:979    1>      Start  2: prod-test-math-test
16:59:37:111    1> 2/56 Test  #2: prod-test-math-test ............................   Passed    0.17 sec
16:59:37:111    1>      Start  3: base-data-attr-test
16:59:37:175    1> 3/56 Test  #3: base-data-attr-test ............................   Passed    0.05 sec
16:59:37:175    1>      Start  4: base-data-dblight-test
16:59:37:490    1> 4/56 Test  #4: base-data-dblight-test .........................   Passed    0.32 sec
16:59:37:552    1>      Start  5: base-data-serial-test
16:59:37:552    1> 5/56 Test  #5: base-data-serial-test ..........................   Passed    0.05 sec
16:59:37:552    1>      Start  6: base-data-thread_pool-test
16:59:49:312    1> 6/56 Test  #6: base-data-thread_pool-test .....................   Passed   11.77 sec
16:59:49:312    1>      Start  7: base-hal-disk-test
16:59:49:411    1> 7/56 Test  #7: base-hal-disk-test .............................   Passed    0.10 sec
16:59:49:470    1>      Start  8: base-hal-hal-test
16:59:49:470    1> 8/56 Test  #8: base-hal-hal-test ..............................   Passed    0.04 sec
16:59:49:470    1>      Start  9: base-hal-thread-test
16:59:49:530    1> 9/56 Test  #9: base-hal-thread-test ...........................   Passed    0.02 sec
16:59:49:530    1>      Start 10: base-hal-time-test
16:59:53:316    1>10/56 Test #10: base-hal-time-test .............................   Passed    3.84 sec
16:59:53:385    1>      Start 11: base-lib-config-test
16:59:53:385    1>11/56 Test #11: base-lib-config-test ...........................   Passed    0.05 sec
16:59:53:386    1>      Start 12: base-lib-cont-test
16:59:53:445    1>12/56 Test #12: base-lib-cont-test .............................   Passed    0.03 sec
16:59:53:445    1>      Start 13: base-lib-mem-test
16:59:53:445    1>13/56 Test #13: base-lib-mem-test ..............................   Passed    0.03 sec
16:59:53:445    1>      Start 14: base-lib-path-test
16:59:53:506    1>14/56 Test #14: base-lib-path-test .............................   Passed    0.03 sec
16:59:53:506    1>      Start 15: base-util-string_utils-test
16:59:53:506    1>15/56 Test #15: base-util-string_utils-test ....................   Passed    0.03 sec
16:59:53:506    1>      Start 16: base-system-main-test
16:59:53:566    1>16/56 Test #16: base-system-main-test ..........................   Passed    0.04 sec
16:59:53:566    1>      Start 17: base-system-main-test_impl1_only
16:59:53:627    1>17/56 Test #17: base-system-main-test_impl1_only ...............   Passed    0.03 sec
16:59:53:627    1>      Start 18: base-system-stlext-test
16:59:53:627    1>18/56 Test #18: base-system-stlext-test ........................   Passed    0.04 sec
16:59:53:627    1>      Start 19: base-system-test-test_self
16:59:53:686    1>19/56 Test #19: base-system-test-test_self .....................   Passed    0.04 sec
16:59:53:686    1>      Start 20: io-image-image-test_access_canvas
16:59:54:362    1>20/56 Test #20: io-image-image-test_access_canvas ..............   Passed    0.70 sec
16:59:54:425    1>      Start 21: io-image-image-test_access_mipmap
16:59:54:813    1>21/56 Test #21: io-image-image-test_access_mipmap ..............   Passed    0.45 sec
16:59:54:878    1>      Start 22: io-image-image-test_dds
16:59:55:684    1>22/56 Test #22: io-image-image-test_dds ........................   Passed    0.87 sec
16:59:55:738    1>      Start 23: io-image-image-test_huge_tiles
17:00:00:406    1>23/56 Test #23: io-image-image-test_huge_tiles .................   Passed    4.72 sec
17:00:00:470    1>      Start 24: io-image-image-test_import_export
17:00:34:163    1>24/56 Test #24: io-image-image-test_import_export ..............   Passed   33.75 sec
17:00:34:227    1>      Start 25: io-image-image-test_mipmap
17:00:34:979    1>25/56 Test #25: io-image-image-test_mipmap .....................   Passed    0.81 sec
17:00:34:979    1>      Start 26: io-image-image-test_module
17:00:35:397    1>26/56 Test #26: io-image-image-test_module .....................Exit code 0xc0000409
17:00:35:397    1>***Exception:   0.41 sec
17:00:35:397    1>      Start 27: io-image-image-test_pixel_conversion
17:00:35:973    1>27/56 Test #27: io-image-image-test_pixel_conversion ...........   Passed    0.58 sec
17:00:36:035    1>      Start 28: io-image-image-test_pixel_conversion_sse
17:00:36:765    1>28/56 Test #28: io-image-image-test_pixel_conversion_sse .......   Passed    0.79 sec
17:00:36:835    1>      Start 29: io-image-image-test_quantization
17:00:36:835    1>29/56 Test #29: io-image-image-test_quantization ...............   Passed    0.04 sec
17:00:36:835    1>      Start 30: io-scene-bsdf_measurements-test
17:00:37:120    1>30/56 Test #30: io-scene-bsdf_measurements-test ................   Passed    0.31 sec
17:00:37:120    1>      Start 31: io-scene-mdl_elements-test_types
17:00:37:407    1>31/56 Test #31: io-scene-mdl_elements-test_types ...............   Passed    0.28 sec
17:00:37:407    1>      Start 32: io-scene-mdl_elements-test_values
17:00:37:701    1>32/56 Test #32: io-scene-mdl_elements-test_values ..............   Passed    0.29 sec
17:00:37:754    1>      Start 33: io-scene-mdl_elements-test_expressions
17:00:37:996    1>33/56 Test #33: io-scene-mdl_elements-test_expressions .........   Passed    0.29 sec
17:00:38:055    1>      Start 34: io-scene-mdl_elements-test_misc
17:00:38:551    1>34/56 Test #34: io-scene-mdl_elements-test_misc ................   Passed    0.55 sec
17:00:38:613    1>      Start 35: io-scene-dbimage-test
17:00:39:869    1>35/56 Test #35: io-scene-dbimage-test ..........................   Passed    1.26 sec
17:00:39:869    1>      Start 36: prod-lib-mdl_sdk-test_class_factory
17:00:40:307    1>36/56 Test #36: prod-lib-mdl_sdk-test_class_factory ............   Passed    0.49 sec
17:00:40:307    1>      Start 37: prod-lib-mdl_sdk-test_db_elements
17:00:40:689    1>37/56 Test #37: prod-lib-mdl_sdk-test_db_elements ..............   Passed    0.38 sec
17:00:40:754    1>      Start 38: prod-lib-mdl_sdk-test_idebug_configuration
17:00:41:099    1>38/56 Test #38: prod-lib-mdl_sdk-test_idebug_configuration .....   Passed    0.36 sec
17:00:41:099    1>      Start 39: prod-lib-mdl_sdk-test_ifactory
17:00:41:464    1>39/56 Test #39: prod-lib-mdl_sdk-test_ifactory .................   Passed    0.41 sec
17:00:41:522    1>      Start 40: prod-lib-mdl_sdk-test_i18n
17:00:42:164    1>40/56 Test #40: prod-lib-mdl_sdk-test_i18n .....................   Passed    0.70 sec
17:00:42:218    1>      Start 41: prod-lib-mdl_sdk-test_iimage
17:00:42:581    1>41/56 Test #41: prod-lib-mdl_sdk-test_iimage ...................   Passed    0.41 sec
17:00:42:581    1>      Start 42: prod-lib-mdl_sdk-test_ilogging_configuration
17:00:43:013    1>42/56 Test #42: prod-lib-mdl_sdk-test_ilogging_configuration ...   Passed    0.39 sec
17:00:43:013    1>      Start 43: prod-lib-mdl_sdk-test_imdl_configuration
17:00:43:278    1>43/56 Test #43: prod-lib-mdl_sdk-test_imdl_configuration .......   Passed    0.30 sec
17:00:43:333    1>      Start 44: prod-lib-mdl_sdk-test_imdl_module
17:00:46:139    1>44/56 Test #44: prod-lib-mdl_sdk-test_imdl_module ..............   Passed    2.86 sec
17:00:46:195    1>      Start 45: prod-lib-mdl_sdk-test_ineuray
17:00:46:580    1>45/56 Test #45: prod-lib-mdl_sdk-test_ineuray ..................   Passed    0.38 sec
17:00:46:580    1>      Start 46: prod-lib-mdl_sdk-test_itransaction
17:00:46:911    1>46/56 Test #46: prod-lib-mdl_sdk-test_itransaction .............   Passed    0.39 sec
17:00:46:978    1>      Start 47: prod-lib-mdl_sdk-test_neuray_factory
17:00:47:624    1>47/56 Test #47: prod-lib-mdl_sdk-test_neuray_factory ...........   Passed    0.71 sec
17:00:47:690    1>      Start 48: prod-lib-mdl_sdk-test_set_get
17:00:48:018    1>48/56 Test #48: prod-lib-mdl_sdk-test_set_get ..................   Passed    0.39 sec
17:00:48:080    1>      Start 49: prod-lib-mdl_sdk-test_target_materials
17:00:48:413    1>49/56 Test #49: prod-lib-mdl_sdk-test_target_materials .........   Passed    0.39 sec
17:00:48:472    1>      Start 50: prod-lib-mdl_sdk-test_types
17:00:48:821    1>50/56 Test #50: prod-lib-mdl_sdk-test_types ....................   Passed    0.41 sec
17:00:48:878    1>      Start 51: prod-lib-mdl_sdk-test_types_array
17:00:49:202    1>51/56 Test #51: prod-lib-mdl_sdk-test_types_array ..............   Passed    0.38 sec
17:00:49:268    1>      Start 52: prod-lib-mdl_sdk-test_types_compound
17:00:49:669    1>52/56 Test #52: prod-lib-mdl_sdk-test_types_compound ...........   Passed    0.40 sec
17:00:49:669    1>      Start 53: prod-lib-mdl_sdk-test_types_map
17:00:50:001    1>53/56 Test #53: prod-lib-mdl_sdk-test_types_map ................   Passed    0.40 sec
17:00:50:001    1>      Start 54: prod-lib-mdl_sdk-test_types_at_api_boundary
17:00:50:336    1>54/56 Test #54: prod-lib-mdl_sdk-test_types_at_api_boundary ....   Passed    0.33 sec
17:00:50:336    1>      Start 55: prod-lib-mdl_sdk-test_unique_IIDs
17:00:51:127    1>55/56 Test #55: prod-lib-mdl_sdk-test_unique_IIDs ..............   Passed    0.79 sec
17:00:51:127    1>      Start 56: prod-bin-mdltlc-test_basics
17:00:58:010    1>56/56 Test #56: prod-bin-mdltlc-test_basics ....................   Passed    6.88 sec
17:00:58:065    1>
17:00:58:065    1>98% tests passed, 1 tests failed out of 56
17:00:58:065    1>
17:00:58:066    1>Label Time Summary:
17:00:58:066    1>unit_test    =  80.98 sec*proc (56 tests)
17:00:58:066    1>
17:00:58:066    1>Total Test time (real) =  81.29 sec
17:00:58:066    1>
17:00:58:066    1>The following tests FAILED:
17:00:58:066    1>   26 - io-image-image-test_module (Exit code 0xc0000409)
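
A log like this is easier to scan once it is reduced to (name, status, time) records. The sketch below is one way to do that; the regular expression is an assumption fitted to the lines shown above, not an official CTest format description. As a side note, 0xc0000409 is the Windows NTSTATUS value STATUS_STACK_BUFFER_OVERRUN, which is also reported for fail-fast aborts, so it narrows the failure down to a crash rather than a failed assertion; the root cause is not established in this thread.

```python
import re

# Matches CTest result lines such as
#   " 2/56 Test  #2: prod-test-math-test .....   Passed    0.17 sec"
# and crash lines where the status is replaced by "Exit code 0x...".
RESULT_RE = re.compile(
    r"(\d+)/(\d+) Test\s+#(\d+): (\S+) \.+\s*"
    r"(?:Passed\s+([\d.]+) sec|Exit code (0x[0-9a-fA-F]+))"
)


def parse_ctest(lines):
    """Return a list of (test name, status, seconds or None) tuples."""
    results = []
    for line in lines:
        m = RESULT_RE.search(line)
        if m is None:
            continue  # "Start ..." lines, summaries, and blank lines
        name, seconds, exit_code = m.group(4), m.group(5), m.group(6)
        if seconds is not None:
            results.append((name, "Passed", float(seconds)))
        else:
            results.append((name, f"Exit code {exit_code}", None))
    return results


# A few lines copied from the log above.
SAMPLE = """\
16:59:37:111    1> 2/56 Test  #2: prod-test-math-test ............................   Passed    0.17 sec
17:00:34:163    1>24/56 Test #24: io-image-image-test_import_export ..............   Passed   33.75 sec
17:00:35:397    1>26/56 Test #26: io-image-image-test_module .....................Exit code 0xc0000409
17:00:35:397    1>***Exception:   0.41 sec
""".splitlines()

results = parse_ctest(SAMPLE)
failed = [(name, status) for name, status, _ in results if status != "Passed"]
slowest = max((r for r in results if r[2] is not None), key=lambda r: r[2])

if __name__ == "__main__":
    print("failed:", failed)
    print("slowest:", slowest)
```

On the full log, the same parser would also show that io-image-image-test_import_export (33.75 sec) and base-data-thread_pool-test (11.77 sec) account for more than half of the 81-second run.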

jkwak-work commented 1 month ago

I am attaching the HLSL generated by "examples/mdl_sdk/code_gen/Release/code_gen.exe" as another quick reference: code_gen.hlsl.txt

jkwak-work commented 1 month ago

I found a more formal document about what MDL is. https://github.com/NVIDIA/MDL-SDK/blob/master/doc/specification/MDL_spec_1.8.2_24May2023.pdf

venkataram-nv commented 1 month ago

After taking a closer look at the DXR example, it seems that the code_gen output is far from enough to get a compiled final output. The DXR program generates a lot of code on the fly, which is plugged into the shader source code. Perhaps we can use the DXR program as a base for the rendering test, since we may need to hijack its code generation so that it emits Slang code, giving us a full set of compatible Slang shaders.

venkataram-nv commented 1 month ago

Got a set of shaders to compile with Slang with no modification whatsoever (i.e., compiling them as HLSL rather than Slang). The next step is to use Slang modules (rather than the current "#include" directives) to establish separate sources of shared code.

venkataram-nv commented 1 month ago

I have now managed to slangify the initial HLSL code so that it uses Slang modules (via import declarations). The files can be seen here. There may be a question of whether this is sufficiently complex, since the compile time seems (at least at a glance) to be quite fast already. We could also generate multiple materials (a few files are common to all of them) and benchmark separate compilation.

venkataram-nv commented 1 month ago

After rearranging and separating the source code a little, I have a version of the shaders where I can compile each one into an individual module and then compile them all into a single unit (see the compiled-modules.sh file). I also did some very coarse timings on slangc, and there does seem to be an improvement from using modules over monolithic sources:

compiling altogether:

modules/types.slang-module
modules/shading.slang-module
modules/runtime.slang-module
modules/mdl.slang-module
modules/material.slang-module
modules/hit.slang-module
modules/environment.slang-module
modules/enumerations.slang-module
modules/common.slang-module
modules/buffers.slang-module

real    0m0.202s
user    0m0.172s
sys     0m0.030s

monolithic compilation:

real    0m0.297s
user    0m0.286s
sys     0m0.010s

The compilation times for each individual unit are not shown here.

venkataram-nv commented 1 month ago

Got updated compile-time measurements by adding some modifications to the slangc-related functions. Each time slangc is called, an additional ~0.1 seconds is needed to initialize the global state of the compiler, including loading the stdlib functions, which takes the longest (99%). In practice this would be a one-time start-up cost, so we choose to ignore it here. The main compile function is spCompile in the timings shown at the end.

An additional modification is that the hit.slang shader is compiled into separate modules for each stage.

The timings for each RTX shader stage are shown in the table below:

Stage         Modules   Monolithic   Speed-up
Closest hit   0.140s    0.290s       2.07x
Any hit       0.053s    0.201s       3.79x
Shadow hit    0.052s    0.196s       3.76x
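
The speed-up column is simply the monolithic time divided by the module time. A minimal sketch of that arithmetic, using the spCompile figures from the table (the dictionary layout and function name are illustrative only):

```python
# spCompile times in seconds, copied from the table above:
# stage -> (with precompiled modules, monolithic)
TIMINGS = {
    "Closest hit": (0.140, 0.290),
    "Any hit": (0.053, 0.201),
    "Shadow hit": (0.052, 0.196),
}


def speedup(modules_s: float, monolithic_s: float) -> float:
    """Ratio of monolithic compile time to module-based compile time."""
    return monolithic_s / modules_s


if __name__ == "__main__":
    for stage, (modules_s, monolithic_s) in TIMINGS.items():
        print(f"{stage:12s} {speedup(modules_s, monolithic_s):.2f}x")
```

With the three-decimal timings shown, the shadow-hit ratio rounds to 3.77 rather than 3.76, so the table value was presumably truncated or computed from unrounded measurements.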

The complete output (with all stages and compiles) is as follows:

===========================
file: buffers.slang
===========================
                             create_global_session: 0.119s
                                         spCompile: 0.060s
                                       slangc_main: 0.195s

===========================
file: common.slang
===========================
                             create_global_session: 0.105s
                                         spCompile: 0.032s
                                       slangc_main: 0.152s

===========================
file: enumerations.slang
===========================
                             create_global_session: 0.106s
                                         spCompile: 0.006s
                                       slangc_main: 0.128s

===========================
file: environment.slang
===========================
                             create_global_session: 0.105s
                                         spCompile: 0.045s
                                       slangc_main: 0.165s

===========================
file: hit.slang
===========================

Closest hit shader:
                             create_global_session: 0.120s
                                         spCompile: 0.190s
                                       slangc_main: 0.330s
Any hit shader:
                             create_global_session: 0.113s
                                         spCompile: 0.150s
                                       slangc_main: 0.284s
Shadow hit shader:
                             create_global_session: 0.116s
                                         spCompile: 0.151s
                                       slangc_main: 0.286s

===========================
file: material.slang
===========================
                             create_global_session: 0.112s
                                         spCompile: 0.110s
                                       slangc_main: 0.248s

===========================
file: mdl.slang
===========================
                             create_global_session: 0.111s
                                         spCompile: 0.015s
                                       slangc_main: 0.144s

===========================
file: runtime.slang
===========================
                             create_global_session: 0.112s
                                         spCompile: 0.108s
                                       slangc_main: 0.241s

===========================
file: shading.slang
===========================
                             create_global_session: 0.117s
                                         spCompile: 0.002s
                                       slangc_main: 0.134s

===========================
file: types.slang
===========================
                             create_global_session: 0.111s
                                         spCompile: 0.007s
                                       slangc_main: 0.134s

===========================
compiling altogether:
===========================

modules/types.slang-module
                modules/shading.slang-module
                modules/runtime.slang-module
                modules/mdl.slang-module
                modules/material.slang-module
                modules/closesthit.slang-module
                modules/anyhit.slang-module
                modules/shadow.slang-module
                modules/environment.slang-module
                modules/enumerations.slang-module
                modules/common.slang-module
                modules/buffers.slang-module

Closest hit shader:
                             create_global_session: 0.117s
                                         spCompile: 0.140s
                                       slangc_main: 0.299s
Any hit shader:
                             create_global_session: 0.114s
                                         spCompile: 0.053s
                                       slangc_main: 0.210s
Shadow hit shader:
                             create_global_session: 0.110s
                                         spCompile: 0.052s
                                       slangc_main: 0.213s

===========================
monolithic compilation:
===========================

Closest hit shader:
                             create_global_session: 0.104s
                                         spCompile: 0.290s
                                       slangc_main: 0.415s
Any hit shader:
                             create_global_session: 0.117s
                                         spCompile: 0.201s
                                       slangc_main: 0.338s
Shadow hit shader:
                             create_global_session: 0.117s
                                         spCompile: 0.196s
                                       slangc_main: 0.335s

For verification purposes, here are the storage sizes for each module:

[4.0K]  modules
├── [123K]  anyhit.slang-module
├── [ 33K]  buffers.slang-module
├── [123K]  closesthit.slang-module
├── [ 70K]  common.slang-module
├── [111K]  enumerations.slang-module
├── [ 50K]  environment.slang-module
├── [ 98K]  material.slang-module
├── [ 48K]  mdl.slang-module
├── [461K]  runtime.slang-module
├── [ 11K]  shading.slang-module
├── [123K]  shadow.slang-module
└── [ 46K]  types.slang-module

Since the three hit-stage modules (anyhit, closesthit, and shadow) are all about 123K, they probably share most of their code, so hit.slang can likely be split into at least one more shared module.

jkwak-work commented 1 month ago

I think the result looks very promising.

If I understood the comment above correctly, a big portion of the compile time is for loading stdlib. Can you figure out if it is currently possible to load stdlib only once and execute compilations for multiple outputs? If not, we need to file an issue to make it possible.

csyonghe commented 1 month ago

I think we can at least make everything in the benchmark app use the same IGlobalSession, to get rid of the repeated global session creation time.
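
As a rough idea of what sharing one session could save, the create_global_session times from the per-module runs in the log can be summed. This is a back-of-the-envelope sketch only: it assumes a shared session costs about as much to create as one of the measured runs, and that nothing else in slangc start-up changes.

```python
from statistics import mean

# create_global_session times in seconds for the 12 per-module slangc runs,
# copied from the log earlier in this thread.
SESSION_STARTUP = [0.119, 0.105, 0.106, 0.105, 0.120, 0.113,
                   0.116, 0.112, 0.111, 0.112, 0.117, 0.111]


def startup_saved(startups):
    """Time saved if only the first invocation pays the session start-up cost."""
    return sum(startups) - startups[0]


if __name__ == "__main__":
    print(f"mean start-up per run: {mean(SESSION_STARTUP):.3f}s")
    print(f"saved with one shared session: {startup_saved(SESSION_STARTUP):.3f}s")
```

For comparison, the spCompile times of the same 12 runs add up to under 0.9 seconds, so session creation is the larger share of the wall-clock time in this benchmark.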