microsoft / OpenCLOn12

The OpenCL-on-D3D12 mapping layer
MIT License

PDB for OpenCLOn12Compiler.dll? #40

Open vlj opened 1 year ago

vlj commented 1 year ago

Hi,

In my app I get a crash on the D3D background thread; unfortunately I'm not able to dump the call stack to understand what's happening. What I can see is that the issue happens in OpenCLOn12Compiler.dll when calling cl::finish(). I would like a .pdb so I can get an idea of what's happening.

I don't know how to build this DLL.

MathiasMagnus commented 1 year ago

@jenatali Could someone from your team share a 6-7 line shell script that builds the runtime along with the compiler? It doesn't have to cover every configuration, bitness, etc. It would help us help you solve some of the issues in the runtime if we were able to build it ourselves. It's nice to have "Good first issue" tags on issues, but if we can't test that our changes actually work, we can't really engage.

jenatali commented 1 year ago

Building the runtime is pretty easy: clone the repo, run CMake, it'll download the deps via FetchContent, and then build. Building the compiler is a nightmare because it requires locally-built LLVM, Clang, libclc, SPIRV-Tools, SPIRV-LLVM-Translator, plus Meson, Mako, and pkg-config.

I'll see about getting PDBs published for future versions of this package. If you can share the version of the package that you're using, I can probably upload the PDBs to this issue for you to use.

MathiasMagnus commented 1 year ago

Locally built LLVM, Clang, SPIRV-Tools and SPIRV-LLVM-Translator aren't an issue; I already have all of those (for other projects). Meson and Mako I already have as well. I was trying to get Meson to find libclc built using CMake, but that may be my problem. I'd publish that nightmare as a Gist here for everyone's convenience; I'm just missing the last step.

vlj commented 1 year ago

I'm using release 1.2112.2.0 (the latest one available).

jenatali commented 1 year ago

> I was trying to get Meson to find libclc built using CMake, but that may be my problem.

I believe libclc can only be found via pkg-config. Once it's installed, you can point Meson's pkg-config path at $(libclc-install-path)/share/pkgconfig. See the example at https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/.gitlab-ci/windows/mesa_build.ps1#L46.

@vlj looks like I'll need to upload a new release, it seems our symbols have aged out of our internal symbol store and I don't seem to have a cached copy locally. Sorry about that!

vlj commented 1 year ago

By putting breakpoints in program_v2.cpp I managed to see that either CompilerV2::Instance()->ParseSpirv(&m_Object, logger ? &logger_impl : nullptr, &m_Parsed) does not work properly, or the SPIR-V object stored in m_Object is wrong. The first kernel stored in m_Parsed.kernels has the correct name/type/args, but the second entry in the array contains garbage, and so does the third.

This function seems to be provided by the CLOn12 compiler DLL (OpenCLOn12Compiler.dll) that I took from the release.

Is there a way to dump the SPIR-V object to disk so I can disassemble it with the SPIR-V tools?

jenatali commented 1 year ago

> Is there a way to "dump" the spirv object on disk to be able to disassemble it via spirv tool ?

I presume this debugging was done with Visual Studio. I don't think there's anything built-in to VS to do that, though there might be extensions? I also quickly found this StackOverflow page with suggestions: https://stackoverflow.com/questions/4155624/save-data-from-visual-studio-memory-window. Of note on that page is the suggestion to use WinDbg's .writemem command, which is exactly the method that I would normally use to debug this kind of issue from a crash dump.
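If the app supplies the SPIR-V itself (for example through clCreateProgramWithIL), an alternative to .writemem is to write the buffer to a file from the app before handing it to the runtime. The sketch below assumes that setup; DumpSpirv is a hypothetical helper, not part of OpenCLOn12. It checks the SPIR-V magic number (0x07230203) so a wrong pointer is caught early, and only accepts modules in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical debugging helper (not part of OpenCLOn12): writes a SPIR-V
// module to `path` so it can be disassembled offline, e.g. with
// `spirv-dis kernels.spv -o kernels.spvasm`.
// Only modules in host byte order are accepted; returns false if the buffer
// does not look like SPIR-V or if the file cannot be written.
bool DumpSpirv(const void* il, std::size_t lengthBytes, const std::string& path) {
    constexpr std::uint32_t kSpirvMagic = 0x07230203u;
    constexpr std::size_t kHeaderBytes = 5 * sizeof(std::uint32_t);

    // A valid module is a whole number of 32-bit words with a 5-word header.
    if (il == nullptr || lengthBytes < kHeaderBytes || lengthBytes % 4 != 0)
        return false;

    std::uint32_t magic = 0;
    std::memcpy(&magic, il, sizeof(magic));
    if (magic != kSpirvMagic)
        return false;  // wrong buffer, or a module in the other byte order

    std::ofstream out(path, std::ios::binary);
    out.write(static_cast<const char*>(il), static_cast<std::streamsize>(lengthBytes));
    out.close();  // flush now so write errors show up in the stream state
    return !out.fail();
}
```

Calling it right before program creation, e.g. DumpSpirv(il, length, "kernels.spv"), leaves a file that spirv-dis can disassemble. When the SPIR-V only exists inside the runtime (as with m_Object here), the debugger route (.writemem) is still the way to get it out.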

vlj commented 1 year ago

Thanks for the .writemem tip, I'll have a look at it. In the meantime I discovered that there is padding in clc_kernel_info. Are the clc includes in sync?

vlj commented 1 year ago

@jenatali From what I see, m_Parsed.num_kernels doesn't report the correct number of kernels: by dumping the actual SPIR-V generated by my app I see 86 kernels, but only the first 43 are valid, and m_Parsed.kernels[44], m_Parsed.kernels[45], ... are garbage. By comparing the names of the kernels in m_Parsed.kernels with the ones in the SPIR-V, I see that some are missing from m_Parsed.kernels. My intuition is that some kernels were optimized away (the code in my app is quite old and not very well maintained, so I wouldn't be surprised if there were leftovers). I'm investigating further to be sure.
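To compare the kernel list in a dumped .spv file against what the runtime parsed, the kernels can be enumerated straight from the SPIR-V binary: each kernel is declared by an OpEntryPoint instruction (opcode 15) whose execution model is Kernel (6), followed by the entry point's id and its name as a null-terminated string packed four bytes per word. The sketch below assumes the file is in host byte order; LoadSpirvWords and KernelEntryPoints are hypothetical helpers, not OpenCLOn12 APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Reads a SPIR-V file (host byte order assumed) into 32-bit words.
std::vector<std::uint32_t> LoadSpirvWords(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<std::uint32_t> words;
    std::uint32_t w = 0;
    while (in.read(reinterpret_cast<char*>(&w), sizeof(w)))
        words.push_back(w);
    return words;
}

// Returns the names of every OpEntryPoint whose execution model is Kernel,
// in the order they appear in the module.
std::vector<std::string> KernelEntryPoints(const std::vector<std::uint32_t>& words) {
    constexpr std::uint32_t kMagic = 0x07230203u;
    constexpr std::uint32_t kOpEntryPoint = 15;         // opcode of OpEntryPoint
    constexpr std::uint32_t kExecutionModelKernel = 6;  // ExecutionModel Kernel

    std::vector<std::string> names;
    if (words.size() < 5 || words[0] != kMagic)
        return names;

    // Skip the 5-word header; each instruction starts with (word count << 16) | opcode.
    for (std::size_t i = 5; i < words.size();) {
        const std::uint32_t wordCount = words[i] >> 16;
        const std::uint32_t opcode = words[i] & 0xFFFFu;
        if (wordCount == 0 || i + wordCount > words.size())
            break;  // malformed module: stop rather than read out of bounds

        // Operands: execution model, entry point <id>, then the name.
        if (opcode == kOpEntryPoint && wordCount >= 4 &&
            words[i + 1] == kExecutionModelKernel) {
            // Name bytes are packed four per word, lowest-order byte first.
            std::string name;
            bool terminated = false;
            for (std::size_t w = i + 3; w < i + wordCount && !terminated; ++w) {
                for (int b = 0; b < 4; ++b) {
                    const char c = static_cast<char>((words[w] >> (8 * b)) & 0xFFu);
                    if (c == '\0') { terminated = true; break; }
                    name.push_back(c);
                }
            }
            names.push_back(name);
        }
        i += wordCount;
    }
    return names;
}
```

Printing KernelEntryPoints(LoadSpirvWords("kernels.spv")) with indices gives a list that can be lined up against the names reported for m_Parsed.kernels. The spirv-dis output carries the same information as "OpEntryPoint Kernel" lines, if a text diff is easier.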

jenatali commented 1 year ago

That seems odd, but that's also a lot more kernels than I've seen in a single program while debugging before, so it's possible there's an issue with large kernel counts.

vlj commented 1 year ago

Actually it's strange: when I compare the SPIR-V dump with what actually gets parsed, I get kernel 1, kernel 3, kernel 5, ... It's as if every other kernel is dropped for some reason.
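One hypothesis that fits these numbers (86 kernels in the SPIR-V, only the first 43 parsed entries readable, and those matching kernels 1, 3, 5, ...) is an element-size mismatch: if the app's definition of the kernel-info struct were exactly twice the size of the DLL's, for example because the clc headers are out of sync as asked above, the app would step over every other entry and run off the end of the array halfway through. This is only a guess. The structs below are made up for illustration and are NOT the real clc_kernel_info; the sketch just simulates a DLL writing one layout and an app reading another.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layouts, NOT the real clc_kernel_info.
struct ProducerInfo {   // what a DLL built against one header writes
    std::uint32_t index;     // which kernel this entry describes
    std::uint32_t num_args;
};
struct ConsumerInfo {   // what an app built against another header reads
    std::uint32_t index;
    std::uint32_t num_args;
    std::uint64_t extra;     // field (or padding) the producer does not have
};
static_assert(sizeof(ConsumerInfo) == 2 * sizeof(ProducerInfo),
              "this demo relies on a 2x stride mismatch");

// For each consumer-side index i < count, returns the kernel index the
// consumer reads, or -1 when that element lies past the end of the
// producer's array (in a real process that memory would be garbage).
std::vector<int> ConsumerView(std::size_t count) {
    // The producer fills `count` tightly packed ProducerInfo entries.
    std::vector<unsigned char> bytes(count * sizeof(ProducerInfo));
    for (std::size_t k = 0; k < count; ++k) {
        const ProducerInfo p{static_cast<std::uint32_t>(k), 0};
        std::memcpy(bytes.data() + k * sizeof(ProducerInfo), &p, sizeof(p));
    }

    // The consumer walks the same bytes with its own, larger stride.
    std::vector<int> seen;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t offset = i * sizeof(ConsumerInfo);
        if (offset + sizeof(ConsumerInfo) > bytes.size()) {
            seen.push_back(-1);
            continue;
        }
        ConsumerInfo c;
        std::memcpy(&c, bytes.data() + offset, sizeof(c));
        seen.push_back(static_cast<int>(c.index));
    }
    return seen;
}
```

With count = 86 the consumer sees kernels 0, 2, 4, ..., 84 (kernels 1, 3, 5, ... counting from one) in its first 43 entries and nothing valid after that. Comparing sizeof and offsetof of the struct as compiled in the app against the header the DLL was built from would confirm or rule this out.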

vlj commented 1 year ago

I sent you an email with a repro binary. I can't really debug this further since I didn't build OpenCLOn12Compiler.dll myself, so I don't know whether I did something wrong in my setup.