libmir / dcompute

DCompute: Native execution of D on GPUs and other Accelerators
Boost Software License 1.0

the library does not compile on Windows with LDC 1.30.0 #72

Closed aferust closed 1 year ago

aferust commented 1 year ago

Windows 10 x86_64

With dcompute downloaded from the master repo.

The LDC version output lists nvptx and nvptx64:

LDC - the LLVM D compiler (1.30.0):
  based on DMD v2.100.1 and LLVM 14.0.3
  built with LDC - the LLVM D compiler (1.30.0)
  Default target: x86_64-pc-windows-msvc
  Host CPU: haswell
  http://dlang.org - http://wiki.dlang.org/LDC

  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_32 - AArch64 (little endian ILP32)
    aarch64_be - AArch64 (big endian)
    amdgcn     - AMD GCN GPUs
    arm        - ARM
    arm64      - ARM64 (little endian)
    arm64_32   - ARM64 (little endian ILP32)
    armeb      - ARM (big endian)
    avr        - Atmel AVR Microcontroller
    bpf        - BPF (host endian)
    bpfeb      - BPF (big endian)
    bpfel      - BPF (little endian)
    hexagon    - Hexagon
    lanai      - Lanai
    mips       - MIPS (32-bit big endian)
    mips64     - MIPS (64-bit big endian)
    mips64el   - MIPS (64-bit little endian)
    mipsel     - MIPS (32-bit little endian)
    msp430     - MSP430 [experimental]
    nvptx      - NVIDIA PTX 32-bit
    nvptx64    - NVIDIA PTX 64-bit
    ppc32      - PowerPC 32
    ppc32le    - PowerPC 32 LE
    ppc64      - PowerPC 64
    ppc64le    - PowerPC 64 LE
    r600       - AMD GPUs HD2XXX-HD6XXX
    riscv32    - 32-bit RISC-V
    riscv64    - 64-bit RISC-V
    sparc      - Sparc
    sparcel    - Sparc LE
    sparcv9    - Sparc V9
    systemz    - SystemZ
    thumb      - Thumb
    thumbeb    - Thumb (big endian)
    ve         - VE
    wasm32     - WebAssembly 32-bit
    wasm64     - WebAssembly 64-bit
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64
    xcore      - XCore

Error output from dub build:

C:\Users\user\AppData\Local\dub\packages\dcompute>dub build
Performing "debug" build using D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe for x86_64.
dcompute ~master: building configuration "library"...
 #0 0x00007ff630b3f541 (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0xaf541)
 #1 0x00007ff6335c7596 (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2b37596)
 #2 0x00007ff6335c64e9 (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2b364e9)
 #3 0x00007ff6335583ed (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2ac83ed)
 #4 0x00007ff6335a405b (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2b1405b)
 #5 0x00007ff6335b1086 (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2b21086)
 #6 0x00007ff6335a1454 (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2b11454)
 #7 0x00007ff633580528 (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2af0528)
 #8 0x00007ff633585131 (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2af5131)
 #9 0x00007ff63358024b (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2af024b)
#10 0x00007ff63358024b (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2af024b)
#11 0x00007ff6335ce7f8 (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2b3e7f8)
#12 0x00007ff6338b7fe4 (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2e27fe4)
#13 0x00007ff633575dcd (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2ae5dcd)
#14 0x00007ff633685749 (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2bf5749)
#15 0x00007ff633685379 (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2bf5379)
#16 0x00007ff6336856a4 (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2bf56a4)
#17 0x00007ff63356fa9f (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2adfa9f)
#18 0x00007ff6338c9dc4 (D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe+0x2e39dc4)
#19 0x00007ff8061d7034 (C:\WINDOWS\System32\KERNEL32.DLL+0x17034)
#20 0x00007ff8075826a1 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x526a1)
D:\dlang\ldc2-1.30.0-windows-multilib\bin\ldc2.exe failed with exit code -2147483645.

The release build produces the same output.

aferust commented 1 year ago

I wonder if those are related.

https://github.com/ldc-developers/ldc/issues/4260

MrcSnm commented 1 year ago

Why don't you try running the assertion enabled LDC on yours too?

aferust commented 1 year ago

> Why don't you try running the assertion enabled LDC on yours too?

Where are those debug builds? Could you please share download links?

MrcSnm commented 1 year ago

For bleeding-edge users, we also provide the [latest successful Continuous Integration builds](https://github.com/ldc-developers/ldc/releases/tag/CI) with enabled LLVM & LDC assertions (increasing compile times by roughly 50%).

Those are linked from the README.md of the ldc repo. Here's the link: https://github.com/ldc-developers/ldc/releases/tag/CI

aferust commented 1 year ago

> For bleeding-edge users, we also provide the [latest successful Continuous Integration builds](https://github.com/ldc-developers/ldc/releases/tag/CI) with enabled LLVM & LDC assertions (increasing compile times by roughly 50%).
>
> Those are on the README.md from ldc repo. here's the link https://github.com/ldc-developers/ldc/releases/tag/CI

Gotcha, thank you.

aferust commented 1 year ago

LDC assertions complain about:

expected FuncDeclaration overload to be used
UNREACHABLE executed at D:\a\ldc\ldc\gen\abi\nvptx.cpp:20!

thewilsonator commented 1 year ago

Thanks for finding the assert that failed, can you post the stack trace as well?

aferust commented 1 year ago

Probably this doesn't help much. I ran WinDbg on the ldc2 command that dub generated. There should be an exception analysis section somewhere.

Microsoft (R) Windows Debugger Version 10.0.25200.1003 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.

CommandLine: D:\dlang\ldc2-96b5cfb3-windows-multilib\bin\ldc2.exe -mdcompute-targets=cuda-210 -oq -lib -of.dub\build\library-debug-windows-x86_64-ldc_v1.30.0-644701512F8AF52E3CC1634C5B27247E\dcompute.lib -d-debug -g -w --oq -od=.dub\build\library-debug-windows-x86_64-ldc_v1.30.0-644701512F8AF52E3CC1634C5B27247E\/obj -d-version=Have_dcompute -d-version=Have_derelict_cl -d-version=Have_derelict_cuda -d-version=Have_taggedalgebraic -d-version=Have_derelict_util -Isource -IC:\Users\user\AppData\Local\dub\packages\derelict-cl-3.2.0\derelict-cl\source -IC:\Users\user\AppData\Local\dub\packages\derelict-util-3.0.0-beta.2\derelict-util\source -IC:\Users\user\AppData\Local\dub\packages\derelict-cuda-3.1.1\derelict-cuda\source -IC:\Users\user\AppData\Local\dub\packages\taggedalgebraic-0.10.13\taggedalgebraic\source source\dcompute\driver\backend.d source\dcompute\driver\cuda\buffer.d source\dcompute\driver\cuda\context.d source\dcompute\driver\cuda\device.d source\dcompute\driver\cuda\event.d source\dcompute\driver\cuda\kernel.d source\dcompute\driver\cuda\memory.d source\dcompute\driver\cuda\package.d source\dcompute\driver\cuda\platform.d source\dcompute\driver\cuda\program.d source\dcompute\driver\cuda\queue.d source\dcompute\driver\error.d source\dcompute\driver\ocl\buffer.d source\dcompute\driver\ocl\context.d source\dcompute\driver\ocl\device.d source\dcompute\driver\ocl\event.d source\dcompute\driver\ocl\image.d source\dcompute\driver\ocl\kernel.d source\dcompute\driver\ocl\memory.d source\dcompute\driver\ocl\package.d source\dcompute\driver\ocl\platform.d source\dcompute\driver\ocl\program.d source\dcompute\driver\ocl\queue.d source\dcompute\driver\ocl\raw\enums.d source\dcompute\driver\ocl\raw\functions.d source\dcompute\driver\ocl\raw\package.d source\dcompute\driver\ocl\sampler.d source\dcompute\driver\ocl\util.d source\dcompute\driver\util.d source\dcompute\kernels\package.d source\dcompute\std\atomic.d source\dcompute\std\cuda\index.d 
source\dcompute\std\cuda\sync.d source\dcompute\std\floating.d source\dcompute\std\index.d source\dcompute\std\integer.d source\dcompute\std\memory.d source\dcompute\std\opencl\image.d source\dcompute\std\opencl\index.d source\dcompute\std\opencl\sync.d source\dcompute\std\pack.d source\dcompute\std\package.d source\dcompute\std\sync.d source\dcompute\std\warp.d source\dcompute\tests\dummykernels.d source\dcompute\tests\main.d source\dcompute\tests\test.d -vcolumns
Starting directory: D:\projects\d_projects\dub\packages\dcompute

************* Path validation summary **************
Response                         Time (ms)     Location
Deferred                                       srv*
OK                                             D:\dlang\ldc2-96b5cfb3-windows-multilib\bin
Symbol search path is: srv*;D:\dlang\ldc2-96b5cfb3-windows-multilib\bin
Executable search path is: 
ModLoad: 00007ff7`6aac0000 00007ff7`70c1a000   image00007ff7`6aac0000
ModLoad: 00007fff`b0c90000 00007fff`b0e88000   ntdll.dll
ModLoad: 00007fff`afba0000 00007fff`afc5d000   C:\WINDOWS\System32\KERNEL32.DLL
ModLoad: 00007fff`ae4d0000 00007fff`ae7a2000   C:\WINDOWS\System32\KERNELBASE.dll
ModLoad: 00007fff`aff90000 00007fff`b00ba000   C:\WINDOWS\System32\ole32.dll
ModLoad: 00007fff`ae7b0000 00007fff`ae8b0000   C:\WINDOWS\System32\ucrtbase.dll
ModLoad: 00007fff`af8d0000 00007fff`af9f5000   C:\WINDOWS\System32\RPCRT4.dll
ModLoad: 00007fff`af4a0000 00007fff`af7f4000   C:\WINDOWS\System32\combase.dll
ModLoad: 00007fff`b0180000 00007fff`b01ab000   C:\WINDOWS\System32\GDI32.dll
ModLoad: 00007fff`ae8b0000 00007fff`ae8d2000   C:\WINDOWS\System32\win32u.dll
ModLoad: 00007fff`ae9c0000 00007fff`aeacf000   C:\WINDOWS\System32\gdi32full.dll
ModLoad: 00007fff`ae3a0000 00007fff`ae43d000   C:\WINDOWS\System32\msvcp_win.dll
ModLoad: 00007fff`afa00000 00007fff`afb9d000   C:\WINDOWS\System32\USER32.dll
ModLoad: 00007fff`b01e0000 00007fff`b028e000   C:\WINDOWS\System32\ADVAPI32.dll
ModLoad: 00007fff`af820000 00007fff`af8be000   C:\WINDOWS\System32\msvcrt.dll
ModLoad: 00007fff`afef0000 00007fff`aff8c000   C:\WINDOWS\System32\sechost.dll
ModLoad: 00007fff`b0a50000 00007fff`b0b1d000   C:\WINDOWS\System32\OLEAUT32.dll
ModLoad: 00007fff`aecf0000 00007fff`af433000   C:\WINDOWS\System32\SHELL32.dll
ModLoad: 00007fff`b0b80000 00007fff`b0beb000   C:\WINDOWS\System32\WS2_32.dll
(4b84.479c): Break instruction exception - code 80000003 (first chance)
ntdll!LdrpDoDebuggerBreak+0x30:
00007fff`b0d60950 cc              int     3
0:000> g
ModLoad: 00007fff`b01b0000 00007fff`b01e0000   C:\WINDOWS\System32\IMM32.DLL
ModLoad: 00007fff`abf80000 00007fff`ac164000   C:\WINDOWS\SYSTEM32\Dbghelp.dll
ModLoad: 00007fff`a01d0000 00007fff`a01fc000   C:\WINDOWS\SYSTEM32\dbgcore.DLL
ModLoad: 00007fff`ae440000 00007fff`ae4c2000   C:\WINDOWS\System32\bcryptPrimitives.dll
ModLoad: 00007fff`ac170000 00007fff`ac901000   C:\WINDOWS\SYSTEM32\windows.storage.dll
ModLoad: 00007fff`ade00000 00007fff`ade30000   C:\WINDOWS\SYSTEM32\Wldp.dll
ModLoad: 00007fff`b0790000 00007fff`b083d000   C:\WINDOWS\System32\SHCORE.dll
ModLoad: 00007fff`b0850000 00007fff`b08a5000   C:\WINDOWS\System32\shlwapi.dll
ModLoad: 00007fff`ae2e0000 00007fff`ae2ff000   C:\WINDOWS\SYSTEM32\profapi.dll
(4b84.479c): Illegal instruction - code c000001d (first chance)
(4b84.479c): Illegal instruction - code c000001d (!!! second chance !!!)
ldc2+0x75db16:
00007ff7`6b21db16 0f0b            ud2

Report from WinDbg's stack panel:

[0x0]   ldc2 + 0x75db16   
[0x1]   ldc2 + 0x396093e   
[0x2]   ldc2 + 0x3953944   
[0x3]   ldc2 + 0x27d47   
[0x4]   ldc2 + 0x3677bad   
[0x5]   ldc2 + 0x35ea5b4   
[0x6]   ldc2 + 0x35e9be7   
[0x7]   ldc2 + 0x356a0df   
[0x8]   ldc2 + 0x35c20ee   
[0x9]   ldc2 + 0x35d0f05   
[0xa]   ldc2 + 0x35bf844   
[0xb]   ldc2 + 0x359b97b   
[0xc]   ldc2 + 0x35a1731   
[0xd]   ldc2 + 0x359b68b   
[0xe]   ldc2 + 0x359b68b   
[0xf]   ldc2 + 0x35f2250   
[0x10]   ldc2 + 0x3928695   
[0x11]   ldc2 + 0x358aad8   
[0x12]   ldc2 + 0x36b9c71   
[0x13]   ldc2 + 0x36b9806   
[0x14]   ldc2 + 0x36b9b44   
[0x15]   ldc2 + 0x35846b6   
[0x16]   ldc2 + 0x393a5e4   
[0x17]   KERNEL32!BaseThreadInitThunk + 0x14   
[0x18]   ntdll!RtlUserThreadStart + 0x21

thewilsonator commented 1 year ago

> Probably this doesn't help much.

Yep, that doesn't. The CI releases of LDC probably have debug symbols enabled. That function is not called from many places (https://github.com/ldc-developers/ldc/search?p=2&q=callingConv; ignore all the files in gen/abi*, they are definitions), so it shouldn't be too hard to figure out with a stack trace.

aferust commented 1 year ago

@thewilsonator I am using CUDA on an NVIDIA GTX 755M GPU.

With LDC 1.28 the library compiles and dub test runs successfully. LDC 1.30 hits the llvm_unreachable at https://github.com/ldc-developers/ldc/blob/906037988f064bc61dbd671abc820f87bc10d128/gen/abi/nvptx.cpp#L20

thewilsonator commented 1 year ago

I can see that. That is why I asked for a stack trace with symbols: I want to see where that function is called from. It is also why I pointed you to the CI releases of LDC, which I believe should have debug symbols available. Failing that, compile with -vv (add it to the dflags in dub.sdl or dub.json). This produces a lot of output, so it is best to redirect it to a file and post the last couple of dozen lines.
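
For readers following along, the -vv suggestion above could look roughly like this in dub.json. This is a minimal sketch, not the project's actual file: only the `dflags` key is shown, and if the package already defines a `dflags` array, `-vv` should be appended to the existing entries rather than replacing them.

```json
{
    "dflags": ["-vv"]
}
```

With that in place, redirecting the build output to a file (for example `dub build > output.txt`, where the file name is arbitrary) keeps the verbose log manageable, and only the last couple of dozen lines need to be posted.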

aferust commented 1 year ago

Sorry for not making this clear earlier: I was already using the CI release (ldc2-96b5cfb3-windows-multilib). Please refer to output.txt. I am also including lldb's bt output below (which does not tell much).

D:\projects\d_projects\dub\packages\dcompute>d:\dlang\ldc2-96b5cfb3-windows-multilib\bin\ldc2.exe -mdcompute-targets=cuda-210 -oq -lib -of.dub\build\library-debug-windows-x86_64-ldc_v1.30.0-644701512F8AF52E3CC1634C5B27247E\dcompute.lib -d-debug -g -w --oq -od=.dub\build\library-debug-windows-x86_64-ldc_v1.30.0-644701512F8AF52E3CC1634C5B27247E\/obj -d-version=Have_dcompute -d-version=Have_derelict_cl -d-version=Have_derelict_cuda -d-version=Have_taggedalgebraic -d-version=Have_derelict_util -Isource -IC:\Users\user\AppData\Local\dub\packages\derelict-cl-3.2.0\derelict-cl\source -IC:\Users\user\AppData\Local\dub\packages\derelict-util-3.0.0-beta.2\derelict-util\source -IC:\Users\user\AppData\Local\dub\packages\derelict-cuda-3.1.1\derelict-cuda\source -IC:\Users\user\AppData\Local\dub\packages\taggedalgebraic-0.10.13\taggedalgebraic\source source\dcompute\driver\backend.d source\dcompute\driver\cuda\buffer.d source\dcompute\driver\cuda\context.d source\dcompute\driver\cuda\device.d source\dcompute\driver\cuda\event.d source\dcompute\driver\cuda\kernel.d source\dcompute\driver\cuda\memory.d source\dcompute\driver\cuda\package.d source\dcompute\driver\cuda\platform.d source\dcompute\driver\cuda\program.d source\dcompute\driver\cuda\queue.d source\dcompute\driver\error.d source\dcompute\driver\ocl\buffer.d source\dcompute\driver\ocl\context.d source\dcompute\driver\ocl\device.d source\dcompute\driver\ocl\event.d source\dcompute\driver\ocl\image.d source\dcompute\driver\ocl\kernel.d source\dcompute\driver\ocl\memory.d source\dcompute\driver\ocl\package.d source\dcompute\driver\ocl\platform.d source\dcompute\driver\ocl\program.d source\dcompute\driver\ocl\queue.d source\dcompute\driver\ocl\raw\enums.d source\dcompute\driver\ocl\raw\functions.d source\dcompute\driver\ocl\raw\package.d source\dcompute\driver\ocl\sampler.d source\dcompute\driver\ocl\util.d source\dcompute\driver\util.d source\dcompute\kernels\package.d source\dcompute\std\atomic.d 
source\dcompute\std\cuda\index.d source\dcompute\std\cuda\sync.d source\dcompute\std\floating.d source\dcompute\std\index.d source\dcompute\std\integer.d source\dcompute\std\memory.d source\dcompute\std\opencl\image.d source\dcompute\std\opencl\index.d source\dcompute\std\opencl\sync.d source\dcompute\std\pack.d source\dcompute\std\package.d source\dcompute\std\sync.d source\dcompute\std\warp.d source\dcompute\tests\dummykernels.d source\dcompute\tests\main.d source\dcompute\tests\test.d -vcolumns -vv >> output.txt

output.txt

lldb:

(lldb) Process 5900 launched: 'd:\dlang\ldc2-96b5cfb3-windows-multilib\bin\ldc2.exe' (x86_64)
Process 5900 stopped
* thread #1, stop reason = Exception 0xc000001d encountered at address 0x7ff7eb4ddb16
    frame #0: 0x00007ff7eb4ddb16 ldc2.exe
->  0x7ff7eb4ddb16: ud2
    0x7ff7eb4ddb18: int3
    0x7ff7eb4ddb19: int3
    0x7ff7eb4ddb1a: int3   
(lldb) bt
* thread #1, stop reason = Exception 0xc000001d encountered at address 0x7ff7eb4ddb16
  * frame #0: 0x00007ff7eb4ddb16 ldc2.exe

aferust commented 1 year ago

Is it related to this? pragma(LDC_intrinsic, "llvm.nvvm.barrier0") void barrier0();

thewilsonator commented 1 year ago

Ah, I was wondering why I didn't remember this breaking: I didn't break it! https://github.com/ldc-developers/ldc/commit/dc25da95577960d567a7e84c6056453c66973fa3 should be relatively easy to fix.

Transferring this issue to LDC

thewilsonator commented 1 year ago

https://github.com/ldc-developers/ldc/issues/4266

aferust commented 1 year ago

@thewilsonator Great work, btw. I was able to run Sobel image filtering with the CUDA driver and DCV, and it works very well. I had to bind sqrt to "llvm.nvvm.sqrt.rn.f". I will probably make a PR for module dcompute.std.floating. I am going to implement more GPU-accelerated stuff in DCV.

Thank you for your efforts both in compute and ldc.
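
The sqrt binding mentioned above could be sketched as follows, using the same `pragma(LDC_intrinsic, ...)` form as the `barrier0` declaration quoted earlier in the thread. This is an illustration, not the code from the eventual PR: the module name `example.cudamath`, the wrapper name `sqrtRn`, and the kernel `sqrtInPlace` are hypothetical, and the kernel follows the layout of the saxpy example in the dcompute README.

```d
// Sketch only: module, wrapper, and kernel names are hypothetical.
@compute(CompileFor.deviceOnly) module example.cudamath;

import ldc.dcompute;        // @compute, @kernel, GlobalPointer
import dcompute.std.index;  // GlobalIndex

// Body-less declaration: LDC lowers calls to it to the named LLVM intrinsic.
// "llvm.nvvm.sqrt.rn.f" is the single-precision, round-to-nearest NVVM sqrt,
// so this declaration only makes sense for CUDA (nvptx) targets.
pragma(LDC_intrinsic, "llvm.nvvm.sqrt.rn.f")
    float sqrtRn(float x);

// Replaces each of the first n elements with its square root.
@kernel void sqrtInPlace(GlobalPointer!float data, size_t n)
{
    auto i = GlobalIndex.x;
    if (i >= n) return;   // guard threads past the end of the buffer
    data[i] = sqrtRn(data[i]);
}
```

A module like this is meant to be compiled with a CUDA target selected, as in the `-mdcompute-targets=cuda-210` flag shown in the command lines above. Because the intrinsic is NVVM-specific, an OpenCL build would need a different binding.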

thewilsonator commented 1 year ago

Excellent, please do. And thank you for helping isolate such a critical bug.