
ptxas Optimized debugging not supported #70132

ajarmusch opened this issue 1 year ago (status: Open)

ajarmusch commented 1 year ago

I was running ninja check-all and the offloading/info.c test failed.

The error was ptxas fatal : Optimized debugging not supported. ptxas doesn't seem to support the -gline-tables-only flag.

To reproduce the error:

llc -O2 reduced.ll -o reduced.ll.s
ptxas -O2 reduced.ll.s

reduced.ll:

; ModuleID = 'reduced.ll'
source_filename = "reduced.ll"
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

!llvm.module.flags = !{!0}
!llvm.dbg.cu = !{!1}

!0 = !{i32 2, !"Debug Info Version", i32 3}
!1 = distinct !DICompileUnit(language: DW_LANG_C11, file: !2, producer: "clang version 18.0.0 (https://github.com/llvm/llvm-project.git 3f8e5fd08f33c3e8bce464f3b866dda5210ca943)", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, splitDebugInlining: false, nameTableKind: None)
!2 = !DIFile(filename: "info.c", directory: "/home/users/jarmusch/LIT_CIT/llvm-test-suite-nvidia")

Artem-B commented 1 year ago

ptxas is indeed fairly limited in terms of debug info in optimized builds.

That said, it is possible to generate lineinfo in optimized GPU binaries. The key is to avoid generating the .target ..., debug directive and stick with the line-info directives emitted in the PTX. Without full debug, ptxas -lineinfo will accept line info. This is what clang does when compiling CUDA.

I think you need to change emissionKind: LineTablesOnly -> emissionKind: DebugDirectivesOnly and that should give you line info with ptxas optimizations enabled.
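
Concretely, on the reduced.ll above that is a one-field change to !1 (a sketch: everything other than emissionKind is kept as in the reproducer, and the producer string is abbreviated here):

; same as !1 in the reproducer, with emissionKind switched from LineTablesOnly to DebugDirectivesOnly
!1 = distinct !DICompileUnit(language: DW_LANG_C11, file: !2, producer: "clang version 18.0.0 ...", isOptimized: true, runtimeVersion: 0, emissionKind: DebugDirectivesOnly, splitDebugInlining: false, nameTableKind: None)

With that, the emitted PTX should carry only .file/.loc line directives and no debug option on the .target line, so the second step of the reproducer should go through as ptxas -O2 -lineinfo reduced.ll.s.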

ajarmusch commented 11 months ago

@Artem-B thanks for the help! It looks like it would be a problem if I built LLVM with Debug? The thing is, I'm building LLVM with Release, not Debug, so ptxas should accept line info?

Artem-B commented 11 months ago

@Artem-B thanks for the help! It looks like it would be a problem if I built LLVM with Debug?

It's not about how you build LLVM. It's about the IR processed by LLVM.

The thing is, I'm building LLVM with Release, not Debug, so ptxas should accept line info?

Regardless of how you build LLVM itself, the IR you use to generate PTX should have the appropriate debug info metadata, so that it produces PTX that ptxas will accept.

We cannot change what ptxas does, so it's up to the LLVM user to set up the compilation pipeline just so.

ajarmusch commented 11 months ago

@Artem-B to clarify, we need to set emissionKind to DebugDirectivesOnly in LLVM.

where should I look to change emissionKind?

Artem-B commented 11 months ago

@Artem-B to clarify, we need to set emissionKind to DebugDirectivesOnly in LLVM.

where should I look to change emissionKind?

@dwblaikie What's the right way to specify the kind of debug info we want to produce?

dwblaikie commented 11 months ago

@Artem-B to clarify, we need to set emissionKind to DebugDirectivesOnly in LLVM. Where should I look to change emissionKind?

@dwblaikie What's the right way to specify the kind of debug info we want to produce?

It's implemented in CGDebugInfo::CreateCompileUnit, which uses CodeGenOpts::getDebugInfo - so somewhere in there, or in how that CodeGenOpt gets initialized? (There are various bits of driver code that seem to handle this sort of thing, for instance clang/lib/Driver/ToolChains/Cuda.cpp:mustEmitDebugInfo.)
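
For orientation, the mapping being referred to boils down to something like the following (a hedged sketch, not the verbatim clang source: the helper name is made up, and the header and enum spellings are from memory and may differ across LLVM versions):

#include "llvm/Frontend/Debug/Options.h"
#include "llvm/IR/DebugInfoMetadata.h"

// Hypothetical helper paraphrasing the mapping done in clang's
// CGDebugInfo::CreateCompileUnit: the frontend's debug-info kind
// decides the emissionKind stored on the !DICompileUnit.
static llvm::DICompileUnit::DebugEmissionKind
mapToEmissionKind(llvm::codegenoptions::DebugInfoKind Kind) {
  switch (Kind) {
  case llvm::codegenoptions::NoDebugInfo:
    return llvm::DICompileUnit::NoDebug;
  case llvm::codegenoptions::DebugLineTablesOnly:   // roughly -gline-tables-only
    return llvm::DICompileUnit::LineTablesOnly;
  case llvm::codegenoptions::DebugDirectivesOnly:   // roughly -gline-directives-only
    return llvm::DICompileUnit::DebugDirectivesOnly;
  default:                                          // the various "full" debug levels
    return llvm::DICompileUnit::FullDebug;
  }
}

So the IR-level emissionKind is driven entirely by that CodeGenOpt, which in turn is set by the driver/frontend flags, e.g. the Cuda.cpp logic mentioned above.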

jhuber6 commented 9 months ago

This is a problem in general if you do clang++ -foffload-lto -fopenmp --offload-arch=sm_89 -gline-tables-only or any other kind of debug flag. The existing handling tries to degrade the debug emission when optimizations are present, but it seems to be emitting the .debug line, which then throws said error when ptxas runs after the LTO pass. I wonder if we could just manually remove that in cases where optimizations are on.
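
To make the failure mode concrete (foo.cpp is a hypothetical file containing an OpenMP target region; the flags are the ones from the command above):

# Hypothetical reproducer for the general problem described above.
clang++ -foffload-lto -fopenmp --offload-arch=sm_89 -gline-tables-only foo.cpp -o foo
# The device link step runs ptxas on the LTO-produced PTX, which then fails with:
#   ptxas fatal : Optimized debugging not supported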

harrisonGPU commented 7 months ago

@jhuber6 Hello, Joseph. I have encountered the same problem again. When I use the following commands to build this project:

cmake ../llvm -G Ninja \
   -DLLVM_ENABLE_PROJECTS="clang;lld" \
   -DCMAKE_C_COMPILER=clang \
   -DCMAKE_CXX_COMPILER=clang++ \
   -DLIBC_GPU_BUILD=On \
   -DLLVM_ENABLE_RUNTIMES="openmp;libunwind;libcxx;libcxxabi;libc" \
   -DCMAKE_BUILD_TYPE=Debug \
   -DCMAKE_INSTALL_PREFIX=/root/Projects/install
ninja

I get the following error log:

[0/9] cd /root/Projects/llvm-project/build/runtimes/runtimes-bins && /usr/bin/cmake --build .
ninja: no work to do.
[1/9] cd /root/Projects/llvm-project/build/runtimes/runtimes-nvptx64-nvidia-cuda-bins && /usr/bin/cmake --build .
[1/588] Building CXX object libc/src/stdlib/gpu/CMakeFiles/libc.src.stdlib.gpu.abort.__internal__.dir/abort.cpp.o
FAILED: libc/src/stdlib/gpu/CMakeFiles/libc.src.stdlib.gpu.abort.__internal__.dir/abort.cpp.o 
/root/Projects/llvm-project/build/./bin/clang++ --target=nvptx64-nvidia-cuda -DLIBC_NAMESPACE=__llvm_libc_19_0_0_git -D_DEBUG -I/root/Projects/llvm-project/libc -isystem /root/Projects/llvm-project/build/runtimes/runtimes-nvptx64-nvidia-cuda-bins/libc/include -g --target=nvptx64-nvidia-cuda -fpie -ffreestanding -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -nogpulib -fvisibility=hidden -fconvergent-functions -flto -Wno-multi-gpu -Wno-unknown-cuda-version -mllvm -nvptx-emit-init-fini-kernel=false --cuda-feature=+ptx63 --cuda-path=/usr/local/cuda -isystem/root/Projects/llvm-project/build/lib/clang/19/include -nostdinc -fno-lto -march=native -MD -MT libc/src/stdlib/gpu/CMakeFiles/libc.src.stdlib.gpu.abort.__internal__.dir/abort.cpp.o -MF libc/src/stdlib/gpu/CMakeFiles/libc.src.stdlib.gpu.abort.__internal__.dir/abort.cpp.o.d -o libc/src/stdlib/gpu/CMakeFiles/libc.src.stdlib.gpu.abort.__internal__.dir/abort.cpp.o -c /root/Projects/llvm-project/libc/src/stdlib/gpu/abort.cpp
fatal error: error in backend: Cannot select: t3: i32,ch = AtomicLoad<(load seq_cst (s32) from %ir.val)> t0, t2, libc/src/__support/CPP/atomic.h:75:14
  t2: i64,ch = CopyFromReg t0, Register:i64 %0, libc/src/__support/CPP/atomic.h:75:14
    t1: i64 = Register %0
In function: _ZN22__llvm_libc_19_0_0_git3cpp6AtomicIjE4loadENS0_11MemoryOrderENS0_11MemoryScopeE
clang++: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 19.0.0git (git@github.com:TSWorld1314/llvm-project.git 06bb8c9f202e37f215b26ca0dd9b2d8adaf5a83d)
Target: nvptx64-nvidia-cuda
Thread model: posix
InstalledDir: /root/Projects/llvm-project/build/bin
ninja: build stopped: subcommand failed.
FAILED: runtimes/runtimes-nvptx64-nvidia-cuda-stamps/runtimes-nvptx64-nvidia-cuda-build /root/Projects/llvm-project/build/runtimes/runtimes-nvptx64-nvidia-cuda-stamps/runtimes-nvptx64-nvidia-cuda-build 
cd /root/Projects/llvm-project/build/runtimes/runtimes-nvptx64-nvidia-cuda-bins && /usr/bin/cmake --build .
ninja: build stopped: subcommand failed.

I want to know whether the nvptx64-nvidia-cuda target is simply not supported in Debug mode, while Release mode works fine.

jhuber6 commented 7 months ago

@jhuber6 Hello, Joseph. I have encountered the same problem again. When I use the following commands to build this project:

Glad to see someone trying out the GPU libc stuff; there are probably a few kinks that I haven't ironed out yet as the sole developer.

I want to know whether the nvptx64-nvidia-cuda target is simply not supported in Debug mode, while Release mode works fine.

This issue is ancient, unfortunately: https://github.com/llvm/llvm-project/issues/48651. There are many holes in the NVPTX backend and not enough people to fill them, given NVIDIA's lack of involvement. What's happening here is that the atomic is being expanded to something like this:

switch (Kind) {
  case __ATOMIC_RELAXED: ...
  case __ATOMIC_ACQUIRE: ...
}

Some of these are not implemented by the backend. Normally we optimize these out via some constant propagation, so they are never seen by the backend. However, when compiling with -O0 they hang around until the backend crashes, as you've observed.

This only occurs during the creation of the PTX used for the internal test suite. The LLVM-IR should be fine so long as those branches are removed before it makes it to the backend. The tests are built automatically if you have a functioning GPU, so I may need a way to disable that. As a quick hack right now, you could export CUDA_VISIBLE_DEVICES="" when compiling so the build doesn't detect your GPU, then unset it later.
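
That is, something along these lines (a sketch of the workaround just described):

# Hide the GPU from the build so the GPU tests are not configured and built.
export CUDA_VISIBLE_DEVICES=""
ninja
# Restore GPU visibility afterwards.
unset CUDA_VISIBLE_DEVICES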

However, using the library once it's been built will probably result in similar errors unless you turn on optimizations, at which point you'd get Optimized debugging not supported errors as well. So, right now I'd just say that debugging on PTX doesn't really work.

I'll probably make a patch that disables NVPTX tests if CMAKE_BUILD_TYPE STREQUAL "Debug"

harrisonGPU commented 7 months ago

Thanks, Joseph. I am a fan of yours!