ajarmusch opened this issue 1 year ago
ptxas is indeed fairly limited in terms of debug info in optimized builds.
That said, it is possible to generate lineinfo in optimized GPU binaries. The key is to avoid generating the `.target ..., debug` directive and to stick with the line info directives emitted in PTX. Without `debug`, `ptxas -lineinfo` will accept line info. This is what clang does when we're compiling CUDA.

I think you need to change `emissionKind: LineTablesOnly` -> `emissionKind: DebugDirectivesOnly`, and that should give you line info with ptxas optimizations enabled.
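For illustration, a minimal sketch of what the compile-unit metadata in the IR might look like with that emission kind (the file name and other fields here are hypothetical, trimmed to the part being discussed):

```llvm
; Sketch only: emissionKind is the field that controls this behavior.
!llvm.dbg.cu = !{!0}
!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1,
                             emissionKind: DebugDirectivesOnly)
!1 = !DIFile(filename: "kernel.cu", directory: "/tmp")
```

Per the advice above, `LineTablesOnly` here would instead lead to the `debug` target directive that optimized ptxas rejects.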
@Artem-B thanks for the help! It looks like it would be a problem if I built LLVM with Debug? The thing is, I'm building LLVM with Release, not debug, so ptxas should accept line info?
> @Artem-B thanks for the help! It looks like it would be a problem if I built LLVM with Debug?

It's not about how you build LLVM. It's about the IR processed by LLVM.

> The thing is, I'm building LLVM with Release, not debug, so ptxas should accept line info?
Regardless of how you build LLVM itself, the IR you use to generate PTX should have correct debug info metadata, so that it generates PTX which would be accepted by ptxas. We cannot change what ptxas does, so it's up to the LLVM user to set up the compilation pipeline just so.
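As a concrete sketch of such a pipeline (the file names and the `sm_89` architecture are placeholders; this assumes `llc` and a CUDA toolkit are on `PATH`):

```
# Lower IR that carries directives-only debug metadata to PTX.
llc -march=nvptx64 -mcpu=sm_89 kernel.ll -o kernel.ptx
# ptxas accepts -lineinfo alongside optimization as long as the PTX
# does not request full debug via its .target directive.
ptxas -arch=sm_89 -O3 -lineinfo kernel.ptx -o kernel.cubin
```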
@Artem-B to clarify, we need to set `emissionKind` to `DebugDirectivesOnly` in llvm. Where should I look to change `emissionKind`?
> @Artem-B to clarify, we need to set `emissionKind` to `DebugDirectivesOnly` in llvm. Where should I look to change `emissionKind`?

@dwblaikie What's the right way to specify the kind of debug info we want to produce?
It's implemented in `CGDebugInfo::CreateCompileUnit`, which uses `CodeGenOpts::getDebugInfo` - so somewhere in there, in terms of how that `CodeGenOpt` gets initialized? (There are various bits of driver code that seem to handle this sort of thing, for instance `clang/lib/Driver/ToolChains/Cuda.cpp:mustEmitDebugInfo`.)
This is a problem in general if you do `clang++ -foffload-lto -fopenmp --offload-arch=sm_89 -gline-tables-only` or any other kind of debug. The existing handling tries to degrade the debug emission when optimizations are present. It seems to be emitting the `.debug` line, which then throws said error when run after the LTO pass. I wonder if we could just manually remove that in cases where optimizations are on.
@jhuber6 Hello, Joseph. I have encountered the same problem again. When I use the following commands to build this project:

```
cmake ../llvm -G Ninja \
  -DLLVM_ENABLE_PROJECTS="clang;lld" \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DLIBC_GPU_BUILD=On \
  -DLLVM_ENABLE_RUNTIMES="openmp;libunwind;libcxx;libcxxabi;libc" \
  -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_INSTALL_PREFIX=/root/Projects/install
ninja
```

I get the following error log:
```
[0/9] cd /root/Projects/llvm-project/build/runtimes/runtimes-bins && /usr/bin/cmake --build .
ninja: no work to do.
[1/9] cd /root/Projects/llvm-project/build/runtimes/runtimes-nvptx64-nvidia-cuda-bins && /usr/bin/cmake --build .
[1/588] Building CXX object libc/src/stdlib/gpu/CMakeFiles/libc.src.stdlib.gpu.abort.__internal__.dir/abort.cpp.o
FAILED: libc/src/stdlib/gpu/CMakeFiles/libc.src.stdlib.gpu.abort.__internal__.dir/abort.cpp.o
/root/Projects/llvm-project/build/./bin/clang++ --target=nvptx64-nvidia-cuda -DLIBC_NAMESPACE=__llvm_libc_19_0_0_git -D_DEBUG -I/root/Projects/llvm-project/libc -isystem /root/Projects/llvm-project/build/runtimes/runtimes-nvptx64-nvidia-cuda-bins/libc/include -g --target=nvptx64-nvidia-cuda -fpie -ffreestanding -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -nogpulib -fvisibility=hidden -fconvergent-functions -flto -Wno-multi-gpu -Wno-unknown-cuda-version -mllvm -nvptx-emit-init-fini-kernel=false --cuda-feature=+ptx63 --cuda-path=/usr/local/cuda -isystem/root/Projects/llvm-project/build/lib/clang/19/include -nostdinc -fno-lto -march=native -MD -MT libc/src/stdlib/gpu/CMakeFiles/libc.src.stdlib.gpu.abort.__internal__.dir/abort.cpp.o -MF libc/src/stdlib/gpu/CMakeFiles/libc.src.stdlib.gpu.abort.__internal__.dir/abort.cpp.o.d -o libc/src/stdlib/gpu/CMakeFiles/libc.src.stdlib.gpu.abort.__internal__.dir/abort.cpp.o -c /root/Projects/llvm-project/libc/src/stdlib/gpu/abort.cpp
fatal error: error in backend: Cannot select: t3: i32,ch = AtomicLoad<(load seq_cst (s32) from %ir.val)> t0, t2, libc/src/__support/CPP/atomic.h:75:14
  t2: i64,ch = CopyFromReg t0, Register:i64 %0, libc/src/__support/CPP/atomic.h:75:14
    t1: i64 = Register %0
In function: _ZN22__llvm_libc_19_0_0_git3cpp6AtomicIjE4loadENS0_11MemoryOrderENS0_11MemoryScopeE
clang++: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 19.0.0git (git@github.com:TSWorld1314/llvm-project.git 06bb8c9f202e37f215b26ca0dd9b2d8adaf5a83d)
Target: nvptx64-nvidia-cuda
Thread model: posix
InstalledDir: /root/Projects/llvm-project/build/bin
ninja: build stopped: subcommand failed.
FAILED: runtimes/runtimes-nvptx64-nvidia-cuda-stamps/runtimes-nvptx64-nvidia-cuda-build /root/Projects/llvm-project/build/runtimes/runtimes-nvptx64-nvidia-cuda-stamps/runtimes-nvptx64-nvidia-cuda-build
cd /root/Projects/llvm-project/build/runtimes/runtimes-nvptx64-nvidia-cuda-bins && /usr/bin/cmake --build .
ninja: build stopped: subcommand failed.
```
I want to know whether the nvptx64-nvidia-cuda target is simply not supported in debug mode, while release mode works fine.
> @jhuber6 Hello, Joseph. I have encountered the same problem again. When I use the following commands to build this project:

Glad to see someone trying out the GPU `libc` stuff, there's probably a few kinks that I haven't ironed out yet as the sole developer.

> I want to know if the nvptx64-nvidia-cuda target is not supported in debug mode, while the release mode works fine.
This issue is ancient unfortunately: https://github.com/llvm/llvm-project/issues/48651. There are many holes in the NVPTX backend and not enough people to fill them given NVIDIA's lack of involvement. What's happening here is that the atomic is being expanded to something like this:
```cpp
switch (Kind) {
case __ATOMIC_RELAXED: ...
case __ATOMIC_ACQUIRE: ...
}
```
Some of these are not implemented by the backend. Normally, we optimize these out via some constant propagation and they do not get seen by the backend. However, when compiling with `-O0` those will hang around until they crash, as you've observed.
This only occurs during the creation of the PTX used for the internal test suite. The LLVM-IR should be fine so long as those branches are removed before it makes it to the backend. The tests are built automatically if you have a functioning GPU, so I may need a way to disable that. For a quick hack right now you could try `export CUDA_VISIBLE_DEVICES=""` when compiling so it doesn't detect your GPU, then unsetting it later.

However, using the library once it's been built will probably result in similar errors unless you turn on optimizations, in which case you'd get `Optimized debugging not supported` errors as well. So, right now I'd just say that debugging on PTX just doesn't really work.
I'll probably make a patch that disables NVPTX tests if `CMAKE_BUILD_TYPE STREQUAL "Debug"`.
Thanks, Joseph. I am a fan of yours!!!
I was running `ninja check-all` and the offloading/info.c test failed. The error was `ptxas fatal : Optimized debugging not supported`. ptxas doesn't seem to be able to support the `-gline-tables-only` flag.

To reproduce the error, reduced.ll: