pearu opened this issue 4 months ago
Do we even support nvcc as a CUDA compiler?
also CC @ddunl
> Do we even support nvcc as a CUDA compiler?
FWIW, NVCC is the default cuda compiler in configure.py.
Sorry for the delay - I was out. I will look into this. Interestingly we don't see this on the CI. Which version of CUDA are you using?
> Interestingly we don't see this on the CI. Which version of CUDA are you using?
12.1.0
Hey @pearu, I've pushed a (potential) fix. Would you be able to confirm whether it actually fixes your issue?
IIUC, the (potential) fix is equivalent to applying a patch containing:
```diff
--- a/xla/stream_executor/gpu/BUILD
+++ b/xla/stream_executor/gpu/BUILD
@@ -473,7 +473,7 @@ gpu_only_cc_library(
     ]),
 )

-gpu_kernel_library(
+cc_library(
     name = "redzone_allocator_kernel_cuda",
     srcs = [
         "redzone_allocator_kernel.h",
```
When I apply this to my local branch (I can try main later if needed), there is progress: `redzone_allocator_kernel_cuda.cc` now compiles successfully. However, the build then breaks while compiling `gpu_timer_kernel_cuda.cu.cc` with a similar failure:
Just for the sake of experiment, after making `gpu_timer_kernel_cuda` a `cc_library` as well, the build now fails with:
HTH
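For reference, the `gpu_timer_kernel_cuda` experiment described above amounts to a change along these lines, mirroring the earlier redzone patch (the surrounding context in `xla/stream_executor/gpu/BUILD` is assumed here, not copied from the repo):

```diff
--- a/xla/stream_executor/gpu/BUILD
+++ b/xla/stream_executor/gpu/BUILD
-gpu_kernel_library(
+cc_library(
     name = "gpu_timer_kernel_cuda",
```

As with the redzone target, this only sidesteps `gpu_kernel_library` for that one target; it is a diagnostic experiment, not a proposed fix.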
Ok cool. That seems to be the same issue, but in a different file. I can fix that tomorrow.
Hey, I won't be getting to this today. As a workaround you could use a later version of the CUDA toolkit. We use CUDA 12.3 in the CI, so I believe 12.3+ shouldn't have the bug anymore. Or - as you already discovered - you can use Clang as the CUDA compiler instead.
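For the Clang workaround, the invocation would be something along these lines; the exact flag names below are assumptions, so check `./configure.py --help` for the real options:

```shell
# Hypothetical flags -- verify against ./configure.py --help before use
./configure.py --backend=CUDA --host_compiler=clang
```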
> FWIW, NVCC is the default cuda compiler in configure.py
Yes, it should be changed to Clang since that's the only thing we test in CI.
@beckerhe I confirm that with CUDA 12.3.2 this issue cannot be reproduced: the bazel build succeeds without applying the above-mentioned patches.
After commit https://github.com/openxla/xla/commit/d8f0c1acdb79c18cdce0a050b1d7c6baa8b9f14b, building XLA for the CUDA backend fails. Reproducer:
With
the XLA build is successful.
Using:
CC: @beckerhe