Hi,
I'd like to know whether madrona supports CUDA 12.6 yet. I have been testing GPUDrive with various CUDA versions and found that I am unable to compile on the following versions:
cuda12.1.1-cudnn8.9.0-devel-ubuntu22.04.2
cuda12.6.1-devel-ubuntu24.04
cuda 12.5 (I have not tried this version myself, but some users reported that they could not compile on 12.5 and had to downgrade to 12.4).
Running with the latest madrona commit (445062f) on CUDA 12.6, I get these errors during compilation:
(madrona) (base) aarav@emerge2-desktop:~/gpudrive/build$ ./headless CUDA 1
Compiling GPU engine code:
inlinable function call in a function with debug info must have a !dbg location
call void @_ZN4cuda3std3__423__atomic_store_dispatchINS1_16__atomic_storageIiEEiNS1_25__thread_scope_device_tagELi0EEEvPT_T0_NS1_12memory_orderET1_(%struct._ZN4cuda3std3__416__atomic_storageIiEE* %15, i32 %16, i32 %17, %struct._ZN4cuda3std3__425__thread_scope_device_tagE %19)
inlinable function call in a function with debug info must have a !dbg location
%15 = call i32 @_ZN4cuda3std3__422__atomic_load_dispatchINS1_16__atomic_storageIiEENS1_25__thread_scope_device_tagELi0EEENT_14__underlying_tEPKS6_NS1_12memory_orderET0_(%struct._ZN4cuda3std3__416__atomic_storageIiEE* %11, i32 %12, %struct._ZN4cuda3std3__425__thread_scope_device_tagE %14)
inlinable function call in a function with debug info must have a !dbg location
%21 = call i32 @_ZN4cuda3std3__426__atomic_exchange_dispatchINS1_16__atomic_storageIiEEiNS1_25__thread_scope_device_tagELi0EEENT_14__underlying_tEPS6_T0_NS1_12memory_orderET1_(%struct._ZN4cuda3std3__416__atomic_storageIiEE* %16, i32 %17, i32 %18, %struct._ZN4cuda3std3__425__thread_scope_device_tagE %20)
inlinable function call in a function with debug info must have a !dbg location
call void @_ZN4cuda3std3__423__atomic_store_dispatchINS1_16__atomic_storageIjEEjNS1_25__thread_scope_system_tagELi0EEEvPT_T0_NS1_12memory_orderET1_(%struct._ZN4cuda3std3__416__atomic_storageIjEE* %15, i32 %16, i32 %17, %struct._ZN4cuda3std3__425__thread_scope_system_tagE %19)
inlinable function call in a function with debug info must have a !dbg location
%15 = call i32 @_ZN4cuda3std3__422__atomic_load_dispatchINS1_16__atomic_storageIjEENS1_25__thread_scope_system_tagELi0EEENT_14__underlying_tEPKS6_NS1_12memory_orderET0_(%struct._ZN4cuda3std3__416__atomic_storageIjEE* %11, i32 %12, %struct._ZN4cuda3std3__425__thread_scope_system_tagE %14)
inlinable function call in a function with debug info must have a !dbg location
%21 = call i64 @_ZN4cuda3std3__427__atomic_fetch_add_dispatchINS1_16__atomic_storageIyEEyNS1_25__thread_scope_device_tagELi0EEENT_14__underlying_tEPS6_T0_NS1_12memory_orderET1_(%struct._ZN4cuda3std3__416__atomic_storageIyEE* %16, i64 %17, i32 %18, %struct._ZN4cuda3std3__425__thread_scope_device_tagE %20)
inlinable function call in a function with debug info must have a !dbg location
call void @_ZN4cuda3std3__423__atomic_store_dispatchINS1_16__atomic_storageIiEEiNS1_25__thread_scope_device_tagELi0EEEvPT_T0_NS1_12memory_orderET1_(%struct._ZN4cuda3std3__416__atomic_storageIiEE* %6, i32 %4, i32 %5, %struct._ZN4cuda3std3__425__thread_scope_device_tagE zeroinitializer)
inlinable function call in a function with debug info must have a !dbg location
%5 = call i32 @_ZN4cuda3std3__422__atomic_load_dispatchINS1_16__atomic_storageIiEENS1_25__thread_scope_device_tagELi0EEENT_14__underlying_tEPKS6_NS1_12memory_orderET0_(%struct._ZN4cuda3std3__416__atomic_storageIiEE* %4, i32 %3, %struct._ZN4cuda3std3__425__thread_scope_device_tagE zeroinitializer)
inlinable function call in a function with debug info must have a !dbg location
%7 = call i32 @_ZN4cuda3std3__426__atomic_exchange_dispatchINS1_16__atomic_storageIiEEiNS1_25__thread_scope_device_tagELi0EEENT_14__underlying_tEPS6_T0_NS1_12memory_orderET1_(%struct._ZN4cuda3std3__416__atomic_storageIiEE* %6, i32 %4, i32 %5, %struct._ZN4cuda3std3__425__thread_scope_device_tagE zeroinitializer)
inlinable function call in a function with debug info must have a !dbg location
call void @_ZN4cuda3std3__423__atomic_store_dispatchINS1_16__atomic_storageIjEEjNS1_25__thread_scope_system_tagELi0EEEvPT_T0_NS1_12memory_orderET1_(%struct._ZN4cuda3std3__416__atomic_storageIjEE* %6, i32 %4, i32 %5, %struct._ZN4cuda3std3__425__thread_scope_system_tagE zeroinitializer)
inlinable function call in a function with debug info must have a !dbg location
%5 = call i32 @_ZN4cuda3std3__422__atomic_load_dispatchINS1_16__atomic_storageIjEENS1_25__thread_scope_system_tagELi0EEENT_14__underlying_tEPKS6_NS1_12memory_orderET0_(%struct._ZN4cuda3std3__416__atomic_storageIjEE* %4, i32 %3, %struct._ZN4cuda3std3__425__thread_scope_system_tagE zeroinitializer)
inlinable function call in a function with debug info must have a !dbg location
%7 = call i64 @_ZN4cuda3std3__427__atomic_fetch_add_dispatchINS1_16__atomic_storageIyEEyNS1_25__thread_scope_device_tagELi0EEENT_14__underlying_tEPS6_T0_NS1_12memory_orderET1_(%struct._ZN4cuda3std3__416__atomic_storageIyEE* %6, i64 %4, i32 %5, %struct._ZN4cuda3std3__425__thread_scope_device_tagE zeroinitializer)
error: Broken module found, compilation aborted!
Error at /home/aarav/gpudrive/external/madrona/src/mw/cpp_compile.cpp:81 in auto madrona::cu::jitCompileCPPSrc(const char *, const char *, const char **, uint32_t, const char **, uint32_t, bool)::(anonymous class)::operator()() const
NVRTC_ERROR_COMPILATION
Aborted (core dumped)
I was able to get past this error by disabling debug mode here. But then I get the following follow-up error (shown with verbose compilation enabled for more detail):
(madrona) (base) aarav@emerge2-desktop:~/gpudrive/build$ ./headless CUDA 1
Compiler Flags:
-I/home/aarav/gpudrive/external/madrona/src/mw/device/include
-I/home/aarav/gpudrive/external/madrona/src/common/../../include
-I/usr/local/cuda/targets/x86_64-linux/include
-std=c++20
-default-device
-rdc=true
-use_fast_math
-DMADRONA_GPU_MODE=1
-DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CPP
-DCCCL_DISABLE_BF16_SUPPORT=1
-DCUB_DISABLE_BF16_SUPPORT=1
-arch
sm_89
-DMADRONA_MWGPU_NUM_SMS=(76_i32)
-DMADRONA_MWGPU_MAX_BLOCKS_PER_SM=(1_i32)
-dopt=on
--extra-device-vectorization
-lineinfo
-dlto
-DMADRONA_MWGPU_LTO_MODE=1
-DMADRONA_MWGPU_TASKGRAPH=1
Linker Flags:
-arch=sm_89
-ftz=1
-prec-div=0
-prec-sqrt=0
-fma=1
-optimize-unused-variables
-lineinfo
-lto
-verbose
Compiling GPU engine code:
/home/aarav/gpudrive/external/madrona/src/mw/device/memory.cpp
/home/aarav/gpudrive/external/madrona/src/mw/device/state.cpp
/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(177): error: qualified name is not allowed
cuda::atomic<T, cuda::thread_scope_device> impl_;
^
/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(177): error: this declaration has no storage class or type specifier
cuda::atomic<T, cuda::thread_scope_device> impl_;
^
/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(177): error: expected a ";"
cuda::atomic<T, cuda::thread_scope_device> impl_;
^
/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(328): error: qualified name is not allowed
cuda::atomic_ref<T, cuda::thread_scope_device> ref_;
^
/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(328): error: this declaration has no storage class or type specifier
cuda::atomic_ref<T, cuda::thread_scope_device> ref_;
^
/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(328): error: expected a ";"
cuda::atomic_ref<T, cuda::thread_scope_device> ref_;
^
/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(332): error: identifier "ref_" is undefined
static_assert(decltype(ref_)::is_always_lock_free);
^
/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(56): error: "impl_" is not a nonstatic data member or base class of class "madrona::Atomic<T>"
: impl_(v)
^
/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(58): error: identifier "impl_" is undefined
static_assert(decltype(impl_)::is_always_lock_free);
^
/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(96): error: identifier "impl_" is undefined
return impl_.exchange(v, order);
^
/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(69): error: identifier "impl_" is undefined
return impl_.load(sync::relaxed);
^
/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(90): error: identifier "impl_" is undefined
impl_.store(v, sync::release);
^
/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(85): error: identifier "impl_" is undefined
impl_.store(v, sync::relaxed);
^
/home/aarav/gpudrive/external/madrona/src/mw/device/include/algorithm(3): catastrophic error: cannot open source file "cuda/std/__algorithm"
#include <cuda/std/__algorithm>
^
and 1 catastrophic error detected in the compilation of "/home/aarav/gpudrive/external/madrona/src/mw/device/state.cpp".
Compilation terminated.
Error at /home/aarav/gpudrive/external/madrona/src/mw/cpp_compile.cpp:100 in CompileOutput madrona::cu::jitCompileCPPSrc(const char *, const char *, const char **, uint32_t, const char **, uint32_t, bool)
NVRTC_ERROR_COMPILATION
Aborted (core dumped)
I also tried the same with the base Docker image from NVIDIA and got the same errors.
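For anyone trying to reproduce this, here is a minimal sketch for checking whether the headers named in the errors above are actually present in the toolkit's include directory. The helper name `missing_headers` is hypothetical and not part of madrona or GPUDrive. The include path is the one from the `-I` flag in the verbose output and will likely differ on other machines. The header list is my assumption based on the error messages: `cuda::atomic` is declared in `<cuda/atomic>`, and `cuda/std/__algorithm` is the file the catastrophic error could not open.

```python
from pathlib import Path

# Include directory copied from the -I flag in the verbose output above.
# Assumption: a default /usr/local/cuda install; adjust as needed.
CUDA_INCLUDE_DIR = Path("/usr/local/cuda/targets/x86_64-linux/include")

# Headers taken from the error messages (assumed list, not exhaustive).
HEADERS = ["cuda/atomic", "cuda/std/__algorithm"]


def missing_headers(include_dir, headers):
    """Return the headers that are not regular files under include_dir.

    is_file() is used on purpose: a directory with the same name cannot
    be opened by #include, so it is reported as missing too.
    """
    root = Path(include_dir)
    return [h for h in headers if not (root / h).is_file()]


if __name__ == "__main__":
    for header in HEADERS:
        status = "missing" if missing_headers(CUDA_INCLUDE_DIR, [header]) else "found"
        print(f"{header}: {status}")
```

Comparing the result between a toolkit version that compiles (for example 12.4) and one that does not may help narrow down whether the header layout changed between releases.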