shacklettbp / madrona

MIT License

CUDA 12.6 Support? #39

Open aaravpandya opened 2 weeks ago

aaravpandya commented 2 weeks ago

Hi, I'd like to know whether madrona supports CUDA 12.6 yet. I have been testing GPUDrive with various CUDA versions and have found that I am unable to compile on the following versions:

  1. cuda12.1.1-cudnn8.9.0-devel-ubuntu22.04.2
  2. cuda12.6.1-devel-ubuntu24.04
  3. CUDA 12.5 (I have not tried this version myself, but some users told us they were not able to compile on 12.5 and had to downgrade to 12.4).

With the latest madrona commit (445062f) on CUDA 12.6, I get these errors during compilation:

(madrona) (base) aarav@emerge2-desktop:~/gpudrive/build$ ./headless CUDA 1
Compiling GPU engine code:
inlinable function call in a function with debug info must have a !dbg location
  call void @_ZN4cuda3std3__423__atomic_store_dispatchINS1_16__atomic_storageIiEEiNS1_25__thread_scope_device_tagELi0EEEvPT_T0_NS1_12memory_orderET1_(%struct._ZN4cuda3std3__416__atomic_storageIiEE* %15, i32 %16, i32 %17, %struct._ZN4cuda3std3__425__thread_scope_device_tagE %19)
inlinable function call in a function with debug info must have a !dbg location
  %15 = call i32 @_ZN4cuda3std3__422__atomic_load_dispatchINS1_16__atomic_storageIiEENS1_25__thread_scope_device_tagELi0EEENT_14__underlying_tEPKS6_NS1_12memory_orderET0_(%struct._ZN4cuda3std3__416__atomic_storageIiEE* %11, i32 %12, %struct._ZN4cuda3std3__425__thread_scope_device_tagE %14)
inlinable function call in a function with debug info must have a !dbg location
  %21 = call i32 @_ZN4cuda3std3__426__atomic_exchange_dispatchINS1_16__atomic_storageIiEEiNS1_25__thread_scope_device_tagELi0EEENT_14__underlying_tEPS6_T0_NS1_12memory_orderET1_(%struct._ZN4cuda3std3__416__atomic_storageIiEE* %16, i32 %17, i32 %18, %struct._ZN4cuda3std3__425__thread_scope_device_tagE %20)
inlinable function call in a function with debug info must have a !dbg location
  call void @_ZN4cuda3std3__423__atomic_store_dispatchINS1_16__atomic_storageIjEEjNS1_25__thread_scope_system_tagELi0EEEvPT_T0_NS1_12memory_orderET1_(%struct._ZN4cuda3std3__416__atomic_storageIjEE* %15, i32 %16, i32 %17, %struct._ZN4cuda3std3__425__thread_scope_system_tagE %19)
inlinable function call in a function with debug info must have a !dbg location
  %15 = call i32 @_ZN4cuda3std3__422__atomic_load_dispatchINS1_16__atomic_storageIjEENS1_25__thread_scope_system_tagELi0EEENT_14__underlying_tEPKS6_NS1_12memory_orderET0_(%struct._ZN4cuda3std3__416__atomic_storageIjEE* %11, i32 %12, %struct._ZN4cuda3std3__425__thread_scope_system_tagE %14)
inlinable function call in a function with debug info must have a !dbg location
  %21 = call i64 @_ZN4cuda3std3__427__atomic_fetch_add_dispatchINS1_16__atomic_storageIyEEyNS1_25__thread_scope_device_tagELi0EEENT_14__underlying_tEPS6_T0_NS1_12memory_orderET1_(%struct._ZN4cuda3std3__416__atomic_storageIyEE* %16, i64 %17, i32 %18, %struct._ZN4cuda3std3__425__thread_scope_device_tagE %20)
inlinable function call in a function with debug info must have a !dbg location
  call void @_ZN4cuda3std3__423__atomic_store_dispatchINS1_16__atomic_storageIiEEiNS1_25__thread_scope_device_tagELi0EEEvPT_T0_NS1_12memory_orderET1_(%struct._ZN4cuda3std3__416__atomic_storageIiEE* %6, i32 %4, i32 %5, %struct._ZN4cuda3std3__425__thread_scope_device_tagE zeroinitializer)
inlinable function call in a function with debug info must have a !dbg location
  %5 = call i32 @_ZN4cuda3std3__422__atomic_load_dispatchINS1_16__atomic_storageIiEENS1_25__thread_scope_device_tagELi0EEENT_14__underlying_tEPKS6_NS1_12memory_orderET0_(%struct._ZN4cuda3std3__416__atomic_storageIiEE* %4, i32 %3, %struct._ZN4cuda3std3__425__thread_scope_device_tagE zeroinitializer)
inlinable function call in a function with debug info must have a !dbg location
  %7 = call i32 @_ZN4cuda3std3__426__atomic_exchange_dispatchINS1_16__atomic_storageIiEEiNS1_25__thread_scope_device_tagELi0EEENT_14__underlying_tEPS6_T0_NS1_12memory_orderET1_(%struct._ZN4cuda3std3__416__atomic_storageIiEE* %6, i32 %4, i32 %5, %struct._ZN4cuda3std3__425__thread_scope_device_tagE zeroinitializer)
inlinable function call in a function with debug info must have a !dbg location
  call void @_ZN4cuda3std3__423__atomic_store_dispatchINS1_16__atomic_storageIjEEjNS1_25__thread_scope_system_tagELi0EEEvPT_T0_NS1_12memory_orderET1_(%struct._ZN4cuda3std3__416__atomic_storageIjEE* %6, i32 %4, i32 %5, %struct._ZN4cuda3std3__425__thread_scope_system_tagE zeroinitializer)
inlinable function call in a function with debug info must have a !dbg location
  %5 = call i32 @_ZN4cuda3std3__422__atomic_load_dispatchINS1_16__atomic_storageIjEENS1_25__thread_scope_system_tagELi0EEENT_14__underlying_tEPKS6_NS1_12memory_orderET0_(%struct._ZN4cuda3std3__416__atomic_storageIjEE* %4, i32 %3, %struct._ZN4cuda3std3__425__thread_scope_system_tagE zeroinitializer)
inlinable function call in a function with debug info must have a !dbg location
  %7 = call i64 @_ZN4cuda3std3__427__atomic_fetch_add_dispatchINS1_16__atomic_storageIyEEyNS1_25__thread_scope_device_tagELi0EEENT_14__underlying_tEPS6_T0_NS1_12memory_orderET1_(%struct._ZN4cuda3std3__416__atomic_storageIyEE* %6, i64 %4, i32 %5, %struct._ZN4cuda3std3__425__thread_scope_device_tagE zeroinitializer)
error: Broken module found, compilation aborted!

Error at /home/aarav/gpudrive/external/madrona/src/mw/cpp_compile.cpp:81 in auto madrona::cu::jitCompileCPPSrc(const char *, const char *, const char **, uint32_t, const char **, uint32_t, bool)::(anonymous class)::operator()() const
NVRTC_ERROR_COMPILATION
Aborted (core dumped)

I was able to get past this error by disabling debug mode here. But then I get this follow-up error (with verbose compilation enabled for more detail):

(madrona) (base) aarav@emerge2-desktop:~/gpudrive/build$ ./headless CUDA 1
Compiler Flags:
-I/home/aarav/gpudrive/external/madrona/src/mw/device/include
-I/home/aarav/gpudrive/external/madrona/src/common/../../include
-I/usr/local/cuda/targets/x86_64-linux/include
-std=c++20
-default-device
-rdc=true
-use_fast_math
-DMADRONA_GPU_MODE=1
-DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CPP
-DCCCL_DISABLE_BF16_SUPPORT=1
-DCUB_DISABLE_BF16_SUPPORT=1
-arch
sm_89
-DMADRONA_MWGPU_NUM_SMS=(76_i32)
-DMADRONA_MWGPU_MAX_BLOCKS_PER_SM=(1_i32)
-dopt=on
--extra-device-vectorization
-lineinfo
-dlto
-DMADRONA_MWGPU_LTO_MODE=1
-DMADRONA_MWGPU_TASKGRAPH=1

Linker Flags:
-arch=sm_89
-ftz=1
-prec-div=0
-prec-sqrt=0
-fma=1
-optimize-unused-variables
-lineinfo
-lto
-verbose

Compiling GPU engine code:
/home/aarav/gpudrive/external/madrona/src/mw/device/memory.cpp
/home/aarav/gpudrive/external/madrona/src/mw/device/state.cpp
/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(177): error: qualified name is not allowed
      cuda::atomic<T, cuda::thread_scope_device> impl_;
      ^

/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(177): error: this declaration has no storage class or type specifier
      cuda::atomic<T, cuda::thread_scope_device> impl_;
      ^

/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(177): error: expected a ";"
      cuda::atomic<T, cuda::thread_scope_device> impl_;
                  ^

/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(328): error: qualified name is not allowed
      cuda::atomic_ref<T, cuda::thread_scope_device> ref_;
      ^

/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(328): error: this declaration has no storage class or type specifier
      cuda::atomic_ref<T, cuda::thread_scope_device> ref_;
      ^

/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(328): error: expected a ";"
      cuda::atomic_ref<T, cuda::thread_scope_device> ref_;
                      ^

/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(332): error: identifier "ref_" is undefined
      static_assert(decltype(ref_)::is_always_lock_free);
                             ^

/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(56): error: "impl_" is not a nonstatic data member or base class of class "madrona::Atomic<T>"
          : impl_(v)
            ^

/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(58): error: identifier "impl_" is undefined
          static_assert(decltype(impl_)::is_always_lock_free);
                                 ^

/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(96): error: identifier "impl_" is undefined
          return impl_.exchange(v, order);
                 ^

/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(69): error: identifier "impl_" is undefined
          return impl_.load(sync::relaxed);
                 ^

/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(90): error: identifier "impl_" is undefined
          impl_.store(v, sync::release);
          ^

/home/aarav/gpudrive/external/madrona/src/common/../../include/madrona/sync.hpp(85): error: identifier "impl_" is undefined
          impl_.store(v, sync::relaxed);
          ^

/home/aarav/gpudrive/external/madrona/src/mw/device/include/algorithm(3): catastrophic error: cannot open source file "cuda/std/__algorithm"
  #include <cuda/std/__algorithm>
                                 ^

 and 1 catastrophic error detected in the compilation of "/home/aarav/gpudrive/external/madrona/src/mw/device/state.cpp".
Compilation terminated.

Error at /home/aarav/gpudrive/external/madrona/src/mw/cpp_compile.cpp:100 in CompileOutput madrona::cu::jitCompileCPPSrc(const char *, const char *, const char **, uint32_t, const char **, uint32_t, bool)
NVRTC_ERROR_COMPILATION
Aborted (core dumped)

I also tried the base Docker image from NVIDIA and got the same errors.

shacklettbp commented 2 weeks ago

CUDA 12.6 is broken; I reported the issue here: https://github.com/NVIDIA/cccl/issues/2440

Can't do anything about it until they fix this issue.
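
Until that is fixed, a build script could check the installed toolkit version before it attempts the JIT compile. The following is only a rough sketch, not part of madrona or GPUDrive. The function names and the `REPORTED_FAILING` set are hypothetical, and the set just mirrors the versions reported in this thread (12.1, 12.5, 12.6), which have not been verified independently. It assumes that `nvcc --version` prints a line containing `release <major>.<minor>`.

```python
import re

# Assumption: (major, minor) versions reported in this thread as failing
# to compile. Taken from the comments above, not independently verified.
REPORTED_FAILING = {(12, 1), (12, 5), (12, 6)}


def parse_cuda_release(nvcc_output):
    """Return (major, minor) parsed from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_reported_failing(nvcc_output):
    """True if the parsed version is one this thread reports as failing."""
    return parse_cuda_release(nvcc_output) in REPORTED_FAILING


# Example input in the format nvcc prints; the build number is illustrative.
sample = "Cuda compilation tools, release 12.6, V12.6.68"
if is_reported_failing(sample):
    print("CUDA", parse_cuda_release(sample), "is reported as failing here")
```

In practice the text could come from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout` instead of the hard-coded sample.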