pierotofy / OpenSplat

Production-grade 3D gaussian splatting with CPU/GPU support for Windows, Mac and Linux 🚀
https://antimatter15.com/splat/?url=https://splat.uav4geo.com/banana.splat
GNU Affero General Public License v3.0

Docker build on a100 gpu libtorch cuda error #90

Closed · embercult closed this issue 6 months ago

embercult commented 6 months ago
root@f629ddabf1f2:/code/build# ./opensplat /data/hulk/
Using CUDA
Reading 12149 points
Loading /data/hulk/images/0039.jpg
Loading /data/hulk/images/0009.jpg
Loading /data/hulk/images/0035.jpg
Loading /data/hulk/images/0001.jpg
Loading /data/hulk/images/0007.jpg
Loading /data/hulk/images/0005.jpg
Loading /data/hulk/images/0021.jpg
Loading /data/hulk/images/0025.jpg
Loading /data/hulk/images/0033.jpg
Loading /data/hulk/images/0019.jpg
Loading /data/hulk/images/0029.jpg
Loading /data/hulk/images/0015.jpg
Loading /data/hulk/images/0003.jpg
Loading /data/hulk/images/0037.jpg
Loading /data/hulk/images/0017.jpg
Loading /data/hulk/images/0023.jpg
Loading /data/hulk/images/0031.jpg
Loading /data/hulk/images/0011.jpg
Loading /data/hulk/images/0013.jpg
Loading /data/hulk/images/0027.jpg
Loading /data/hulk/images/0038.jpg
Loading /data/hulk/images/0022.jpg
Loading /data/hulk/images/0018.jpg
Loading /data/hulk/images/0002.jpg
Loading /data/hulk/images/0024.jpg
Loading /data/hulk/images/0008.jpg
Loading /data/hulk/images/0006.jpg
Loading /data/hulk/images/0014.jpg
Loading /data/hulk/images/0016.jpg
Loading /data/hulk/images/0036.jpg
Loading /data/hulk/images/0010.jpg
Loading /data/hulk/images/0030.jpg
Loading /data/hulk/images/0026.jpg
Loading /data/hulk/images/0032.jpg
Loading /data/hulk/images/0004.jpg
Loading /data/hulk/images/0020.jpg
Loading /data/hulk/images/0028.jpg
Loading /data/hulk/images/0034.jpg
Loading /data/hulk/images/0012.jpg
CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7fbae7f03a0c in /code/libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x7fbae7ead8bc in /code/libtorch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3cc (0x7fbae7b1001c in /code/libtorch/lib/libc10_cuda.so)
frame #3: void at::native::gpu_kernel_impl<__nv_hdl_wrapper_t<false, true, false, __nv_dl_tag<void (*)(at::TensorIteratorBase&), &at::native::direct_copy_kernel_cuda, 12u>, long (long)> >(at::TensorIteratorBase&, __nv_hdl_wrapper_t<false, true, false, __nv_dl_tag<void (*)(at::TensorIteratorBase&), &at::native::direct_copy_kernel_cuda, 12u>, long (long)> const&) + 0x4bf (0x7fba7db8834f in /code/libtorch/lib/libtorch_cuda.so)
frame #4: void at::native::gpu_kernel<__nv_hdl_wrapper_t<false, true, false, __nv_dl_tag<void (*)(at::TensorIteratorBase&), &at::native::direct_copy_kernel_cuda, 12u>, long (long)> >(at::TensorIteratorBase&, __nv_hdl_wrapper_t<false, true, false, __nv_dl_tag<void (*)(at::TensorIteratorBase&), &at::native::direct_copy_kernel_cuda, 12u>, long (long)> const&) + 0x34b (0x7fba7db888eb in /code/libtorch/lib/libtorch_cuda.so)
frame #5: at::native::direct_copy_kernel_cuda(at::TensorIteratorBase&) + 0x39c (0x7fba7db6fd7c in /code/libtorch/lib/libtorch_cuda.so)
frame #6: at::native::copy_device_to_device(at::TensorIterator&, bool, bool) + 0xcbd (0x7fba7db70bcd in /code/libtorch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x1ae7bb2 (0x7fba7db72bb2 in /code/libtorch/lib/libtorch_cuda.so)
frame #8: <unknown function> + 0x1cf4596 (0x7fbad14e4596 in /code/libtorch/lib/libtorch_cpu.so)
frame #9: at::native::copy_(at::Tensor&, at::Tensor const&, bool) + 0x7a (0x7fbad14e5e3a in /code/libtorch/lib/libtorch_cpu.so)
frame #10: at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) + 0x16f (0x7fbad231571f in /code/libtorch/lib/libtorch_cpu.so)
frame #11: at::native::_to_copy(at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, std::optional<c10::MemoryFormat>) + 0x1b23 (0x7fbad1806303 in /code/libtorch/lib/libtorch_cpu.so)
frame #12: <unknown function> + 0x2f5545f (0x7fbad274545f in /code/libtorch/lib/libtorch_cpu.so)
frame #13: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, std::optional<c10::MemoryFormat>) + 0x109 (0x7fbad1d674b9 in /code/libtorch/lib/libtorch_cpu.so)
frame #14: <unknown function> + 0x2d271fa (0x7fbad25171fa in /code/libtorch/lib/libtorch_cpu.so)
frame #15: at::_ops::_to_copy::call(at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, std::optional<c10::MemoryFormat>) + 0x1fe (0x7fbad1e0565e in /code/libtorch/lib/libtorch_cpu.so)
frame #16: at::native::to(at::Tensor const&, c10::ScalarType, bool, bool, std::optional<c10::MemoryFormat>) + 0xc2 (0x7fbad17fdbd2 in /code/libtorch/lib/libtorch_cpu.so)
frame #17: <unknown function> + 0x31927b8 (0x7fbad29827b8 in /code/libtorch/lib/libtorch_cpu.so)
frame #18: at::_ops::to_dtype::call(at::Tensor const&, c10::ScalarType, bool, bool, std::optional<c10::MemoryFormat>) + 0x18b (0x7fbad1fc8d2b in /code/libtorch/lib/libtorch_cpu.so)
frame #19: <unknown function> + 0x1f2e610 (0x7fbad171e610 in /code/libtorch/lib/libtorch_cpu.so)
frame #20: <unknown function> + 0x1f2e70d (0x7fbad171e70d in /code/libtorch/lib/libtorch_cpu.so)
frame #21: at::native::structured_sum_out::impl(at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>, at::Tensor const&) + 0x64 (0x7fbad171e814 in /code/libtorch/lib/libtorch_cpu.so)
frame #22: <unknown function> + 0x3573697 (0x7fba7f5fe697 in /code/libtorch/lib/libtorch_cuda.so)
frame #23: <unknown function> + 0x357375d (0x7fba7f5fe75d in /code/libtorch/lib/libtorch_cuda.so)
frame #24: at::_ops::sum_dim_IntList::call(at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>) + 0x1d8 (0x7fbad218cf28 in /code/libtorch/lib/libtorch_cpu.so)
frame #25: at::native::sum(at::Tensor const&, std::optional<c10::ScalarType>) + 0x3e (0x7fbad17146ee in /code/libtorch/lib/libtorch_cpu.so)
frame #26: <unknown function> + 0x2f537e8 (0x7fbad27437e8 in /code/libtorch/lib/libtorch_cpu.so)
frame #27: at::_ops::sum::redispatch(c10::DispatchKeySet, at::Tensor const&, std::optional<c10::ScalarType>) + 0x8b (0x7fbad20dbacb in /code/libtorch/lib/libtorch_cpu.so)
frame #28: <unknown function> + 0x492b508 (0x7fbad411b508 in /code/libtorch/lib/libtorch_cpu.so)
frame #29: <unknown function> + 0x492baab (0x7fbad411baab in /code/libtorch/lib/libtorch_cpu.so)
frame #30: at::_ops::sum::call(at::Tensor const&, std::optional<c10::ScalarType>) + 0x14d (0x7fbad218c96d in /code/libtorch/lib/libtorch_cpu.so)
frame #31: <unknown function> + 0x92ee9 (0x55eb1efb1ee9 in ./opensplat)
frame #32: <unknown function> + 0x33031 (0x55eb1ef52031 in ./opensplat)
frame #33: __libc_start_main + 0xf3 (0x7fba7afb0083 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #34: <unknown function> + 0x3526e (0x55eb1ef5426e in ./opensplat)
pfxuan commented 6 months ago

It seems like the docker image was built with a mismatched CUDA compute capability. To support the A100 GPU, you can add TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0":

docker build \
  -t opensplat:ubuntu-22.04-cuda-12.1.1-torch-2.2.1 \
  --build-arg UBUNTU_VERSION=22.04 \
  --build-arg CUDA_VERSION=12.1.1 \
  --build-arg TORCH_VERSION=2.2.1 \
  --build-arg TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0" \
  --build-arg CMAKE_BUILD_TYPE=Release .
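
If the rebuild still fails, it may be worth confirming which compute capability the GPU actually reports before settling on the architecture list; an A100 is compute capability 8.0 (sm_80), which is why "8.0" has to be covered. On reasonably recent drivers, one way to check (a generic sketch, not an OpenSplat command) is:

nvidia-smi --query-gpu=name,compute_cap --format=csv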
embercult commented 6 months ago

I had already added that, but I still got this issue.

pfxuan commented 6 months ago

I've created PR https://github.com/pierotofy/OpenSplat/pull/91 in the hope that it resolves your build issue. With the new update, you can test the following build method:

Replace

--build-arg TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0"

With

--build-arg CMAKE_CUDA_ARCHITECTURES="70;75;80"
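
Assuming the rest of the arguments stay the same as in the earlier command, the full invocation would then look roughly like this:

docker build \
  -t opensplat:ubuntu-22.04-cuda-12.1.1-torch-2.2.1 \
  --build-arg UBUNTU_VERSION=22.04 \
  --build-arg CUDA_VERSION=12.1.1 \
  --build-arg TORCH_VERSION=2.2.1 \
  --build-arg CMAKE_CUDA_ARCHITECTURES="70;75;80" \
  --build-arg CMAKE_BUILD_TYPE=Release .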
pfxuan commented 6 months ago

Closed via https://github.com/pierotofy/OpenSplat/pull/91