state-spaces / mamba

Mamba SSM architecture

Segmentation fault on `nvcr.io/nvidia/pytorch:24.03-py3` #289

Open lightb0x opened 5 months ago

lightb0x commented 5 months ago

My environment is as follows:

I tried the following 3 images: nvcr.io/nvidia/pytorch:24.03-py3, 24.02-py3, and 23.10-py3.

I ran the following code, which basically installs mamba-ssm and runs the example inference:

python3 -m pip install mamba-ssm
python3 -m pip install -e 3rdparty/lm-evaluation-harness
python3 benchmarks/benchmark_generation_mamba_simple.py\
  --model-name "state-spaces/mamba-130m"\
  --prompt "My cat wrote all this CUDA code for a new language model and"\
  --topp 0.9\
  --temperature 0.7\
  --repetition-penalty 1.2

While the 24.02 and 23.10 images ran as expected, nvcr.io/nvidia/pytorch:24.03-py3 gives the following error:

Number of parameters: 129135360
[6bc529686365:1060 :0:1060] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7f8028b213d0)
==== backtrace (tid:   1060) ====
 0 0x0000000000042520 __sigaction()  ???:0
 1 0x000000000009985f caffe2::TypeMeta::error_unsupported_typemeta()  ???:0
 2 0x000000000007b940 caffe2::TypeMeta::toScalarType()  /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/torch/include/c10/util/typeid.h:482
 3 0x000000000007b940 typeMetaToScalarType()  /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/torch/include/c10/core/ScalarTypeToTypeMeta.h:24
 4 0x000000000007b940 at::TensorBase::scalar_type()  /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/torch/include/ATen/core/TensorBase.h:339
 5 0x000000000007b940 selective_scan_fwd()  /home/runner/work/mamba/mamba/csrc/selective_scan/selective_scan.cpp:234
 6 0x000000000008e704 pybind11::detail::argument_loader<at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, bool>::call_impl<std::vector<at::Tensor, std::allocator<at::Tensor> >, std::vector<at::Tensor, std::allocator<at::Tensor> > (*&)(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, bool), 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, pybind11::detail::void_type>()  /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:1480
 7 0x000000000008e704 pybind11::detail::argument_loader<at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, bool>::call<std::vector<at::Tensor, std::allocator<at::Tensor> >, pybind11::detail::void_type, std::vector<at::Tensor, std::allocator<at::Tensor> > (*&)(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, bool)>()  /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:1449
 8 0x000000000008e704 pybind11::cpp_function::initialize<std::vector<at::Tensor, std::allocator<at::Tensor> > (*&)(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, bool), std::vector<at::Tensor, std::allocator<at::Tensor> >, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, bool, pybind11::name, pybind11::scope, pybind11::sibling, char [23]>(std::vector<at::Tensor, std::allocator<at::Tensor> > (*&)(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, bool), std::vector<at::Tensor, std::allocator<at::Tensor> > (*)(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, bool), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [23])::{lambda(pybind11::detail::function_call&)#3}::operator()()  /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/torch/include/pybind11/pybind11.h:253
 9 0x000000000008b9f8 pybind11::cpp_function::dispatcher()  /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/torch/include/pybind11/pybind11.h:946
10 0x000000000015a10e PyObject_CallFunctionObjArgs()  ???:0
11 0x0000000000150a7b _PyObject_MakeTpCall()  ???:0
12 0x0000000000149629 _PyEval_EvalFrameDefault()  ???:0
13 0x000000000015a9fc _PyFunction_Vectorcall()  ???:0
14 0x00000000007710c9 THPFunction_apply()  ???:0
15 0x000000000015a138 PyObject_CallFunctionObjArgs()  ???:0
16 0x000000000016942b PyObject_Call()  ???:0
17 0x00000000001455d7 _PyEval_EvalFrameDefault()  ???:0
18 0x00000000001687f1 PyMethod_New()  ???:0
19 0x0000000000148cfa _PyEval_EvalFrameDefault()  ???:0
20 0x000000000015a9fc _PyFunction_Vectorcall()  ???:0
21 0x000000000014453c _PyEval_EvalFrameDefault()  ???:0
22 0x00000000001687f1 PyMethod_New()  ???:0
23 0x0000000000169492 PyObject_Call()  ???:0
24 0x00000000001455d7 _PyEval_EvalFrameDefault()  ???:0
25 0x00000000001687f1 PyMethod_New()  ???:0
26 0x0000000000169492 PyObject_Call()  ???:0
27 0x00000000001455d7 _PyEval_EvalFrameDefault()  ???:0
28 0x000000000015a9fc _PyFunction_Vectorcall()  ???:0
29 0x000000000014fcbd _PyObject_FastCallDictTstate()  ???:0
30 0x000000000016586c _PyObject_Call_Prepend()  ???:0
31 0x0000000000280700 PyInit__datetime()  ???:0
32 0x0000000000150a7b _PyObject_MakeTpCall()  ???:0
33 0x000000000014a150 _PyEval_EvalFrameDefault()  ???:0
34 0x00000000001687f1 PyMethod_New()  ???:0
35 0x0000000000169492 PyObject_Call()  ???:0
36 0x00000000001455d7 _PyEval_EvalFrameDefault()  ???:0
37 0x00000000001687f1 PyMethod_New()  ???:0
38 0x0000000000169492 PyObject_Call()  ???:0
39 0x00000000001455d7 _PyEval_EvalFrameDefault()  ???:0
40 0x000000000015a9fc _PyFunction_Vectorcall()  ???:0
41 0x000000000014fcbd _PyObject_FastCallDictTstate()  ???:0
42 0x000000000016586c _PyObject_Call_Prepend()  ???:0
43 0x0000000000280700 PyInit__datetime()  ???:0
44 0x0000000000150a7b _PyObject_MakeTpCall()  ???:0
45 0x000000000014a150 _PyEval_EvalFrameDefault()  ???:0
46 0x00000000001687f1 PyMethod_New()  ???:0
47 0x0000000000169492 PyObject_Call()  ???:0
48 0x00000000001455d7 _PyEval_EvalFrameDefault()  ???:0
49 0x00000000001687f1 PyMethod_New()  ???:0
50 0x0000000000169492 PyObject_Call()  ???:0
51 0x00000000001455d7 _PyEval_EvalFrameDefault()  ???:0
52 0x000000000015a9fc _PyFunction_Vectorcall()  ???:0
53 0x000000000014fcbd _PyObject_FastCallDictTstate()  ???:0
54 0x000000000016586c _PyObject_Call_Prepend()  ???:0
55 0x0000000000280700 PyInit__datetime()  ???:0
56 0x0000000000150a7b _PyObject_MakeTpCall()  ???:0
=================================
Segmentation fault (core dumped)

I did not check whether the addresses in the backtrace stay the same across multiple runs.

I suspect the source of the problem could be the newer CUDA 12.4.0.41 in the 24.03 image, as opposed to CUDA 12.3.2 in the 24.02 image. (CUDA versions referenced here.)
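
One way to narrow this down (just a sketch, not something I ran; the extension module name selective_scan_cuda is assumed from the backtrace frame selective_scan_fwd() in csrc/selective_scan) would be to compare the toolkit and PyTorch CUDA versions inside each image and verify that the compiled extension still imports:

# Compare the CUDA toolkit with the CUDA version PyTorch was built against, per image
nvcc --version | grep release
python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# Extension module name assumed; adjust if the installed wheel names it differently
python3 -c "import selective_scan_cuda; print('selective_scan_cuda imported OK')"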

IamShubhamGupto commented 5 months ago

Hey @lightb0x

See if you can experiment with a Docker container running CUDA 11.8. Here is an image I made, with which I was able to run the benchmark successfully: https://hub.docker.com/r/shubhamgupto/mamba

It is built for Jetson GPUs (arm64), but I'm sure you can recreate it for your platform as well.

lightb0x commented 5 months ago

Hello there,

The benchmark already runs as expected for me with versions 24.02 and 23.10 of the nvcr.io/nvidia/pytorch image, and I am fairly sure it would also run with the image you offered, shubhamgupto/mamba.

What I wanted to point out is that, AFAIK, the requirement of CUDA 11.6+ is general enough to include CUDA 12.4. If that is not the case, feel free to close this thread.

JunMa11 commented 5 months ago

Hi @lightb0x ,

Would you mind sharing your Dockerfile for installing mamba in Docker?

Here is my Dockerfile, but the installation process cannot find CUDA.

FROM nvidia/cuda:11.7.1-runtime-ubuntu20.04 AS base

RUN apt-get update && \
  apt-get install -y software-properties-common && \
  add-apt-repository ppa:deadsnakes/ppa && \
  DEBIAN_FRONTEND=noninteractive apt-get install -y \
  git \
  wget \
  unzip \
  libopenblas-dev \
  python3.9 \
  python3.9-dev \
  python3-pip

# Upgrade pip
RUN python3.9 -m pip install --no-cache-dir --upgrade pip
COPY torch-2.0.0+cu117-cp39-cp39-linux_x86_64.whl /tmp/torch-2.0.0+cu117-cp39-cp39-linux_x86_64.whl
RUN python3.9 -m pip install /tmp/torch-2.0.0+cu117-cp39-cp39-linux_x86_64.whl

ENV CUDA_HOME=/usr/local/cuda-11.7
RUN python3.9 -m pip install causal-conv1d
RUN python3.9 -m pip install mamba-ssm --no-cache-dir

Error log:

RUN python3.9 -m pip install causal-conv1d
 ---> Running in a1c07a6214b0
Collecting causal-conv1d
  Downloading causal_conv1d-1.2.0.post2.tar.gz (7.1 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'error'
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [22 lines of output]
      No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.7'
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-8g7s6ozd/causal-conv1d_a21dfa6b4cec41df8f79f5bdbe7f5cfd/setup.py", line 97, in <module>
          _, bare_metal_version = get_cuda_bare_metal_version(CUDA_HOME)
        File "/tmp/pip-install-8g7s6ozd/causal-conv1d_a21dfa6b4cec41df8f79f5bdbe7f5cfd/setup.py", line 59, in get_cuda_bare_metal_version
          raw_output = subprocess.check_output(
        File "/usr/lib/python3.9/subprocess.py", line 424, in check_output
          return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
        File "/usr/lib/python3.9/subprocess.py", line 505, in run
          with Popen(*popenargs, **kwargs) as process:
        File "/usr/lib/python3.9/subprocess.py", line 951, in __init__
          self._execute_child(args, executable, preexec_fn, close_fds,
        File "/usr/lib/python3.9/subprocess.py", line 1837, in _execute_child
          raise child_exception_type(errno_num, err_msg, err_filename)
      FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda-11.7/bin/nvcc'

      torch.__version__  = 2.0.0+cu117

      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
The command '/bin/sh -c python3.9 -m pip install causal-conv1d' returned a non-zero code: 1

Any comments or suggestions are highly appreciated.

IamShubhamGupto commented 5 months ago

Your CUDA_HOME should be set to where nvcc is installed.

Usually that is /usr/local/cuda (i.e. nvcc lives at /usr/local/cuda/bin/nvcc), or wherever your CUDA toolkit actually is. You can also run whereis nvcc to look for it.
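
For example (just a sketch; adjust the path to wherever the toolkit actually lives on your system):

whereis nvcc                            # e.g. prints /usr/local/cuda-11.7/bin/nvcc
export CUDA_HOME=/usr/local/cuda-11.7   # point CUDA_HOME at the directory that contains bin/nvcc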

JunMa11 commented 5 months ago

Hi @IamShubhamGupto,

Thank you very much for your quick reply.

I made a stupid mistake: I should have used the nvidia/cuda devel image rather than the runtime one (the runtime image does not include nvcc).
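
For anyone hitting the same error, here is a minimal sketch of the corrected Dockerfile (the same as above, only with the devel base image so that nvcc is present; I have not re-verified every line):

FROM nvidia/cuda:11.7.1-devel-ubuntu20.04 AS base

# The devel image ships the CUDA toolchain, so /usr/local/cuda-11.7/bin/nvcc exists
RUN apt-get update && \
  apt-get install -y software-properties-common && \
  add-apt-repository ppa:deadsnakes/ppa && \
  DEBIAN_FRONTEND=noninteractive apt-get install -y \
  git \
  wget \
  unzip \
  libopenblas-dev \
  python3.9 \
  python3.9-dev \
  python3-pip

# Upgrade pip and install the prebuilt torch wheel
RUN python3.9 -m pip install --no-cache-dir --upgrade pip
COPY torch-2.0.0+cu117-cp39-cp39-linux_x86_64.whl /tmp/torch-2.0.0+cu117-cp39-cp39-linux_x86_64.whl
RUN python3.9 -m pip install /tmp/torch-2.0.0+cu117-cp39-cp39-linux_x86_64.whl

ENV CUDA_HOME=/usr/local/cuda-11.7
RUN python3.9 -m pip install causal-conv1d
RUN python3.9 -m pip install mamba-ssm --no-cache-dir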

johnnynunez commented 2 weeks ago

https://github.com/dusty-nv/jetson-containers/pull/615