Open lightb0x opened 5 months ago
Hey @lightb0x
See if you can experiment with a Docker container running CUDA 11.8. Here's an image I made, with which I was able to run the benchmark successfully: https://hub.docker.com/r/shubhamgupto/mamba
It is compiled for Jetson GPUs (arm64), but I'm sure you can recreate it for yours as well.
Hello there,
I could reproduce the result using versions 24.02 and 23.10 of the nvcr.io/nvidia/pytorch image.
I am pretty sure I can reproduce the result using the image you offered, shubhamgupto/mamba.
What I wanted to point out is that, AFAIK, the requirement of CUDA 11.6+ is general enough to include CUDA 12.4.
If it's not the case, feel free to close this thread.
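Read literally, "CUDA 11.6+" is a lower bound, and 12.4 satisfies it. A minimal sketch of that comparison follows; the helper names parse_version and meets_minimum are hypothetical and are not taken from mamba-ssm. It only checks the version number and says nothing about whether the kernels actually build or run on a given CUDA release.

```python
def parse_version(text):
    """Turn a dotted version such as '12.4' or '12.4.0.41' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(found, minimum="11.6"):
    """Return True when `found` is at least `minimum`, compared component-wise."""
    return parse_version(found) >= parse_version(minimum)


if __name__ == "__main__":
    # Versions mentioned in this thread, plus one below the stated minimum.
    for version in ("11.5", "11.8", "12.3.2", "12.4.0.41"):
        print(version, meets_minimum(version))
```

Comparing tuples of integers avoids the string-comparison trap where "11.10" would sort before "11.6".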
Hi @lightb0x,
Would you mind sharing your Dockerfile for installing mamba in Docker?
Here is my Dockerfile, but the installation process cannot find CUDA.
FROM nvidia/cuda:11.7.1-runtime-ubuntu20.04 AS base
RUN apt-get update && \
    apt-get install -y software-properties-common && \
    add-apt-repository ppa:deadsnakes/ppa && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y \
        git \
        wget \
        unzip \
        libopenblas-dev \
        python3.9 \
        python3.9-dev \
        python3-pip
# Upgrade pip
RUN python3.9 -m pip install --no-cache-dir --upgrade pip
COPY torch-2.0.0+cu117-cp39-cp39-linux_x86_64.whl /tmp/torch-2.0.0+cu117-cp39-cp39-linux_x86_64.whl
RUN python3.9 -m pip install /tmp/torch-2.0.0+cu117-cp39-cp39-linux_x86_64.whl
ENV CUDA_HOME=/usr/local/cuda-11.7
RUN python3.9 -m pip install causal-conv1d
RUN python3.9 -m pip install mamba-ssm --no-cache-dir
Error log:
RUN python3.9 -m pip install causal-conv1d
---> Running in a1c07a6214b0
Collecting causal-conv1d
Downloading causal_conv1d-1.2.0.post2.tar.gz (7.1 kB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'error'
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [22 lines of output]
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.7'
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-8g7s6ozd/causal-conv1d_a21dfa6b4cec41df8f79f5bdbe7f5cfd/setup.py", line 97, in <module>
_, bare_metal_version = get_cuda_bare_metal_version(CUDA_HOME)
File "/tmp/pip-install-8g7s6ozd/causal-conv1d_a21dfa6b4cec41df8f79f5bdbe7f5cfd/setup.py", line 59, in get_cuda_bare_metal_version
raw_output = subprocess.check_output(
File "/usr/lib/python3.9/subprocess.py", line 424, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/python3.9/subprocess.py", line 505, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.9/subprocess.py", line 1837, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda-11.7/bin/nvcc'
torch.__version__ = 2.0.0+cu117
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
The command '/bin/sh -c python3.9 -m pip install causal-conv1d' returned a non-zero code: 1
Any comments or suggestions are highly appreciated.
Your CUDA_HOME should point to the CUDA install where nvcc lives.
Usually nvcc is at /usr/local/cuda/bin/nvcc, which makes CUDA_HOME /usr/local/cuda, or whatever the path is to your CUDA install. You can also try whereis nvcc to look for it.
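The same check can be run from Python before attempting the install. This is a minimal sketch, assuming the build looks for nvcc at <CUDA_HOME>/bin/nvcc, as the FileNotFoundError in the log above suggests; find_nvcc is a hypothetical helper, not part of causal-conv1d or mamba-ssm.

```python
import os
import shutil
from pathlib import Path


def find_nvcc(cuda_home=None):
    """Return the path to nvcc as a string, or None if it cannot be found.

    Checks <cuda_home>/bin/nvcc first (defaulting to the CUDA_HOME
    environment variable), then falls back to searching PATH.
    """
    cuda_home = cuda_home or os.environ.get("CUDA_HOME")
    if cuda_home:
        candidate = Path(cuda_home) / "bin" / "nvcc"
        if candidate.is_file():
            return str(candidate)
    return shutil.which("nvcc")


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc is None:
        print("nvcc not found; CUDA_HOME =", os.environ.get("CUDA_HOME"))
    else:
        print("nvcc found at", nvcc)
```

If this reports that nvcc is not found inside the container, the image most likely ships without the CUDA compiler, and changing CUDA_HOME alone will not help.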
Hi @IamShubhamGupto,
Thank you very much for your quick reply.
I made a stupid mistake: I should have used the devel variant of the nvidia/cuda image rather than the runtime one.
My environment is as follows:
I tried the following 3 images:
nvcr.io/nvidia/pytorch:24.03-py3
nvcr.io/nvidia/pytorch:24.02-py3
nvcr.io/nvidia/pytorch:23.10-py3
I ran the following code, which basically installs mamba-ssm and runs the example inference:
While images 24.02 and 23.10 ran as expected, nvcr.io/nvidia/pytorch:24.03-py3 gives the following error:
I did not check whether the addresses in the error remain the same across multiple runs.
I guess the source of the problem could be the newer CUDA 12.4.0.41 in the 24.03 image, in contrast to CUDA 12.3.2 in the 24.02 image. (CUDA version referenced here)