Closed hemangjoshi37a closed 5 months ago
Which version of Torch do you have installed?
You can use the find-in-page function in Chrome and search for "torch". I believe that will give a more accurate answer than mine. If you can't find it, let me know.
Thing is, I can't tell from that output, only that it's >= 2.0.1. It looks like Torch is failing to find CUDA dependencies, which strikes me as odd if it's the ROCm version. What do you get from pip show torch?
This command to check the Torch version:
import torch
print(torch.__version__)
gives this error:
--------------------------------------------------------------------------
OSError Traceback (most recent call last)
File ~/.local/lib/python3.11/site-packages/torch/__init__.py:168, in _load_global_deps()
167 try:
--> 168 ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
169 except OSError as err:
170 # Can only happen for wheel with cuda libs as PYPI deps
171 # As PyTorch is not purelib, but nvidia-*-cu11 is
File /usr/lib/python3.11/ctypes/__init__.py:376, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
375 if handle is None:
--> 376 self._handle = _dlopen(self._name, mode)
377 else:
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Cell In[12], line 1
----> 1 import torch
2 print(torch.__version__)
File ~/.local/lib/python3.11/site-packages/torch/__init__.py:228
217 else:
218 # Easy way. You want this most of the time, because it will prevent
219 # C++ symbols from libtorch clobbering C++ symbols from other
(...)
225 #
226 # See Note [Global dependencies]
227 if USE_GLOBAL_DEPS:
--> 228 _load_global_deps()
229 from torch._C import * # noqa: F403
231 # Appease the type checker; ordinarily this binding is inserted by the
232 # torch._C module initialization code in C
File ~/.local/lib/python3.11/site-packages/torch/__init__.py:189, in _load_global_deps()
187 raise err
188 for lib_folder, lib_name in cuda_libs.items():
--> 189 _preload_cuda_deps(lib_folder, lib_name)
190 ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File ~/.local/lib/python3.11/site-packages/torch/__init__.py:154, in _preload_cuda_deps(lib_folder, lib_name)
152 break
153 if not lib_path:
--> 154 raise ValueError(f"{lib_name} not found in the system path {sys.path}")
155 ctypes.CDLL(lib_path)
ValueError: libcublas.so.*[0-9] not found in the system path ['/home/hemang/Downloads/notebook_scripts', '/usr/lib/python311.zip', '/usr/lib/python3.11', '/usr/lib/python3.11/lib-dynload', '', '/home/hemang/.local/lib/python3.11/site-packages', '/home/hemang/.local/lib/python3.11/site-packages/tqdm-4.64.0-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/tenacity-8.2.2-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/setuptools-65.6.3-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/qdrant_client-1.4.0-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/pytest-7.2.2-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/pydantic-1.10.8-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/pandas-2.0.3-py3.11-linux-x86_64.egg', '/home/hemang/.local/lib/python3.11/site-packages/openai-0.27.8-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/numpy-1.24.3-py3.11-linux-x86_64.egg', '/home/hemang/.local/lib/python3.11/site-packages/meilisearch-0.21.0-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/libcst-1.0.1-py3.11-linux-x86_64.egg', '/home/hemang/.local/lib/python3.11/site-packages/langchain-0.0.231-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/lancedb-0.1.16-py3.11.egg', '/usr/local/lib/python3.11/dist-packages', '/usr/lib/python3/dist-packages', '/usr/lib/python3.11/dist-packages']
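Since import torch itself crashes here, one way to read the installed version anyway is to query the package metadata without importing it. This is a minimal sketch using only the standard library; the function name and the None fallback are my additions:

```python
from importlib import metadata

def installed_version(package: str):
    """Read a package's installed version from its distribution metadata,
    without importing it (useful when the import itself raises)."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

print(installed_version("torch"))  # e.g. "2.0.1", or None if not installed
```

pip show torch, as suggested above, gives the same answer plus the dependency list.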
(base) hemang@hemang-levono-15arr:~/Documents/GitHub/exllamav2$ python3.11 -m pip show torch
Name: torch
Version: 2.0.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/hemang/.local/lib/python3.11/site-packages
Requires: filelock, jinja2, networkx, nvidia-cublas-cu11, nvidia-cuda-cupti-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, nvidia-cufft-cu11, nvidia-curand-cu11, nvidia-cusolver-cu11, nvidia-cusparse-cu11, nvidia-nccl-cu11, nvidia-nvtx-cu11, sympy, triton, typing-extensions
Required-by: accelerate, effdet, exllamav2, instruct-goose, pfrl, pytorch-lightning, stable-baselines3, stanza, timm, torchdata, torchmetrics, torchtext, torchtyping, torchvision, triton, xformers
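The Requires: line above is the giveaway: this wheel pulls in nvidia-*-cu11 packages, which, as far as I know, only the CUDA builds of Torch do; ROCm wheels don't. A hedged sketch of that heuristic as code (the function name and the check are mine, not part of pip or Torch):

```python
from importlib import metadata

def is_cuda_build(requires):
    """Heuristic: CUDA wheels of torch on PyPI depend on nvidia-*-cu11/cu12
    packages, while ROCm wheels do not."""
    return any(req.split(";")[0].strip().startswith("nvidia-") for req in requires)

# With torch installed, feed it the live metadata:
# print(is_cuda_build(metadata.requires("torch") or []))
print(is_cuda_build(["filelock", "nvidia-cublas-cu11", "sympy"]))  # True
```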
Is a ROCm build of Torch needed to make this work with ROCm-enabled AMD GPUs?
PyTorch on ROCm requires the ROCm build, yes. If you download using the selector here you should get the right one.
Now I have installed the ROCm version of PyTorch (2.2.0+rocm5.7), but I'm getting this new error:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Cell In[2], line 1
----> 1 import exllamav2
File ~/.local/lib/python3.11/site-packages/exllamav2/__init__.py:3
1 from exllamav2.version import __version__
----> 3 from exllamav2.model import ExLlamaV2
4 from exllamav2.cache import ExLlamaV2CacheBase
5 from exllamav2.cache import ExLlamaV2Cache
File ~/.local/lib/python3.11/site-packages/exllamav2/model.py:16
14 import torch
15 import math
---> 16 from exllamav2.config import ExLlamaV2Config
17 from exllamav2.cache import ExLlamaV2CacheBase
18 from exllamav2.linear import ExLlamaV2Linear
File ~/.local/lib/python3.11/site-packages/exllamav2/config.py:2
1 import torch
----> 2 from exllamav2.fasttensors import STFile
3 import os, glob, json
5 class ExLlamaV2Config:
File ~/.local/lib/python3.11/site-packages/exllamav2/fasttensors.py:5
3 import numpy as np
4 import json
----> 5 from exllamav2.ext import exllamav2_ext as ext_c
6 import os
8 def convert_dtype(dt: str):
File ~/.local/lib/python3.11/site-packages/exllamav2/ext.py:15
13 build_jit = False
14 try:
---> 15 import exllamav2_ext
16 except ModuleNotFoundError:
17 build_jit = True
ImportError: /home/hemang/.local/lib/python3.11/site-packages/exllamav2_ext.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c107WarningC1ENS_7variantIJNS0_11UserWarningENS0_18DeprecationWarningEEEERKNS_14SourceLocationESsb
PyTorch version output:
import torch
print(torch.__version__)
2.2.0+rocm5.7
I think this happens when the extension has been compiled for one version of Torch but you're importing it into another version. Torch 2.2 is very new, and the prebuilt ROCm wheels are made for Torch 2.1.0 so they may just not be compatible.
Since you have ROCm installed, you can try uninstalling the wheel (pip uninstall exllamav2) and running pip install . in the exllamav2 folder. Other than that you may need to use Torch 2.1.x until I've had a chance to recompile everything for Torch 2.2.0.
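The mismatch described here can be sketched as a quick pre-import check. This is not part of exllamav2; it's just an illustration, under the assumption that a prebuilt C++ extension is only safe when the major.minor of the Torch it was built against matches the installed Torch:

```python
def abi_compatible(installed: str, built_for: str) -> bool:
    """Rough heuristic: Torch's C++ ABI changes between minor releases,
    so compare major.minor after stripping local tags like '+rocm5.7'."""
    def major_minor(version: str):
        core = version.split("+")[0]      # drop the local segment
        major, minor, *_ = core.split(".")
        return int(major), int(minor)
    return major_minor(installed) == major_minor(built_for)

# Torch 2.2.0+rocm5.7 importing an extension built against Torch 2.1.0:
print(abi_compatible("2.2.0+rocm5.7", "2.1.0"))  # False -> expect "undefined symbol"
```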
For 0.0.13 I bumped the Torch dependency to 2.2 and this seems to be resolved.
@turboderp I'm having the exact same error even after bumping Torch to 2.2.0.
I'm running it on Runpod with the following relevant code:
echo "Installing Torch 2.2.0"
pip3 install --no-cache-dir torch==${TORCH_VERSION} torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
echo "Installing xformers"
pip3 install --no-cache-dir xformers
echo "Installing Oobabooga Text Generation Web UI"
pip3 install -r requirements.txt
bash -c 'for req in extensions/*/requirements.txt ; do pip3 install -r "$req" ; done'
echo "Installing repositories"
mkdir -p repositories
cd repositories
git clone https://github.com/turboderp/exllama
pip3 install -r exllama/requirements.txt
!pip3 install flash-attn==2.3
!pip3 install xformers==0.0.21
!pip3 uninstall -y exllamav2
!pip3 install exllamav2==0.0.13
with base: FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
and starting Ooba using:
source /workspace/venv/bin/activate
mkdir -p /runpod-volume/logs
nohup python3 server.py \
--listen \
--api \
--model ${MODEL} \
--loader ExLlamav2 \
--extensions openai \
--trust-remote-code &> /runpod-volume/logs/textgen.log &
Any idea? Thanks!!
Command I ran:
import exllamav2
in a Jupyter notebook. Error I got:
I have
exllamav2-0.0.12+rocm5.6-cp311-cp311-linux_x86_64.whl
installed using pip, and here is the install log:
Here is my Ubuntu info:
Here is my amdgpu and ROCm install info:
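One more thing that may be worth checking here, offered only as a guess: the wheel is tagged rocm5.6 while the Torch reported earlier is 2.2.0+rocm5.7, so the ROCm flavours don't match either. A tiny sketch for comparing the local version segments of two PEP 440 version strings (the function is mine, purely illustrative):

```python
def local_tag(version: str) -> str:
    """Return the local version segment of a PEP 440 version string,
    e.g. 'rocm5.6' from '0.0.12+rocm5.6', or '' if there is none."""
    _, _, local = version.partition("+")
    return local

# Wheel vs. installed Torch from this report:
print(local_tag("0.0.12+rocm5.6"), local_tag("2.2.0+rocm5.7"))  # rocm5.6 rocm5.7
```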