turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

`import exllamav2` getting this error #319

Closed. hemangjoshi37a closed this issue 5 months ago

hemangjoshi37a commented 5 months ago

I ran `import exllamav2` in a Jupyter notebook and got this error:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
File ~/.local/lib/python3.11/site-packages/torch/__init__.py:168, in _load_global_deps()
    167 try:
--> 168     ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
    169 except OSError as err:
    170     # Can only happen for wheel with cuda libs as PYPI deps
    171     # As PyTorch is not purelib, but nvidia-*-cu11 is

File /usr/lib/python3.11/ctypes/__init__.py:376, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    375 if handle is None:
--> 376     self._handle = _dlopen(self._name, mode)
    377 else:

OSError: libcurand.so.10: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 import exllamav2

File ~/.local/lib/python3.11/site-packages/exllamav2/__init__.py:3
      1 from exllamav2.version import __version__
----> 3 from exllamav2.model import ExLlamaV2
      4 from exllamav2.cache import ExLlamaV2CacheBase
      5 from exllamav2.cache import ExLlamaV2Cache

File ~/.local/lib/python3.11/site-packages/exllamav2/model.py:14
     11 import os
     12 os.environ['CUDA_MODULE_LOADING']='LAZY'
---> 14 import torch
     15 import math
     16 from exllamav2.config import ExLlamaV2Config

File ~/.local/lib/python3.11/site-packages/torch/__init__.py:228
    217 else:
    218     # Easy way.  You want this most of the time, because it will prevent
    219     # C++ symbols from libtorch clobbering C++ symbols from other
   (...)
    225     #
    226     # See Note [Global dependencies]
    227     if USE_GLOBAL_DEPS:
--> 228         _load_global_deps()
    229     from torch._C import *  # noqa: F403
    231 # Appease the type checker; ordinarily this binding is inserted by the
    232 # torch._C module initialization code in C

File ~/.local/lib/python3.11/site-packages/torch/__init__.py:189, in _load_global_deps()
    187     raise err
    188 for lib_folder, lib_name in cuda_libs.items():
--> 189     _preload_cuda_deps(lib_folder, lib_name)
    190 ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)

File ~/.local/lib/python3.11/site-packages/torch/__init__.py:154, in _preload_cuda_deps(lib_folder, lib_name)
    152         break
    153 if not lib_path:
--> 154     raise ValueError(f"{lib_name} not found in the system path {sys.path}")
    155 ctypes.CDLL(lib_path)

ValueError: libcublas.so.*[0-9] not found in the system path ['/home/hemang/Downloads/notebook_scripts', '/usr/lib/python311.zip', '/usr/lib/python3.11', '/usr/lib/python3.11/lib-dynload', '', '/home/hemang/.local/lib/python3.11/site-packages', '/home/hemang/.local/lib/python3.11/site-packages/tqdm-4.64.0-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/tenacity-8.2.2-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/setuptools-65.6.3-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/qdrant_client-1.4.0-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/pytest-7.2.2-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/pydantic-1.10.8-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/pandas-2.0.3-py3.11-linux-x86_64.egg', '/home/hemang/.local/lib/python3.11/site-packages/openai-0.27.8-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/numpy-1.24.3-py3.11-linux-x86_64.egg', '/home/hemang/.local/lib/python3.11/site-packages/meilisearch-0.21.0-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/libcst-1.0.1-py3.11-linux-x86_64.egg', '/home/hemang/.local/lib/python3.11/site-packages/langchain-0.0.231-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/lancedb-0.1.16-py3.11.egg', '/usr/local/lib/python3.11/dist-packages', '/usr/lib/python3/dist-packages', '/usr/lib/python3.11/dist-packages']

I installed exllamav2-0.0.12+rocm5.6-cp311-cp311-linux_x86_64.whl with pip. Here is the install log:

(base) hemang@hemang-levono-15arr:~/Documents/GitHub/exllamav2$ python3.11 -m pip install '/home/hemang/Downloads/exllamav2-0.0.12+rocm5.6-cp311-cp311-linux_x86_64.whl' --break-system-packages
Defaulting to user installation because normal site-packages is not writeable
Processing /home/hemang/Downloads/exllamav2-0.0.12+rocm5.6-cp311-cp311-linux_x86_64.whl
Requirement already satisfied: pandas in /home/hemang/.local/lib/python3.11/site-packages/pandas-2.0.3-py3.11-linux-x86_64.egg (from exllamav2==0.0.12+rocm5.6) (2.0.3)
Collecting ninja (from exllamav2==0.0.12+rocm5.6)
  Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)
Collecting fastparquet (from exllamav2==0.0.12+rocm5.6)
  Downloading fastparquet-2023.10.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB)
Requirement already satisfied: torch>=2.0.1 in /home/hemang/.local/lib/python3.11/site-packages (from exllamav2==0.0.12+rocm5.6) (2.0.1)
Collecting safetensors>=0.3.2 (from exllamav2==0.0.12+rocm5.6)
  Downloading safetensors-0.4.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
Collecting sentencepiece>=0.1.97 (from exllamav2==0.0.12+rocm5.6)
  Downloading sentencepiece-0.1.99-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 2.9 MB/s eta 0:00:00
Requirement already satisfied: pygments in /usr/lib/python3/dist-packages (from exllamav2==0.0.12+rocm5.6) (2.15.1)
Requirement already satisfied: websockets in /home/hemang/.local/lib/python3.11/site-packages (from exllamav2==0.0.12+rocm5.6) (8.1)
Requirement already satisfied: regex in /home/hemang/.local/lib/python3.11/site-packages (from exllamav2==0.0.12+rocm5.6) (2022.10.31)
Requirement already satisfied: filelock in /home/hemang/.local/lib/python3.11/site-packages (from torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (3.9.0)
Requirement already satisfied: typing-extensions in /home/hemang/.local/lib/python3.11/site-packages (from torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (4.5.0)
Requirement already satisfied: sympy in /usr/lib/python3/dist-packages (from torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (1.12)
Requirement already satisfied: networkx in /home/hemang/.local/lib/python3.11/site-packages (from torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (3.0)
Requirement already satisfied: jinja2 in /home/hemang/.local/lib/python3.11/site-packages (from torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (3.1.2)
Requirement already satisfied: nvidia-cuda-nvrtc-cu11==11.7.99 in /home/hemang/.local/lib/python3.11/site-packages (from torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (11.7.99)
Requirement already satisfied: nvidia-cuda-runtime-cu11==11.7.99 in /home/hemang/.local/lib/python3.11/site-packages (from torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (11.7.99)
Requirement already satisfied: nvidia-cuda-cupti-cu11==11.7.101 in /home/hemang/.local/lib/python3.11/site-packages (from torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (11.7.101)
Requirement already satisfied: nvidia-cudnn-cu11==8.5.0.96 in /home/hemang/.local/lib/python3.11/site-packages (from torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (8.5.0.96)
Requirement already satisfied: nvidia-cublas-cu11==11.10.3.66 in /home/hemang/.local/lib/python3.11/site-packages (from torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (11.10.3.66)
Requirement already satisfied: nvidia-cufft-cu11==10.9.0.58 in /home/hemang/.local/lib/python3.11/site-packages (from torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (10.9.0.58)
Requirement already satisfied: nvidia-curand-cu11==10.2.10.91 in /home/hemang/.local/lib/python3.11/site-packages (from torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (10.2.10.91)
Requirement already satisfied: nvidia-cusolver-cu11==11.4.0.1 in /home/hemang/.local/lib/python3.11/site-packages (from torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (11.4.0.1)
Requirement already satisfied: nvidia-cusparse-cu11==11.7.4.91 in /home/hemang/.local/lib/python3.11/site-packages (from torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (11.7.4.91)
Requirement already satisfied: nvidia-nccl-cu11==2.14.3 in /home/hemang/.local/lib/python3.11/site-packages (from torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (2.14.3)
Requirement already satisfied: nvidia-nvtx-cu11==11.7.91 in /home/hemang/.local/lib/python3.11/site-packages (from torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (11.7.91)
Requirement already satisfied: triton==2.0.0 in /home/hemang/.local/lib/python3.11/site-packages (from torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (2.0.0)
Requirement already satisfied: setuptools in /home/hemang/.local/lib/python3.11/site-packages/setuptools-65.6.3-py3.11.egg (from nvidia-cublas-cu11==11.10.3.66->torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (65.6.3)
Requirement already satisfied: wheel in /usr/lib/python3/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (0.42.0)
Requirement already satisfied: cmake in /home/hemang/.local/lib/python3.11/site-packages (from triton==2.0.0->torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (3.26.0)
Requirement already satisfied: lit in /home/hemang/.local/lib/python3.11/site-packages (from triton==2.0.0->torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (15.0.7)
Requirement already satisfied: numpy>=1.20.3 in /home/hemang/.local/lib/python3.11/site-packages/numpy-1.24.3-py3.11-linux-x86_64.egg (from fastparquet->exllamav2==0.0.12+rocm5.6) (1.24.3)
Collecting cramjam>=2.3 (from fastparquet->exllamav2==0.0.12+rocm5.6)
  Downloading cramjam-2.8.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB)
Requirement already satisfied: fsspec in /home/hemang/.local/lib/python3.11/site-packages (from fastparquet->exllamav2==0.0.12+rocm5.6) (2023.1.0)
Requirement already satisfied: packaging in /usr/lib/python3/dist-packages (from fastparquet->exllamav2==0.0.12+rocm5.6) (23.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/hemang/.local/lib/python3.11/site-packages (from pandas->exllamav2==0.0.12+rocm5.6) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /home/hemang/.local/lib/python3.11/site-packages (from pandas->exllamav2==0.0.12+rocm5.6) (2023.3)
Requirement already satisfied: tzdata>=2022.1 in /home/hemang/.local/lib/python3.11/site-packages (from pandas->exllamav2==0.0.12+rocm5.6) (2022.7)
Requirement already satisfied: six>=1.5 in /home/hemang/.local/lib/python3.11/site-packages (from python-dateutil>=2.8.2->pandas->exllamav2==0.0.12+rocm5.6) (1.12.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/lib/python3/dist-packages (from jinja2->torch>=2.0.1->exllamav2==0.0.12+rocm5.6) (2.1.3)
Downloading safetensors-0.4.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 4.5 MB/s eta 0:00:00
Downloading fastparquet-2023.10.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 3.3 MB/s eta 0:00:00
Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 307.2/307.2 kB 2.8 MB/s eta 0:00:00
Downloading cramjam-2.8.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 3.6 MB/s eta 0:00:00
Installing collected packages: sentencepiece, ninja, safetensors, cramjam, fastparquet, exllamav2
  Attempting uninstall: sentencepiece
    Found existing installation: sentencepiece 0.1.96
    Uninstalling sentencepiece-0.1.96:
      Successfully uninstalled sentencepiece-0.1.96
  Attempting uninstall: safetensors
    Found existing installation: safetensors 0.3.1
    Uninstalling safetensors-0.3.1:
      Successfully uninstalled safetensors-0.3.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llama-index 0.4.8 requires tenacity<8.2.0, but you have tenacity 8.2.2 which is incompatible.
argostranslate 1.8.0 requires sentencepiece==0.1.96, but you have sentencepiece 0.1.99 which is incompatible.
Successfully installed cramjam-2.8.1 exllamav2-0.0.12+rocm5.6 fastparquet-2023.10.1 ninja-1.11.1.1 safetensors-0.4.2 sentencepiece-0.1.99

Here is my Ubuntu system info:

# System Details Report
---

## Report details
- **Date generated:**                              2024-02-01 13:51:29

## Hardware Information:
- **Hardware Model:**                              Lenovo Lenovo Ideapad 330-15ARR
- **Memory:**                                      20.0 GiB
- **Processor:**                                   AMD Ryzen™ 5 2500U with Radeon™ Vega Mobile Gfx × 8
- **Graphics:**                                    AMD Radeon™ Graphics
- **Disk Capacity:**                               1.0 TB

## Software Information:
- **Firmware Version:**                            7VCN24WW
- **OS Name:**                                     Ubuntu Noble Numbat (development branch)
- **OS Build:**                                    (null)
- **OS Type:**                                     64-bit
- **GNOME Version:**                               45.3
- **Windowing System:**                            Wayland
- **Kernel Version:**                              Linux 6.6.0-14-generic

Here is my amdgpu and ROCm install info:

(base) hemang@hemang-levono-15arr:~$ lsmod | grep amdgpu
amdgpu              15728640  21
drm_exec               16384  1 amdgpu
amdxcp                 12288  1 amdgpu
drm_buddy              20480  1 amdgpu
gpu_sched              61440  1 amdgpu
drm_suballoc_helper    16384  1 amdgpu
drm_ttm_helper         12288  1 amdgpu
ttm                   110592  2 amdgpu,drm_ttm_helper
drm_display_helper    241664  1 amdgpu
drm_kms_helper        274432  4 drm_display_helper,amdgpu
i2c_algo_bit           16384  1 amdgpu
video                  73728  2 amdgpu,ideapad_laptop
drm                   798720  20 gpu_sched,i2c_hid,drm_kms_helper,drm_exec,drm_suballoc_helper,drm_display_helper,drm_buddy,amdgpu,drm_ttm_helper,ttm,amdxcp
(base) hemang@hemang-levono-15arr:~$ dpkg -l | grep rocm
ii  rocm-core                                                6.0.2.60002-115~22.04                        amd64        Radeon Open Compute (ROCm) Runtime software stack
ii  rocm-hip-libraries                                       6.0.2.60002-115~22.04                        amd64        Radeon Open Compute (ROCm) Runtime software stack
ii  rocm-hip-runtime                                         6.0.2.60002-115~22.04                        amd64        Radeon Open Compute (ROCm) Runtime software stack
ii  rocm-language-runtime                                    6.0.2.60002-115~22.04                        amd64        Radeon Open Compute (ROCm) Runtime software stack
ii  rocm-llvm                                                17.0.0.24012.60002-115~22.04                 amd64        ROCm compiler
ii  rocm-smi-lib                                             6.0.0.60002-115~22.04                        amd64        AMD System Management libraries
ii  rocminfo                                                 1.0.0.60002-115~22.04                        amd64        Radeon Open Compute (ROCm) Runtime rocminfo tool
(base) hemang@hemang-levono-15arr:~$ lsmod | grep amdkfd
(base) hemang@hemang-levono-15arr:~$ /opt/rocm/bin/rocminfo
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2000                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            8                                  
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    20014888(0x1316728) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    20014888(0x1316728) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    20014888(0x1316728) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx902                             
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      1024(0x400) KB                     
  Chip ID:                 5597(0x15dd)                       
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1100                               
  BDFID:                   768                                
  Internal Node ID:        1                                  
  Compute Unit:            8                                  
  SIMDs per CU:            4                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 464                                
  SDMA engine uCode::      169                                
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    262144(0x40000) KB                 
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    262144(0x40000) KB                 
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx902:xnack+   
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             
(base) hemang@hemang-levono-15arr:~$ /opt/rocm/opencl/bin/clinfo
bash: /opt/rocm/opencl/bin/clinfo: No such file or directory

turboderp commented 5 months ago

Which version of Torch do you have installed?

hemangjoshi37a commented 5 months ago

You can use your browser's find-in-page function and search for "torch" in the install log above. I believe that will give a more accurate answer than I can. If you can't find it, let me know.

turboderp commented 5 months ago

The thing is, I can't tell the exact version from that output, only that it's >= 2.0.1. It looks like Torch is failing to find CUDA dependencies, which strikes me as odd if it's the ROCm version. What do you get from `pip show torch`?
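Because `import torch` itself fails in this situation, one way to answer the question without importing Torch is to read the installed distribution's metadata. The following is a minimal sketch, assuming Python 3.8+ for `importlib.metadata`; the helper names `cuda_requirements` and `describe_torch_install` are hypothetical, and treating any `nvidia-*` requirement as a sign of a CUDA build is a heuristic, not an official check.

```python
from importlib import metadata


def cuda_requirements(requires):
    """Return the requirement strings that name NVIDIA CUDA packages.

    `requires` is the list returned by importlib.metadata.requires(),
    which may be None when a distribution declares no requirements.
    """
    return [r for r in (requires or []) if r.lower().startswith("nvidia-")]


def describe_torch_install():
    """Summarise the installed torch distribution without importing torch."""
    try:
        dist_version = metadata.version("torch")
        requires = metadata.requires("torch")
    except metadata.PackageNotFoundError:
        return None  # torch is not installed for this interpreter
    return {
        "version": dist_version,
        "cuda_requirements": cuda_requirements(requires),
    }


if __name__ == "__main__":
    print(describe_torch_install())
```

A non-empty `cuda_requirements` list suggests the default CUDA wheel from PyPI rather than a ROCm build.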

hemangjoshi37a commented 5 months ago

This command to check the Torch version:

import torch
print(torch.__version__)

gives this error:

--------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
File ~/.local/lib/python3.11/site-packages/torch/__init__.py:168, in _load_global_deps()
    167 try:
--> 168     ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
    169 except OSError as err:
    170     # Can only happen for wheel with cuda libs as PYPI deps
    171     # As PyTorch is not purelib, but nvidia-*-cu11 is

File /usr/lib/python3.11/ctypes/__init__.py:376, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    375 if handle is None:
--> 376     self._handle = _dlopen(self._name, mode)
    377 else:

OSError: libcurand.so.10: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[12], line 1
----> 1 import torch
      2 print(torch.__version__)

File ~/.local/lib/python3.11/site-packages/torch/__init__.py:228
    217 else:
    218     # Easy way.  You want this most of the time, because it will prevent
    219     # C++ symbols from libtorch clobbering C++ symbols from other
   (...)
    225     #
    226     # See Note [Global dependencies]
    227     if USE_GLOBAL_DEPS:
--> 228         _load_global_deps()
    229     from torch._C import *  # noqa: F403
    231 # Appease the type checker; ordinarily this binding is inserted by the
    232 # torch._C module initialization code in C

File ~/.local/lib/python3.11/site-packages/torch/__init__.py:189, in _load_global_deps()
    187     raise err
    188 for lib_folder, lib_name in cuda_libs.items():
--> 189     _preload_cuda_deps(lib_folder, lib_name)
    190 ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)

File ~/.local/lib/python3.11/site-packages/torch/__init__.py:154, in _preload_cuda_deps(lib_folder, lib_name)
    152         break
    153 if not lib_path:
--> 154     raise ValueError(f"{lib_name} not found in the system path {sys.path}")
    155 ctypes.CDLL(lib_path)

ValueError: libcublas.so.*[0-9] not found in the system path ['/home/hemang/Downloads/notebook_scripts', '/usr/lib/python311.zip', '/usr/lib/python3.11', '/usr/lib/python3.11/lib-dynload', '', '/home/hemang/.local/lib/python3.11/site-packages', '/home/hemang/.local/lib/python3.11/site-packages/tqdm-4.64.0-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/tenacity-8.2.2-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/setuptools-65.6.3-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/qdrant_client-1.4.0-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/pytest-7.2.2-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/pydantic-1.10.8-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/pandas-2.0.3-py3.11-linux-x86_64.egg', '/home/hemang/.local/lib/python3.11/site-packages/openai-0.27.8-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/numpy-1.24.3-py3.11-linux-x86_64.egg', '/home/hemang/.local/lib/python3.11/site-packages/meilisearch-0.21.0-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/libcst-1.0.1-py3.11-linux-x86_64.egg', '/home/hemang/.local/lib/python3.11/site-packages/langchain-0.0.231-py3.11.egg', '/home/hemang/.local/lib/python3.11/site-packages/lancedb-0.1.16-py3.11.egg', '/usr/local/lib/python3.11/dist-packages', '/usr/lib/python3/dist-packages', '/usr/lib/python3.11/dist-packages']

hemangjoshi37a commented 5 months ago

(base) hemang@hemang-levono-15arr:~/Documents/GitHub/exllamav2$ python3.11 -m pip show torch
Name: torch
Version: 2.0.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/hemang/.local/lib/python3.11/site-packages
Requires: filelock, jinja2, networkx, nvidia-cublas-cu11, nvidia-cuda-cupti-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, nvidia-cufft-cu11, nvidia-curand-cu11, nvidia-cusolver-cu11, nvidia-cusparse-cu11, nvidia-nccl-cu11, nvidia-nvtx-cu11, sympy, triton, typing-extensions
Required-by: accelerate, effdet, exllamav2, instruct-goose, pfrl, pytorch-lightning, stable-baselines3, stanza, timm, torchdata, torchmetrics, torchtext, torchtyping, torchvision, triton, xformers

hemangjoshi37a commented 5 months ago

Is a ROCm build of Torch needed to make this work with ROCm-enabled AMD GPUs?

turboderp commented 5 months ago

PyTorch on ROCm requires the ROCm build, yes. If you download using the selector here you should get the right one.
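Once a Torch build imports cleanly, a quick way to confirm which backend it was built for is to look at `torch.version.hip` and `torch.version.cuda`; one of them is normally `None`. The sketch below is illustrative only: `backend_of` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins with made-up values so the example runs without Torch installed.

```python
from types import SimpleNamespace


def backend_of(version_module):
    """Classify a torch.version-like object as 'rocm', 'cuda', or 'cpu'."""
    hip = getattr(version_module, "hip", None)
    cuda = getattr(version_module, "cuda", None)
    if hip:
        return "rocm"
    if cuda:
        return "cuda"
    return "cpu"


# Stand-ins for torch.version (hypothetical values, for demonstration only).
rocm_like = SimpleNamespace(hip="5.7.0", cuda=None)
cuda_like = SimpleNamespace(hip=None, cuda="11.7")

print(backend_of(rocm_like))
print(backend_of(cuda_like))
```

With a real installation, the call would be `import torch` followed by `backend_of(torch.version)`.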

hemangjoshi37a commented 5 months ago

I have now installed the ROCm build of PyTorch (2.2.0+rocm5.7), but I'm getting this new error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[2], line 1
----> 1 import exllamav2

File ~/.local/lib/python3.11/site-packages/exllamav2/__init__.py:3
      1 from exllamav2.version import __version__
----> 3 from exllamav2.model import ExLlamaV2
      4 from exllamav2.cache import ExLlamaV2CacheBase
      5 from exllamav2.cache import ExLlamaV2Cache

File ~/.local/lib/python3.11/site-packages/exllamav2/model.py:16
     14 import torch
     15 import math
---> 16 from exllamav2.config import ExLlamaV2Config
     17 from exllamav2.cache import ExLlamaV2CacheBase
     18 from exllamav2.linear import ExLlamaV2Linear

File ~/.local/lib/python3.11/site-packages/exllamav2/config.py:2
      1 import torch
----> 2 from exllamav2.fasttensors import STFile
      3 import os, glob, json
      5 class ExLlamaV2Config:

File ~/.local/lib/python3.11/site-packages/exllamav2/fasttensors.py:5
      3 import numpy as np
      4 import json
----> 5 from exllamav2.ext import exllamav2_ext as ext_c
      6 import os
      8 def convert_dtype(dt: str):

File ~/.local/lib/python3.11/site-packages/exllamav2/ext.py:15
     13 build_jit = False
     14 try:
---> 15     import exllamav2_ext
     16 except ModuleNotFoundError:
     17     build_jit = True

ImportError: /home/hemang/.local/lib/python3.11/site-packages/exllamav2_ext.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c107WarningC1ENS_7variantIJNS0_11UserWarningENS0_18DeprecationWarningEEEERKNS_14SourceLocationESsb

PyTorch version output:

import torch
print(torch.__version__)
2.2.0+rocm5.7

turboderp commented 5 months ago

I think this happens when the extension has been compiled for one version of Torch but you're importing it into another version. Torch 2.2 is very new, and the prebuilt ROCm wheels are made for Torch 2.1.0 so they may just not be compatible.

Since you have ROCm installed, you can try uninstalling the wheel (`pip uninstall exllamav2`) and running `pip install .` in the exllamav2 folder. Other than that, you may need to use Torch 2.1.x until I've had a chance to recompile everything for Torch 2.2.0.
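To make the mismatch concrete, here is a small sketch that compares an installed Torch version string against the version a prebuilt extension was compiled for. Comparing only the major and minor numbers is an assumption made for illustration, not a rule documented by exllamav2 or PyTorch, and the helper names `split_version` and `same_minor` are hypothetical.

```python
def split_version(version):
    """Split a string like '2.2.0+rocm5.7' into (major, minor, local_tag)."""
    public, _, local = version.partition("+")
    parts = public.split(".")
    return int(parts[0]), int(parts[1]), local


def same_minor(installed, built_for):
    """True when both versions share the same major.minor release."""
    return split_version(installed)[:2] == split_version(built_for)[:2]


# Versions taken from this thread: Torch 2.2.0+rocm5.7 is installed,
# while the prebuilt ROCm wheel was made for Torch 2.1.0.
print(same_minor("2.2.0+rocm5.7", "2.1.0"))
```

For an installed Torch, the first argument could come from `torch.__version__`.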

turboderp commented 5 months ago

For 0.0.13 I bumped the Torch dependency to 2.2 and this seems to be resolved.

rafa-9 commented 4 months ago

@turboderp I'm having the exact same error even after bumping Torch to 2.2.0.

I'm running it on Runpod with the following relevant code:

echo "Installing Torch 2.2.0"
pip3 install --no-cache-dir torch==${TORCH_VERSION} torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

echo "Installing xformers"
pip3 install --no-cache-dir xformers

echo "Installing Oobabooga Text Generation Web UI"
pip3 install -r requirements.txt
bash -c 'for req in extensions/*/requirements.txt ; do pip3 install -r "$req" ; done'

echo "Installing repositories"
mkdir -p repositories
cd repositories
git clone https://github.com/turboderp/exllama
pip3 install -r exllama/requirements.txt
!pip3 install flash-attn==2.3
!pip3 install xformers==0.0.21

!pip3 uninstall -y exllamav2
!pip3 install exllamav2==0.0.13

with base image: `FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04`

and starting Ooba using:

source /workspace/venv/bin/activate
  mkdir -p /runpod-volume/logs
  nohup python3 server.py \
    --listen \
    --api \
    --model ${MODEL} \
    --loader ExLlamav2 \
    --extensions openai \
    --trust-remote-code &> /runpod-volume/logs/textgen.log &

Any idea? Thanks!!
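In a setup like this, where several pip steps run after Torch is installed, it can help to print which versions actually ended up in the environment before starting the server, since a later install may replace an earlier one. This is only a diagnostic sketch, not a diagnosis of the problem above; `installed_versions` is a hypothetical helper, and the package names are the ones that appear in the script.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


# Run inside the same virtualenv that launches server.py.
print(installed_versions(["torch", "xformers", "flash-attn", "exllamav2"]))
```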