turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

ImportError: /home/myuser/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so: undefined symbol: hipblasGetStream #154

Open · sjstulga opened this issue 1 year ago

sjstulga commented 1 year ago

I am using exllama through the oobabooga text-generation-webui with AMD/ROCm. I cloned exllama into the text-generation-webui/repositories folder and installed dependencies.

Devices: 2x AMD Instinct MI60 (gfx906)
Distro: Ubuntu 20.04.6
Kernel: 5.15.0-76-generic
ROCm version: 5.6.0
PyTorch version: 2.0.1, built from source

My command line

(textgen) myuser@mymachine:~/text-generation-webui$ python server.py --notebook --model airoboros-65B-gpt4-1.4-GPTQ --loader exllama --gpu-split 20,20 --listen --api

Output

2023-07-13 09:05:48 INFO:Loading airoboros-65B-gpt4-1.4-GPTQ...
2023-07-13 09:05:48 WARNING:Exllama module failed to load. Will attempt to load from repositories.
Successfully preprocessed all matching files.
2023-07-13 09:05:48 ERROR:Could not find repositories/exllama/. Make sure that exllama is cloned inside repositories/ and is up to date.
Traceback (most recent call last):
  File "/home/myuser/text-generation-webui/modules/exllama.py", line 10, in <module>
    from exllama.generator import ExLlamaGenerator
ModuleNotFoundError: No module named 'exllama'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/myuser/text-generation-webui/server.py", line 1157, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/myuser/text-generation-webui/modules/models.py", line 78, in load_model
    output = load_func_map[loader](model_name)
  File "/home/myuser/text-generation-webui/modules/models.py", line 296, in ExLlama_loader
    from modules.exllama import ExllamaModel
  File "/home/muser/text-generation-webui/modules/exllama.py", line 19, in <module>
    from generator import ExLlamaGenerator
  File "/home/myuser/text-generation-webui/repositories/exllama/generator.py", line 1, in <module>
    import cuda_ext
  File "/home/myuser/text-generation-webui/repositories/exllama/cuda_ext.py", line 43, in <module>
    exllama_ext = load(
  File "/home/myuser/pytorch/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/home/myuser/pytorch/torch/utils/cpp_extension.py", line 1535, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/myuser/pytorch/torch/utils/cpp_extension.py", line 1929, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
ImportError: /home/myuser/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so: undefined symbol: hipblasGetStream
jmoney7823956789378 commented 1 year ago

Hey fellow MI60 chad. Exllama in ooba's webui recently changed to using the pip module. Try python -m pip install git+https://github.com/jllllll/exllama

sjstulga commented 1 year ago

🤝 I'm all about that 32GB of HBM2

I've installed exllama via pip using the command you provided, but I'm still seeing the ImportError related to hipblasGetStream:

Traceback (most recent call last):
  File "/home/myuser/text-generation-webui/modules/exllama.py", line 10, in <module>
    from exllama.generator import ExLlamaGenerator
  File "/home/myuser/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/__init__.py", line 1, in <module>
    from . import cuda_ext, generator, model, tokenizer
  File "/home/myuser/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/cuda_ext.py", line 9, in <module>
    import exllama_ext
ImportError: /home/myuser/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama_ext.cpython-310-x86_64-linux-gnu.so: undefined symbol: hipblasGetStream

I should have hipBLAS installed correctly, and I've checked that the hipblas.h header is present:

(textgen) myuser@mymachine:~/text-generation-webui$ ls /opt/rocm/include/hipblas/
hipblas-export.h  hipblas.h  hipblas_module.f90  hipblas-version.h
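
A quick way to double-check that the library itself actually exports the missing symbol (assuming the standard /opt/rocm layout) would be something like:

nm -D /opt/rocm/lib/libhipblas.so | grep hipblasGetStream

If that prints the symbol as defined (a T in the nm output), the library is fine, and the more likely problem is that the extension (or torch) was never linked against libhipblas at all.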
jmoney7823956789378 commented 1 year ago

Hmm, weird. What does the log say right before Traceback (most recent call last):? Also, does your $PATH have /opt/rocm/bin right at the start? export PATH=/opt/rocm/bin:$PATH

sjstulga commented 1 year ago
(textgen) myuser@mymachine:~/text-generation-webui$ echo $PATH
/opt/rocm/bin:/home/myuser/miniconda3/envs/textgen/bin:/home/myuser/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

Full output:

(textgen) myuser@mymachine:~/text-generation-webui$ python server.py --notebook --model airoboros-65B-gpt4-1.4-GPTQ --loader exllama --gpu-split 20,20 --listen --api
2023-07-13 10:14:19 INFO:Loading airoboros-65B-gpt4-1.4-GPTQ...
2023-07-13 10:14:20 WARNING:Exllama module failed to load. Will attempt to load from repositories.
2023-07-13 10:14:20 ERROR:Could not find repositories/exllama/. Make sure that exllama is cloned inside repositories/ and is up to date.
Traceback (most recent call last):
  File "/home/myuser/text-generation-webui/modules/exllama.py", line 10, in <module>
    from exllama.generator import ExLlamaGenerator
  File "/home/myuser/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/__init__.py", line 1, in <module>
    from . import cuda_ext, generator, model, tokenizer
  File "/home/myuser/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/cuda_ext.py", line 9, in <module>
    import exllama_ext
ImportError: /home/myuser/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama_ext.cpython-310-x86_64-linux-gnu.so: undefined symbol: hipblasGetStream

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/myuser/text-generation-webui/server.py", line 1157, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/myuser/text-generation-webui/modules/models.py", line 78, in load_model
    output = load_func_map[loader](model_name)
  File "/home/myuser/text-generation-webui/modules/models.py", line 296, in ExLlama_loader
    from modules.exllama import ExllamaModel
  File "/home/myuser/text-generation-webui/modules/exllama.py", line 19, in <module>
    from generator import ExLlamaGenerator
ModuleNotFoundError: No module named 'generator'

I removed exllama from text-generation-webui/repositories when I installed it via pip

jmoney7823956789378 commented 1 year ago

I kept my exllama folder in /repositories and it seemed to work, but... I'm not sure about the hip modules missing.

sjstulga commented 1 year ago

Do you think that it is potentially an environment/system issue with the ROCm installation and not an exllama issue? I've tried running with both the pip module installed and exllama cloned to text-generation-webui/repositories but the output is identical to above. I am not sure what else to try, I might reformat and try on Ubuntu 22.04 instead of 20.04. What OS are you using with your MI60s? Or are you using a docker container with them?

jmoney7823956789378 commented 1 year ago

> Do you think that it is potentially an environment/system issue with the ROCm installation and not an exllama issue? I've tried running with both the pip module installed and exllama cloned to text-generation-webui/repositories but the output is identical to above. I am not sure what else to try, I might reformat and try on Ubuntu 22.04 instead of 20.04. What OS are you using with your MI60s? Or are you using a docker container with them?

I've switched between Docker and a standalone system; I used 22.04 for both. You might also have luck with an Arch-based container if you're into that. I'm assuming you've probably been following this guide? https://rentry.co/eq3hg

There was another GitHub repo with a bunch of ROCm scripts for setting up Stable Diffusion, ooba, etc., but I can't seem to find it now.

sjstulga commented 1 year ago

I've had mostly success running GPTQ single-GPU by following that rentry.co guide already. I say "mostly success" because some models output no tokens, gibberish, or errors, but other models run great. I have not been able to do any kind of multi-GPU yet though; so far I have only been running 30B/33B-sized models on each MI60. I'd love to get exllama working with multi-GPU so that I can run 65B-sized models across my 2 MI60s.

jmoney7823956789378 commented 1 year ago

I was running GPTQ multi-GPU for 65B; it's pretty slow across the two MI60s, but you have soooo much memory to spare you could probably dump context like it's nothing. Do you have your MI60s both in x16 slots?

sjstulga commented 1 year ago

They are both in physical x16 slots, but one is running in PCIe Gen3 x16 mode and the other is unfortunately running PCIe Gen3 x4 mode at the moment.

jmoney7823956789378 commented 1 year ago

> the other is unfortunately running PCIe Gen3 x4 mode at the moment.

I think... the MI60s HATE running at less than x16 for some reason. Just a theory, since I always had trouble with them like that until I swapped to Epyc boards. I think our underlying issue with hipblas is something different, though. I'd say either go balls-to-the-wall and install every hip-sdk use-case from the amdgpu installer, switch to trying out containers, or swap to 22.04 and try that.
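
If it helps, by "every hip-sdk use-case" I mean something along these lines (use-case names from memory, so they may differ by ROCm release):

sudo amdgpu-install --usecase=rocm,hip,hiplibsdk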

Also, what's the output of rocm-smi? And maybe rocm-smi -a?

sjstulga commented 1 year ago

I've got a 4600G APU, so ROCm detects 3 GPUs, but rocm-smi does not play super nicely with the APU. I did install every use-case from the amdgpu installer before I opened the issue, and that installation went fine. I think the next step is to reformat and upgrade to 22.04.

rocm-smi

(textgen) myuser@mymachine:~/text-generation-webui$ rocm-smi 

========================= ROCm System Management Interface =========================
=================================== Concise Info ===================================
ERROR: GPU[2]   : sclk clock is unsupported
====================================================================================
====================================================================================
GPU[2]          : get_power_cap, Not supported on the given system
GPU  Temp (DieEdge)  AvgPwr  SCLK    MCLK    Fan     Perf  PwrCap       VRAM%  GPU%  
0    32.0c           18.0W   938Mhz  350Mhz  14.51%  auto  225.0W         0%   0%    
1    30.0c           15.0W   938Mhz  350Mhz  14.51%  auto  225.0W         0%   0%    
2    27.0c           11.0W   None    None    0%      auto  Unsupported    3%   0%    
====================================================================================
=============================== End of ROCm SMI Log ================================

rocm-smi -a

(textgen) myuser@mymachine:~/text-generation-webui$ rocm-smi -a -d 0 1

========================= ROCm System Management Interface =========================
=========================== Version of System Component ============================
Driver version: 6.1.5
====================================================================================
======================================== ID ========================================
GPU[0]          : GPU ID: 0x66a1
GPU[1]          : GPU ID: 0x66a1
====================================================================================
==================================== Unique ID =====================================
GPU[0]          : Unique ID: 0xc04a208172fd5d70
GPU[1]          : Unique ID: 0x72a288a172edb148
====================================================================================
====================================== VBIOS =======================================
GPU[0]          : VBIOS version: 113-D1630600-107
GPU[1]          : VBIOS version: 113-D1631200-107
====================================================================================
=================================== Temperature ====================================
GPU[0]          : Temperature (Sensor edge) (C): 32.0
GPU[0]          : Temperature (Sensor junction) (C): 32.0
GPU[0]          : Temperature (Sensor memory) (C): 31.0
GPU[0]          : Temperature (Sensor HBM 0) (C): 0.0
GPU[0]          : Temperature (Sensor HBM 1) (C): 0.0
GPU[0]          : Temperature (Sensor HBM 2) (C): 0.0
GPU[0]          : Temperature (Sensor HBM 3) (C): 0.0
GPU[1]          : Temperature (Sensor edge) (C): 30.0
GPU[1]          : Temperature (Sensor junction) (C): 31.0
GPU[1]          : Temperature (Sensor memory) (C): 30.0
GPU[1]          : Temperature (Sensor HBM 0) (C): 0.0
GPU[1]          : Temperature (Sensor HBM 1) (C): 0.0
GPU[1]          : Temperature (Sensor HBM 2) (C): 0.0
GPU[1]          : Temperature (Sensor HBM 3) (C): 0.0
====================================================================================
============================ Current clock frequencies =============================
GPU[0]          : dcefclk clock level: 0: (357Mhz)
GPU[0]          : fclk clock level: 0: (550Mhz)
GPU[0]          : mclk clock level: 0: (350Mhz)
GPU[0]          : sclk clock level: 1: (938Mhz)
GPU[0]          : socclk clock level: 0: (309Mhz)
GPU[0]          : pcie clock level: 1 (8.0GT/s x16)
GPU[1]          : dcefclk clock level: 0: (357Mhz)
GPU[1]          : fclk clock level: 0: (550Mhz)
GPU[1]          : mclk clock level: 0: (350Mhz)
GPU[1]          : sclk clock level: 1: (938Mhz)
GPU[1]          : socclk clock level: 0: (309Mhz)
GPU[1]          : pcie clock level: 1 (8.0GT/s x4)
====================================================================================
================================ Current Fan Metric ================================
GPU[0]          : Fan Level: 37 (15%)
GPU[0]          : Fan RPM: 0
GPU[1]          : Fan Level: 37 (15%)
GPU[1]          : Fan RPM: 0
====================================================================================
============================== Show Performance Level ==============================
GPU[0]          : Performance Level: auto
GPU[1]          : Performance Level: auto
====================================================================================
================================= OverDrive Level ==================================
GPU[0]          : GPU OverDrive value (%): 0
GPU[1]          : GPU OverDrive value (%): 0
====================================================================================
================================= OverDrive Level ==================================
GPU[0]          : GPU Memory OverDrive value (%): 0
GPU[1]          : GPU Memory OverDrive value (%): 0
====================================================================================
==================================== Power Cap =====================================
GPU[0]          : Max Graphics Package Power (W): 225.0
GPU[1]          : Max Graphics Package Power (W): 225.0
====================================================================================
=============================== Show Power Profiles ================================
GPU[0]          : 1. Available power profile (#1 of 7): CUSTOM
GPU[0]          : 2. Available power profile (#2 of 7): VIDEO
GPU[0]          : 3. Available power profile (#3 of 7): POWER SAVING
GPU[0]          : 4. Available power profile (#4 of 7): COMPUTE
GPU[0]          : 5. Available power profile (#5 of 7): VR
GPU[0]          : 6. Available power profile (#6 of 7): 3D FULL SCREEN
GPU[0]          : 7. Available power profile (#7 of 7): BOOTUP DEFAULT*
GPU[1]          : 1. Available power profile (#1 of 7): CUSTOM
GPU[1]          : 2. Available power profile (#2 of 7): VIDEO
GPU[1]          : 3. Available power profile (#3 of 7): POWER SAVING
GPU[1]          : 4. Available power profile (#4 of 7): COMPUTE
GPU[1]          : 5. Available power profile (#5 of 7): VR
GPU[1]          : 6. Available power profile (#6 of 7): 3D FULL SCREEN
GPU[1]          : 7. Available power profile (#7 of 7): BOOTUP DEFAULT*
====================================================================================
================================ Power Consumption =================================
GPU[0]          : Average Graphics Package Power (W): 19.0
GPU[1]          : Average Graphics Package Power (W): 15.0
====================================================================================
=========================== Supported clock frequencies ============================
GPU[0]          : Supported dcefclk frequencies on GPU0
GPU[0]          : 0: 357Mhz *
GPU[0]          : 1: 453Mhz
GPU[0]          : 2: 566Mhz
GPU[0]          : 3: 680Mhz
GPU[0]          : 4: 755Mhz
GPU[0]          : 5: 850Mhz
GPU[0]          : 6: 971Mhz
GPU[0]          : 7: 1133Mhz
GPU[0]          : 
GPU[0]          : Supported fclk frequencies on GPU0
GPU[0]          : 0: 550Mhz *
GPU[0]          : 1: 610Mhz
GPU[0]          : 2: 690Mhz
GPU[0]          : 3: 760Mhz
GPU[0]          : 4: 870Mhz
GPU[0]          : 5: 960Mhz
GPU[0]          : 6: 1080Mhz
GPU[0]          : 7: 1180Mhz
GPU[0]          : 
GPU[0]          : Supported mclk frequencies on GPU0
GPU[0]          : 0: 350Mhz *
GPU[0]          : 1: 800Mhz
GPU[0]          : 2: 1000Mhz
GPU[0]          : 
GPU[0]          : Supported sclk frequencies on GPU0
GPU[0]          : 0: 925Mhz
GPU[0]          : 1: 938Mhz *
GPU[0]          : 2: 1076Mhz
GPU[0]          : 3: 1179Mhz
GPU[0]          : 4: 1339Mhz
GPU[0]          : 5: 1461Mhz
GPU[0]          : 6: 1599Mhz
GPU[0]          : 7: 1711Mhz
GPU[0]          : 8: 1800Mhz
GPU[0]          : 
GPU[0]          : Supported socclk frequencies on GPU0
GPU[0]          : 0: 309Mhz *
GPU[0]          : 1: 523Mhz
GPU[0]          : 2: 566Mhz
GPU[0]          : 3: 618Mhz
GPU[0]          : 4: 680Mhz
GPU[0]          : 5: 755Mhz
GPU[0]          : 6: 850Mhz
GPU[0]          : 7: 971Mhz
GPU[0]          : 
GPU[0]          : Supported PCIe frequencies on GPU0
GPU[0]          : 0: 2.5GT/s x16
GPU[0]          : 1: 8.0GT/s x16 *
GPU[0]          : 
------------------------------------------------------------------------------------
GPU[1]          : Supported dcefclk frequencies on GPU1
GPU[1]          : 0: 357Mhz *
GPU[1]          : 1: 453Mhz
GPU[1]          : 2: 566Mhz
GPU[1]          : 3: 680Mhz
GPU[1]          : 4: 755Mhz
GPU[1]          : 5: 850Mhz
GPU[1]          : 6: 971Mhz
GPU[1]          : 7: 1133Mhz
GPU[1]          : 
GPU[1]          : Supported fclk frequencies on GPU1
GPU[1]          : 0: 550Mhz *
GPU[1]          : 1: 610Mhz
GPU[1]          : 2: 690Mhz
GPU[1]          : 3: 760Mhz
GPU[1]          : 4: 870Mhz
GPU[1]          : 5: 960Mhz
GPU[1]          : 6: 1080Mhz
GPU[1]          : 7: 1278Mhz
GPU[1]          : 
GPU[1]          : Supported mclk frequencies on GPU1
GPU[1]          : 0: 350Mhz *
GPU[1]          : 1: 800Mhz
GPU[1]          : 2: 1000Mhz
GPU[1]          : 
GPU[1]          : Supported sclk frequencies on GPU1
GPU[1]          : 0: 925Mhz
GPU[1]          : 1: 938Mhz *
GPU[1]          : 2: 1076Mhz
GPU[1]          : 3: 1179Mhz
GPU[1]          : 4: 1339Mhz
GPU[1]          : 5: 1461Mhz
GPU[1]          : 6: 1599Mhz
GPU[1]          : 7: 1711Mhz
GPU[1]          : 8: 1800Mhz
GPU[1]          : 
GPU[1]          : Supported socclk frequencies on GPU1
GPU[1]          : 0: 309Mhz *
GPU[1]          : 1: 523Mhz
GPU[1]          : 2: 566Mhz
GPU[1]          : 3: 618Mhz
GPU[1]          : 4: 680Mhz
GPU[1]          : 5: 755Mhz
GPU[1]          : 6: 850Mhz
GPU[1]          : 7: 971Mhz
GPU[1]          : 
GPU[1]          : Supported PCIe frequencies on GPU1
GPU[1]          : 0: 2.5GT/s x4
GPU[1]          : 1: 8.0GT/s x4 *
GPU[1]          : 
------------------------------------------------------------------------------------
====================================================================================
================================ % time GPU is busy ================================
GPU[0]          : GPU use (%): 0
GPU[0]          : GFX Activity: 0
GPU[1]          : GPU use (%): 0
GPU[1]          : GFX Activity: 0
====================================================================================
================================ Current Memory Use ================================
GPU[0]          : GPU memory use (%): 0
GPU[0]          : Memory Activity: 0
GPU[1]          : GPU memory use (%): 0
GPU[1]          : Memory Activity: 0
====================================================================================
================================== Memory Vendor ===================================
GPU[0]          : GPU memory vendor: samsung
GPU[1]          : GPU memory vendor: hynix
====================================================================================
=============================== PCIe Replay Counter ================================
GPU[0]          : PCIe Replay Count: 0
GPU[1]          : PCIe Replay Count: 0
====================================================================================
================================== Serial Number ===================================
GPU[0]          : Serial Number: PCB026063-0102
GPU[1]          : Serial Number: 692001000098
====================================================================================
================================== KFD Processes ===================================
No KFD PIDs currently running
====================================================================================
=============================== GPUs Indexed by PID ================================
No KFD PIDs currently running
====================================================================================
==================== GPU Memory clock frequencies and voltages =====================
GPU[0]          : get_od_volt, Not supported on the given system
GPU[1]          : get_od_volt, Not supported on the given system
====================================================================================
================================= Current voltage ==================================
GPU[0]          : Voltage (mV): 737
GPU[1]          : Voltage (mV): 737
====================================================================================
==================================== PCI Bus ID ====================================
GPU[0]          : PCI Bus: 0000:03:00.0
GPU[1]          : PCI Bus: 0000:08:00.0
====================================================================================
=============================== Firmware Information ===============================
GPU[0]          : ASD firmware version:         0x210000a7
GPU[0]          : CE firmware version:          80
GPU[0]          : DMCU firmware version:        0
GPU[0]          : MC firmware version:          0
GPU[0]          : ME firmware version:          166
GPU[0]          : MEC firmware version:         469
GPU[0]          : MEC2 firmware version:        469
GPU[0]          : PFP firmware version:         194
GPU[0]          : RLC firmware version:         50
GPU[0]          : RLC SRLC firmware version:    1
GPU[0]          : RLC SRLG firmware version:    1
GPU[0]          : RLC SRLS firmware version:    1
GPU[0]          : SDMA firmware version:        145
GPU[0]          : SDMA2 firmware version:       145
GPU[0]          : SMC firmware version:         00.40.60.00
GPU[0]          : SOS firmware version:         0x00080b67
GPU[0]          : TA RAS firmware version:      27.00.01.43
GPU[0]          : TA XGMI firmware version:     32.00.00.02
GPU[0]          : UVD firmware version:         0x42002b13
GPU[0]          : VCE firmware version:         0x39060400
GPU[0]          : VCN firmware version:         0x00000000
GPU[1]          : ASD firmware version:         0x210000a7
GPU[1]          : CE firmware version:          80
GPU[1]          : DMCU firmware version:        0
GPU[1]          : MC firmware version:          0
GPU[1]          : ME firmware version:          166
GPU[1]          : MEC firmware version:         469
GPU[1]          : MEC2 firmware version:        469
GPU[1]          : PFP firmware version:         194
GPU[1]          : RLC firmware version:         50
GPU[1]          : RLC SRLC firmware version:    1
GPU[1]          : RLC SRLG firmware version:    1
GPU[1]          : RLC SRLS firmware version:    1
GPU[1]          : SDMA firmware version:        145
GPU[1]          : SDMA2 firmware version:       145
GPU[1]          : SMC firmware version:         00.40.60.00
GPU[1]          : SOS firmware version:         0x00080b67
GPU[1]          : TA RAS firmware version:      27.00.01.43
GPU[1]          : TA XGMI firmware version:     32.00.00.02
GPU[1]          : UVD firmware version:         0x42002b13
GPU[1]          : VCE firmware version:         0x39060400
GPU[1]          : VCN firmware version:         0x00000000
====================================================================================
=================================== Product Info ===================================
GPU[0]          : Card series:          TBD VEGA20 CARD
GPU[0]          : Card model:           0x0834
GPU[0]          : Card vendor:          Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0]          : Card SKU:             D1630600
GPU[1]          : Card series:           Radeon Instinct MI60 32GB
GPU[1]          : Card model:           0x0834
GPU[1]          : Card vendor:          Advanced Micro Devices, Inc. [AMD/ATI]
GPU[1]          : Card SKU:             D1631200
====================================================================================
==================================== Pages Info ====================================
====================================================================================
============================== Show Valid sclk Range ===============================
GPU[0]          : get_od_volt, Not supported on the given system
GPU[1]          : get_od_volt, Not supported on the given system
====================================================================================
============================== Show Valid mclk Range ===============================
GPU[0]          : get_od_volt, Not supported on the given system
GPU[1]          : get_od_volt, Not supported on the given system
====================================================================================
============================= Show Valid voltage Range =============================
GPU[0]          : get_od_volt, Not supported on the given system
GPU[1]          : get_od_volt, Not supported on the given system
====================================================================================
=============================== Voltage Curve Points ===============================
GPU[0]          : get_od_volt_info, Not supported on the given system
GPU[1]          : get_od_volt_info, Not supported on the given system
====================================================================================
================================= Consumed Energy ==================================
GPU[0]          : Energy counter: 4294967295
GPU[0]          : Accumulated Energy (uJ): 65713000432.7
GPU[1]          : Energy counter: 4294967295
GPU[1]          : Accumulated Energy (uJ): 65713000432.7
====================================================================================
============================ Current Compute Partition =============================
GPU[0]          : Not supported on the given system
GPU[1]          : Not supported on the given system
====================================================================================
================================= Current NPS Mode =================================
GPU[0]          : Not supported on the given system
GPU[1]          : Not supported on the given system
====================================================================================
=============================== End of ROCm SMI Log ================================
jmoney7823956789378 commented 1 year ago

Hmm, yeah, maybe the next step will be 22.04, unfortunately. If that doesn't work, I'd say you might be getting stuck by the x8/x4 connection, but that could just be me talking out of my ass based on my own experience.

btw I noticed you have one mismatched card, exactly like mine. One of them is Samsung memory and one is Hynix, if I remember correctly. It doesn't change anything architecture-wise, but I just found it interesting that we both ended up with mismatched cards.

sjstulga commented 1 year ago

We could swap so that we'd both have a match! Ha ha 😛

Interestingly, one of my MI60s is 1x8-pin PCIe power connector + 1x6-pin PCIe power connector, and the other MI60 is 2x8-pin PCIe power connectors, so they are mismatched in that way too

jmoney7823956789378 commented 1 year ago

> We could swap so that we'd both have a match! Ha ha 😛
>
> Interestingly, one of my MI60s is 1x8-pin PCIe power connector + 1x6-pin PCIe power connector, and the other MI60 is 2x8-pin PCIe power connectors, so they are mismatched in that way too

Yep, confirmed we have probably exactly the same mismatched cards, LOL.

sjstulga commented 1 year ago

Out of curiosity, how are you cooling your MI60s? I am cooling with one 80mm fan per card, rigged up with 3D-printed fan shrouds that force air through the cards.

jmoney7823956789378 commented 1 year ago

> Out of curiosity, how are you cooling your MI60s? I am cooling with one 80mm fan per card, rigged up with 3D-printed fan shrouds that force air through the cards.

[photo attachment: the cooling setup]

4x monster turbo whiny 40mm fans.

ardfork commented 1 year ago

Can you try hiding your APU? I only had problems trying to use mine.

HIP_VISIBLE_DEVICES=0,1 ROCR_VISIBLE_DEVICES=0,1
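
That is, prefixed onto the existing launch command, something like:

HIP_VISIBLE_DEVICES=0,1 ROCR_VISIBLE_DEVICES=0,1 python server.py --notebook --model airoboros-65B-gpt4-1.4-GPTQ --loader exllama --gpu-split 20,20 --listen --api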

sjstulga commented 1 year ago

> Can you try hiding your APU? I only had problems trying to use mine.
>
> HIP_VISIBLE_DEVICES=0,1 ROCR_VISIBLE_DEVICES=0,1

I already have these environment variables set. I've also run with them as part of the command line, and the result is identical to the above.

jmoney7823956789378 commented 1 year ago

Did you try a Docker container yet? I'm not sure if it will help, but here's what I used to set up my Docker container for oobabooga before.

https://github.com/evshiron/rocm_lab

Engininja2 commented 1 year ago

What's the output of ldd torch/lib/libtorch_hip.so | grep hipblas?

sjstulga commented 1 year ago

> What's the output of ldd torch/lib/libtorch_hip.so | grep hipblas?

I have been procrastinating reformatting, so I can still tell you this! No output.

Here is without grep

(textgen) myuser@mymachine:~/pytorch/torch/lib$ ldd libtorch_hip.so 
        linux-vdso.so.1 (0x00007ffe3c368000)
        libc10_hip.so => /home/myuser/pytorch/torch/lib/./libc10_hip.so (0x00007f39c5c79000)
        libamdhip64.so.5 => /opt/rocm/hip/lib/libamdhip64.so.5 (0x00007f39c41b3000)
        libMIOpen.so.1 => /opt/rocm-5.6.0/lib/libMIOpen.so.1 (0x00007f39a7c3f000)
        libroctx64.so.4 => /opt/rocm-5.6.0/lib/libroctx64.so.4 (0x00007f39a7c3a000)
        librocblas.so.3 => /opt/rocm-5.6.0/lib/librocblas.so.3 (0x00007f3990bdc000)
        libhipfft.so => /opt/rocm-5.6.0/lib/libhipfft.so (0x00007f3990bce000)
        libhiprand.so.1 => /opt/rocm-5.6.0/lib/libhiprand.so.1 (0x00007f3990bc8000)
        libhipsparse.so.0 => /opt/rocm-5.6.0/lib/libhipsparse.so.0 (0x00007f3990b8f000)
        librccl.so.1 => /opt/rocm-5.6.0/lib/librccl.so.1 (0x00007f3980001000)
        libc10.so => /home/myuser/pytorch/torch/lib/./libc10.so (0x00007f397ff65000)
        libtorch_cpu.so => /home/myuser/pytorch/torch/lib/./libtorch_cpu.so (0x00007f3974ca6000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f3974c6b000)
        libstdc++.so.6 => /home/myuser/miniconda3/envs/textgen/lib/libstdc++.so.6 (0x00007f3974a57000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3974908000)
        libgcc_s.so.1 => /home/myuser/miniconda3/envs/textgen/lib/libgcc_s.so.1 (0x00007f39748ee000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f39746fc000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f39d7abd000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f39746f0000)
        libamd_comgr.so.2 => /opt/rocm-5.6.0/lib/libamd_comgr.so.2 (0x00007f396b699000)
        libhsa-runtime64.so.1 => /opt/rocm-5.6.0/lib/libhsa-runtime64.so.1 (0x00007f396b3e8000)
        libnuma.so.1 => /lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f396b3db000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f396b3d5000)
        librocfft.so.0 => /opt/rocm-5.6.0/lib/librocfft.so.0 (0x00007f396ad5e000)
        librocrand.so.1 => /opt/rocm-5.6.0/lib/librocrand.so.1 (0x00007f39681be000)
        librocsparse.so.0 => /opt/rocm-5.6.0/lib/librocsparse.so.0 (0x00007f3936819000)
        librocm_smi64.so.5 => /opt/rocm-5.6.0/lib/librocm_smi64.so.5 (0x00007f393676b000)
        libmkl_intel_lp64.so.2 => /home/myuser/miniconda3/envs/textgen/lib/libmkl_intel_lp64.so.2 (0x00007f393547b000)
        libmkl_gnu_thread.so.2 => /home/myuser/miniconda3/envs/textgen/lib/libmkl_gnu_thread.so.2 (0x00007f39337f9000)
        libmkl_core.so.2 => /home/myuser/miniconda3/envs/textgen/lib/libmkl_core.so.2 (0x00007f392f327000)
        libgomp.so.1 => /home/myuser/miniconda3/envs/textgen/lib/libgomp.so.1 (0x00007f392f2e3000)
        libroctracer64.so.4 => /opt/rocm/lib/libroctracer64.so.4 (0x00007f392f285000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f392f269000)
        libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007f392f237000)
        libelf.so.1 => /lib/x86_64-linux-gnu/libelf.so.1 (0x00007f392f21b000)
        libdrm.so.2 => /opt/amdgpu/lib/x86_64-linux-gnu/libdrm.so.2 (0x00007f392f202000)
        libdrm_amdgpu.so.1 => /opt/amdgpu/lib/x86_64-linux-gnu/libdrm_amdgpu.so.1 (0x00007f392f1f4000)
        libhiprtc.so.5 => /opt/rocm-5.6.0/lib/libhiprtc.so.5 (0x00007f392f135000)
Engininja2 commented 1 year ago

Did you build PyTorch with USE_FBGEMM=OFF?

Maybe exllama should start linking hipblas directly. It looks like the only part of torch itself that needs hipblas is FBGEMM, and that's both optional and not built for 32-bit x86.

Could you try this change to the text-generation-webui/repositories version and see if it works? Engininja2/exllama@bb3473e
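
For reference, the general shape of that kind of fix (a sketch only, assuming exllama builds the extension through torch's JIT loader in cuda_ext.py; the actual commit may differ in details) is to pass the hipblas linker flag through to the extension build:

# sketch: link hipblas explicitly so hipblasGetStream resolves at load time
from torch.utils.cpp_extension import load

exllama_ext = load(
    name = "exllama_ext",
    sources = sources,  # the extension's existing .cpp/.cu source list
    extra_ldflags = ["-L/opt/rocm/lib", "-lhipblas"],
    verbose = True,
)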

sjstulga commented 1 year ago

> Did you build PyTorch with USE_FBGEMM=OFF?

I built PyTorch by cloning the official GitHub repo and following only the steps specified in their README for building from source. I definitely didn't explicitly set this environment variable, but I'm not sure whether it is on or off by default.

> Could you try this change to the text-generation-webui/repositories version and see if it works? https://github.com/Engininja2/exllama/commit/bb3473e

I think this has resolved the original issue! Here is my latest run with output. Not a final success, but definitely good progress, and maybe we are at the point where I should close this issue and think about opening a new one?

(textgen) myuser@mymachine:~/text-generation-webui$ HIP_VISIBLE_DEVICES=0,1 ROCR_VISIBLE_DEVICES=0,1 python server.py --notebook --model airoboros-65B-gpt4-1.4-GPTQ --loader exllama --gpu-split 20,20 --listen --api
2023-07-16 14:12:29 INFO:Loading airoboros-65B-gpt4-1.4-GPTQ...
2023-07-16 14:12:30 WARNING:Exllama module failed to load. Will attempt to load from repositories.
Successfully preprocessed all matching files.
Traceback (most recent call last):
  File "/home/myuser/text-generation-webui/server.py", line 1157, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/myuser/text-generation-webui/modules/models.py", line 78, in load_model
    output = load_func_map[loader](model_name)
  File "/home/myuser/text-generation-webui/modules/models.py", line 298, in ExLlama_loader
    model, tokenizer = ExllamaModel.from_pretrained(model_name)
  File "/home/myuser/text-generation-webui/modules/exllama.py", line 67, in from_pretrained
    model = ExLlama(config)
  File "/home/myuser/text-generation-webui/repositories/exllama/model.py", line 788, in __init__
    inv_freq = 1.0 / (self.config.rotary_embedding_base ** (torch.arange(0, self.config.head_dim, 2, device = device).float() / self.config.head_dim))
RuntimeError: HIP error: the operation cannot be performed in the present state
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
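
As the error text itself suggests, a follow-up debugging run (so the stack trace points at the real failing kernel) would presumably look something like:

HIP_LAUNCH_BLOCKING=1 HIP_VISIBLE_DEVICES=0,1 ROCR_VISIBLE_DEVICES=0,1 python server.py --notebook --model airoboros-65B-gpt4-1.4-GPTQ --loader exllama --gpu-split 20,20 --listen --api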

I managed to catch the rocm-smi output a moment before the text-generation-webui server bombed out, so I can confirm that the model did load into VRAM across both cards!

(textgen) myuser@mymachine:~$ rocm-smi 

========================= ROCm System Management Interface =========================
=================================== Concise Info ===================================
ERROR: GPU[2]   : sclk clock is unsupported
====================================================================================
====================================================================================
GPU[2]          : get_power_cap, Not supported on the given system
GPU  Temp (DieEdge)  AvgPwr  SCLK    MCLK    Fan     Perf  PwrCap       VRAM%  GPU%  
0    30.0c           19.0W   925Mhz  350Mhz  14.51%  auto  225.0W        63%   0%    
1    30.0c           15.0W   925Mhz  350Mhz  14.51%  auto  225.0W        51%   0%    
2    28.0c           12.0W   None    None    0%      auto  Unsupported    3%   0%    
====================================================================================
=============================== End of ROCm SMI Log ================================
cebtenzzre commented 1 year ago

I installed python-pytorch-opt-rocm on Arch Linux and also needed the explicit -lhipblas.
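
For anyone verifying that kind of fix, the built extension should afterwards show up as linked against libhipblas, e.g.:

ldd ~/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so | grep hipblas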

jmoney7823956789378 commented 1 year ago

@sjstulga Not sure if you're still having issues, but I wanted to point out that I was using CUDA_VISIBLE_DEVICES=0 (or 1) even when using my MI60s. I saw you have that in there under other names, but maybe it's worth a shot. Also, the device-side assertions error happens a lot in CUDA too; I haven't found a reliable fix outside of turning it off and back on...