turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

HIP kernel errors #328

Open userbox020 opened 4 months ago

userbox020 commented 4 months ago

I'm using ROCm 5.6 with the environment installed by the ooba one-click installer, and I get the following error when loading models:

Traceback (most recent call last):
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/modules/ui_model_menu.py", line 213, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/modules/models.py", line 87, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/modules/models.py", line 389, in ExLlamav2_HF_loader
    return Exllamav2HF.from_pretrained(model_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/modules/exllamav2_hf.py", line 170, in from_pretrained
    return Exllamav2HF(config)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/modules/exllamav2_hf.py", line 44, in __init__
    self.ex_model.load(split)
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/model.py", line 248, in load
    for item in f: return item
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/model.py", line 266, in load_gen
    module.load()
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/attn.py", line 189, in load
    self.q_proj.load()
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/linear.py", line 45, in load
    if w is None: w = self.load_weight()
                      ^^^^^^^^^^^^^^^^^^
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/module.py", line 97, in load_weight
    qtensors["q_perm"] = torch.argsort(qtensors["q_invperm"]).to(torch.int)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: HIP error: the operation cannot be performed in the present state
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

However, I can run llama.cpp models on the same GPU in the same environment without any errors.
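
The failing call can be reproduced outside of exllamav2, which makes it easier to follow the error message's advice about HIP_LAUNCH_BLOCKING. The snippet below is only a diagnostic sketch, assuming the ROCm build of PyTorch that the one-click installer sets up:

```python
# hip_sanity.py -- diagnostic sketch: reproduces the argsort call that fails in
# exllamav2/module.py outside of the library. Run with synchronous kernel
# launches so the error points at the real call site:
#   HIP_LAUNCH_BLOCKING=1 python hip_sanity.py

import torch

print("torch:", torch.__version__, "| HIP runtime:", torch.version.hip)
print("visible devices:", torch.cuda.device_count())   # ROCm devices appear under torch.cuda

x = torch.randn(1024, device="cuda")       # fails already if the HIP runtime is unusable
q_perm = torch.argsort(x).to(torch.int)    # the same op that raises the error in module.py
torch.cuda.synchronize()                   # flush any asynchronously reported HIP error
print("GPU argsort OK:", q_perm.shape, q_perm.dtype)
```

If this minimal script fails the same way, the problem is in the PyTorch/ROCm setup rather than in exllamav2 itself.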

SinanAkkoyun commented 4 months ago

I get a similar error when using the prebuilt ROCm wheel exllamav2-0.0.13.post1+rocm5.6-cp311-cp311-linux_x86_64.whl:

ROCR_VISIBLE_DEVICES=1 python examples/chat.py -m ../../../models/exl2/tinyllama-1B-4.0bpw -mode llama
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "",
        LC_ALL = (unset),
        LC_TIME = "en_DE.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
 -- Model: ../../../models/exl2/tinyllama-1B-4.0bpw
 -- Options: []
 -- Loading model...
Traceback (most recent call last):
  File "/home/sinan/ml/llm/inference/exl2/exllamav2/examples/chat.py", line 87, in <module>
    model, tokenizer = model_init.init(args, allow_auto_split = True)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/model_init.py", line 101, in init
    model.load(split)
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/model.py", line 248, in load
    for item in f: return item
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/model.py", line 266, in load_gen
    module.load()
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/attn.py", line 189, in load
    self.q_proj.load()
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/linear.py", line 45, in load
    if w is None: w = self.load_weight()
                      ^^^^^^^^^^^^^^^^^^
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/module.py", line 97, in load_weight
    qtensors["q_perm"] = torch.argsort(qtensors["q_invperm"]).to(torch.int)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
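
On ROCm, "invalid device function" typically means the kernels in the wheel were not compiled for the GPU's gfx target. A minimal check of what the runtime reports, again only a sketch assuming a ROCm build of PyTorch (`gcnArchName` is not present on every build, hence the `getattr`):

```python
# arch_check.py -- illustrative check only: prints the gfx architecture the HIP
# runtime reports for each visible GPU. If it is not among the targets the
# prebuilt wheel was compiled for, that mismatch is the usual cause of
# "invalid device function".

import torch

assert torch.version.hip is not None, "not a ROCm build of PyTorch"

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    arch = getattr(props, "gcnArchName", "unknown")  # e.g. "gfx1030"; attribute availability varies by torch version
    print(f"device {i}: {props.name}, arch = {arch}")
```

If the reported architecture is not covered by the wheel, building exllamav2 from source in the same environment is one workaround worth trying; `torch.utils.cpp_extension` honours the `PYTORCH_ROCM_ARCH` variable for selecting the target.
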
userbox020 commented 2 months ago

@SinanAkkoyun I think the new Mesa 24.1 drivers solve the issue, but I haven't checked yet.

turboderp commented 2 weeks ago

Any updates?