Every other loader works fine, both at model load time and during inference. Only ExLlamaV2 fails, as follows:
2023-12-27 22:03:03 INFO:Loading TheBloke_Nous-Hermes-13B-GPTQ...
Successfully preprocessed all matching files.
2023-12-27 22:03:04 WARNING:You are running ExLlamaV2 without flash-attention. This will cause the VRAM usage to be a lot higher than it could be.
Try installing flash-attention following the instructions here: https://github.com/Dao-AILab/flash-attention#installation-and-features
2023-12-27 22:03:05 ERROR:Failed to load the model.
Traceback (most recent call last):
File "/media/10TB_HHD/_OOBAGOOBA-AMD_V2/text-generation-webui/modules/ui_model_menu.py", line 209, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/10TB_HHD/_OOBAGOOBA-AMD_V2/text-generation-webui/modules/models.py", line 88, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/10TB_HHD/_OOBAGOOBA-AMD_V2/text-generation-webui/modules/models.py", line 398, in ExLlamav2_loader
model, tokenizer = Exllamav2Model.from_pretrained(model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/10TB_HHD/_OOBAGOOBA-AMD_V2/text-generation-webui/modules/exllamav2.py", line 58, in from_pretrained
model.load(split)
File "/media/10TB_HHD/_OOBAGOOBA-AMD_V2/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/model.py", line 239, in load
for item in f: return item
File "/media/10TB_HHD/_OOBAGOOBA-AMD_V2/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/model.py", line 258, in load_gen
module.load()
File "/media/10TB_HHD/_OOBAGOOBA-AMD_V2/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/attn.py", line 78, in load
self.input_layernorm.load()
File "/media/10TB_HHD/_OOBAGOOBA-AMD_V2/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/rmsnorm.py", line 23, in load
w = self.load_weight()
^^^^^^^^^^^^^^^^^^
File "/media/10TB_HHD/_OOBAGOOBA-AMD_V2/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/module.py", line 94, in load_weight
tensor = tensor.half()
^^^^^^^^^^^^^
RuntimeError: HIP error: the operation cannot be performed in the present state
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
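The traceback shows the crash happens in `module.py` at `tensor = tensor.half()`, i.e. while casting a loaded weight to fp16 on the HIP device. As a quick sanity check (my own illustrative sketch, not part of the original log), the same cast succeeds on CPU, which suggests the tensor itself is fine and the problem is the HIP/ROCm device state:

```python
import torch

# The loader crashes while casting a weight tensor to fp16 on the HIP device.
# The identical cast on CPU works, pointing at a GPU/ROCm runtime problem
# rather than a corrupt weight. (Illustrative sketch, not from the report.)
t = torch.randn(8, device="cpu")
half = t.half()
assert half.dtype == torch.float16
print(half.dtype)
```

If the CPU cast works but the on-device load still fails, rerunning with PyTorch's HIP debugging aids (e.g. the `AMD_SERIALIZE_KERNEL=3` environment variable described in the PyTorch HIP semantics docs) can surface the real failing kernel instead of this deferred "present state" error.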