I don't know if the error I'm getting is related to GPTQ-for-LLaMa, but it's worth a try! This is the scenario:
Linux Mint (based on Ubuntu Focal)
2x AMD RX5700XT
I'm using the oobabooga_linux one-click installer even though I know it's not supported; I will attach my start_linux.sh and webui.py, as they're modified to fit the purpose.
Using rocm-5.4.2, installed packages: torch, torchvision, torchaudio, pytorch-triton-rocm
bitsandbytes installed from https://github.com/agrocylo/bitsandbytes-rocm.git
and of course the triton branch of this repo!
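In short, the setup steps above look roughly like this (a sketch only; the PyTorch wheel index URL, the bitsandbytes build commands, and the branch/repo names are my best reconstruction, not copied verbatim from my shell history):

```shell
# Sketch of the setup described above -- exact URLs and build commands
# are assumptions, adjust to your environment.

# PyTorch + Triton built for ROCm 5.4.2
pip install torch torchvision torchaudio pytorch-triton-rocm \
    --index-url https://download.pytorch.org/whl/rocm5.4.2

# ROCm fork of bitsandbytes
git clone https://github.com/agrocylo/bitsandbytes-rocm.git
cd bitsandbytes-rocm
make hip          # build against the system ROCm install (assumed target name)
pip install .
cd ..

# triton branch of GPTQ-for-LLaMa into the webui's repositories dir
git clone -b triton https://github.com/qwopqwop200/GPTQ-for-LLaMa.git \
    text-generation-webui/repositories/GPTQ-for-LLaMa
```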
Now, no matter which GPTQ model I try to run, I always get the same error:
Gradio HTTP request redirected to localhost :)
/media/muflo/Volume10TB/AI/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")
The following models are available:
1. eachadea_ggml-vicuna-7b-1.1
2. eachadea_vicuna-7b-1.1
3. EleutherAI_pythia-6.9b-deduped
4. facebook_opt-1.3b
5. TheBloke_vicuna-7B-GPTQ-4bit-128g
6. TheBloke_vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g
7. TheYuriLover_llama-13b-pretrained-sft-do2-4bit-128g
Which one do you want to load? 1-7
7
Loading TheYuriLover_llama-13b-pretrained-sft-do2-4bit-128g...
Found the following quantized model: models/TheYuriLover_llama-13b-pretrained-sft-do2-4bit-128g/llama-13b-pretrained-sft-do2-4bit-128g.safetensors
Loading model ...
The safetensors archive passed at models/TheYuriLover_llama-13b-pretrained-sft-do2-4bit-128g/llama-13b-pretrained-sft-do2-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
Found 3 unique KN Linear values.
Warming up autotune cache ...
0%| | 0/12 [00:00<?, ?it/s]
Traceback (most recent call last):
File "<string>", line 21, in matmul_248_kernel
KeyError: ('2-.-0-.-0--315d85ef1dfc7d1a34d7a2447be14452-d3d1a24a1000d576219bf70a59497878-acfe913a6e9f1719e65278db70e6193a-b7dcc787308646f52b470de5b5172fda-e1f133f98d04093da2078dfc51c36b72-5b0ee5cc97b40d70a6cbd97988cca4ae-0db1785b8dc43452c61ef6d926ec11bb-72c9e4c5c1a79179d65d3e9ae662c15b', (torch.float16, torch.int32, torch.float16, torch.float16, torch.int32, torch.int32, 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), (16, 256, 32, 8), (True, True, True, True, True, True, (False, True), (True, False), (True, False), (False, False), (False, False), (True, False), (False, True), (True, False), (False, True), (True, False), (False, True), (True, False), (True, False)), 4, 4)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/media/muflo/Volume10TB/AI/oobabooga_linux/text-generation-webui/server.py", line 912, in <module>
shared.model, shared.tokenizer = load_model(shared.model_name)
File "/media/muflo/Volume10TB/AI/oobabooga_linux/text-generation-webui/modules/models.py", line 127, in load_model
model = load_quantized(model_name)
File "/media/muflo/Volume10TB/AI/oobabooga_linux/text-generation-webui/modules/GPTQ_loader.py", line 173, in load_quantized
model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, shared.args.pre_layer)
File "/media/muflo/Volume10TB/AI/oobabooga_linux/text-generation-webui/repositories/GPTQ-for-LLaMa/llama_inference_offload.py", line 221, in load_quant
quant.autotune_warmup_linear(model)
File "/media/muflo/Volume10TB/AI/oobabooga_linux/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/quant_linear.py", line 419, in autotune_warmup_linear
matmul248(a, qweight, scales, qzeros, g_idx, bits, maxq)
File "/media/muflo/Volume10TB/AI/oobabooga_linux/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/quant_linear.py", line 267, in matmul248
matmul_248_kernel[grid](input, qweight, output, scales, qzeros, g_idx, input.shape[0], qweight.shape[1], input.shape[1], bits, maxq, input.stride(0), input.stride(1), qweight.stride(0),
File "/media/muflo/Volume10TB/AI/oobabooga_linux/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/custom_autotune.py", line 90, in run
timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
File "/media/muflo/Volume10TB/AI/oobabooga_linux/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/custom_autotune.py", line 90, in <dictcomp>
timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
File "/media/muflo/Volume10TB/AI/oobabooga_linux/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/custom_autotune.py", line 72, in _bench
return triton.testing.do_bench(kernel_call, percentiles=(0.5, 0.2, 0.8), rep=40)
File "/media/muflo/Volume10TB/AI/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/triton/testing.py", line 146, in do_bench
fn()
File "/media/muflo/Volume10TB/AI/oobabooga_linux/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/custom_autotune.py", line 67, in kernel_call
self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **current)
File "<string>", line 43, in matmul_248_kernel
File "/media/muflo/Volume10TB/AI/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/triton/compiler.py", line 1944, in __getattribute__
self._init_handles()
File "/media/muflo/Volume10TB/AI/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/triton/compiler.py", line 1930, in _init_handles
mod, func, n_regs, n_spills = hip_utils.load_binary(self.metadata["name"], self.asm["hsaco_path"], self.shared, device)
SystemError: <built-in function load_binary> returned NULL without setting an exception
Done!
Am I doing something wrong, or am I just trying to do something that is impossible?
P.S.: One thing I noticed is that a closing bracket seems to be missing, immediately after "(16, 256, 32, 8)".
Attachments: webui.py.txt, start_linux.sh.txt