AceBeaker2 opened 1 year ago
It might have something to do with https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/main/quant_cuda.cpp
Try `pip install gptq -U` or `python3.10 -m pip install gptq -U`?
I found the doc here: https://pypi.org/project/gptq/
Running on an Ubuntu 22 environment through SSH, and gptq installs perfectly fine.
Upgrading from gptq 0.0.2 to gptq 0.0.3 resolved this problem for me.
Using python 3.10.
From my inept sleuthing, it looks to me like this is caused by quant_cuda.cpp in gptq not being compiled/bound into a Python-importable file when installing gptq. I personally can't get past this error either, on Windows.
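One quick way to confirm that diagnosis (a sketch; it only assumes the compiled module would be named quant_cuda, as in the repo) is to ask Python whether any importable quant_cuda exists at all:

```python
import importlib.util

# If pip shipped only quant_cuda.cpp and never compiled it, Python has
# no module spec for quant_cuda, so find_spec returns None.
spec = importlib.util.find_spec("quant_cuda")
if spec is None:
    print("no compiled quant_cuda extension found")
else:
    print("compiled extension at:", spec.origin)
```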
Exactly! Same problem I'm having, but on Linux.
I'm wondering if using https://github.com/qwopqwop200/GPTQ-for-LLaMa might work instead of https://github.com/IST-DASLab/gptq?
Though looking at https://github.com/IST-DASLab/gptq/blob/main/setup_cuda.py, it seems that file is missing from the gptq installation I got from pip, and it appears to be what sets up the quant_cuda part of the module. So I think installing from pip might be the problem?
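If it helps anyone debug, here's a small check (a sketch; it assumes only that the package is importable as gptq) that lists what pip actually put in the installed package, so you can see whether quant_cuda.cpp is sitting there without a compiled .so/.pyd next to it:

```python
import importlib.util
from pathlib import Path

# Locate the installed gptq package and list its files; quant_cuda.cpp
# present without a matching compiled extension (.so/.pyd) means the
# kernels were shipped as source only and never built.
spec = importlib.util.find_spec("gptq")
if spec is None or spec.origin is None:
    print("gptq is not installed in this environment")
else:
    for path in sorted(Path(spec.origin).parent.iterdir()):
        print(path.name)
```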
Hi @tarpeyd12, have you found a way to resolve this issue? I have the same issue on Windows: I am getting "error: Microsoft Visual C++ 14.0 or greater is required.", but I have all the C++ tooling installed and added to PATH. The weird thing is that the code worked for me two days ago, but I accidentally messed up my conda env, so I reinstalled it, after which I can never get it to work again. It's really frustrating.
I think this might work, although I am using Google Colab to download the weights. Here is what I did:
Make sure your runtime/machine has access to a CUDA GPU. Then, put these commands into a cell and run them in order to install pyllama and gptq:

```
!pip install pyllama
!pip install gptq
```

After that, simply run the following command:

```
!python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 4 --groupsize 128 --save pyllama-7B4b.pt
```
peace out ;)
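For anyone curious what `--wbits 4 --groupsize 128` means conceptually: each group of 128 weights shares one scale and zero point, and each weight is stored as a 4-bit integer. A pure-Python sketch of that idea (illustrative only, not the actual pyllama/gptq code):

```python
def quantize_groups(weights, wbits=4, groupsize=128):
    """Group-wise quantization sketch: each group of `groupsize` values
    shares a scale/zero-point, and values map to integers in [0, 2**wbits - 1].
    Returns the dequantized values so the rounding error is visible."""
    qmax = (1 << wbits) - 1
    out = []
    for i in range(0, len(weights), groupsize):
        group = weights[i:i + groupsize]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / qmax or 1.0   # avoid div-by-zero for flat groups
        q = [round((w - lo) / scale) for w in group]   # quantize to ints
        out.extend(lo + scale * v for v in q)          # dequantize
    return out

w = [0.1, -0.4, 0.25, 0.9]
print(quantize_groups(w, groupsize=4))  # values close to the originals
```

The rounding error per weight is at most half the group's scale, which is why smaller group sizes (at the cost of more stored scales) give better accuracy.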
Hi all, has anyone been able to solve the issue? I have tried installing pyllama and gptq, but it doesn't work. The Python version is 3.10.
Hey everyone, I've also carefully followed the installation of gptq from https://pypi.org/project/gptq/ and I'm still facing the issue. Has anyone found a solution yet?
Hello everyone, I've solved this problem recently.

The error occurs because "...site-packages/gptq/quant.py" imports quant_cuda as a Python module, but in the gptq folder quant_cuda exists only as a .cpp source file. quant_cuda.cpp exposes its functions to Python through the PYBIND11 library:

```cpp
// quant_cuda.cpp
.....
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("matvmul2", &vecquant2matmul, "2-bit Quantized Matrix Vector Multiplication (CUDA)");
  m.def("matvmul3", &vecquant3matmul, "3-bit Quantized Matrix Vector Multiplication (CUDA)");
  m.def("matvmul4", &vecquant4matmul, "4-bit Quantized Matrix Vector Multiplication (CUDA)");
  m.def("matvmul8", &vecquant8matmul, "8-bit Quantized Matrix Vector Multiplication (CUDA)");
  m.def("matvmul16", &vecquant8matmul, "16-bit Quantized Matrix Vector Multiplication (CUDA)");
}
```

and

```python
# quant.py
......
from quant_cuda import matvmul2, matvmul3, matvmul4, matvmul8
......
```

But since the extension was never compiled, those functions can't be imported in quant.py. So I modified quant.py as follows, compiling the extension just-in-time with torch.utils.cpp_extension.load:

```python
# quant.py
import numpy as np
import torch
# from quant_cuda import matvmul2, matvmul3, matvmul4, matvmul8
from torch.utils.cpp_extension import load

quant_cuda = load(name='quant_cuda', sources=['quant_cuda.cpp'])

.....
if self.bits == 2:
    # matvmul2(x, self.qweight, y, self.scales, self.zeros)
    quant_cuda.matvmul2(x, self.qweight, y, self.scales, self.zeros)
elif self.bits == 3:
    # matvmul3(x, self.qweight, y, self.scales, self.zeros)
    quant_cuda.matvmul3(x, self.qweight, y, self.scales, self.zeros)
.....
```

In this file, I commented out (with `#`) the lines that called the bare matvmulX functions and routed the calls through the compiled quant_cuda module instead.

And then the error disappeared!
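As a side note, the if/elif chain in quant.py that picks between matvmul2/3/4/8 can be expressed as a single lookup. A plain-Python sketch (the stub functions are stand-ins for the compiled kernels; only the names match the bindings above):

```python
# Stubs standing in for the compiled quant_cuda kernels.
def matvmul2(*args): return "2-bit kernel"
def matvmul3(*args): return "3-bit kernel"
def matvmul4(*args): return "4-bit kernel"
def matvmul8(*args): return "8-bit kernel"

# Map each supported bit width to its kernel once, at module level.
_KERNELS = {2: matvmul2, 3: matvmul3, 4: matvmul4, 8: matvmul8}

def quant_matvec(bits, *args):
    # Route to the kernel matching the configured bit width.
    try:
        kernel = _KERNELS[bits]
    except KeyError:
        raise ValueError(f"unsupported bit width: {bits}") from None
    return kernel(*args)

print(quant_matvec(4))  # -> 4-bit kernel
```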
I tried this to fix the missing module error: go into the anaconda3/lib/python3.11/site-packages/gptq folder and run `python setup_cuda.py install`. I am still trying to move forward, so I'm not sure this is the correct way.
You should do the same with "...site-packages/gptq/__init__.py":

```python
# from quant_cuda import matvmul2, matvmul3, matvmul4, matvmul8
from torch.utils.cpp_extension import load

quant_cuda = load(name='quant_cuda', sources=['quant_cuda.cpp'])
```

Worked for me.
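One caveat with patching both files: if __init__.py and quant.py each trigger the JIT compile, you pay for it twice per process. torch.utils.cpp_extension.load does cache compiled builds on disk, but a small memoized loader also keeps it to one import per process. A sketch (the returned object here is a hypothetical stand-in; the real loader would call load(...)):

```python
import functools

@functools.lru_cache(maxsize=None)
def get_quant_cuda():
    # In the real fix this would compile and return the extension, e.g.:
    #   from torch.utils.cpp_extension import load
    #   return load(name="quant_cuda", sources=["quant_cuda.cpp"])
    # A plain object stands in here so the caching behavior is visible.
    return object()

a = get_quant_cuda()
b = get_quant_cuda()
print(a is b)  # True: built/imported once, reused afterwards
```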
I can't find anything about it online whatsoever, no idea what's going on: