juncongmoo / pyllama

LLaMA: Open and Efficient Foundation Language Models
GNU General Public License v3.0
2.8k stars 312 forks

ModuleNotFoundError: No module named 'quant_cuda' #37

Open AceBeaker2 opened 1 year ago

AceBeaker2 commented 1 year ago
Traceback (most recent call last):
  File "/home/orion/AI-Horde-Worker/llama.cpp/pyllama/llama/llama_quant.py", line 6, in <module>
    from gptq import (
  File "/home/orion/.local/lib/python3.10/site-packages/gptq/__init__.py", line 9, in <module>
    from .gptq import GPTQ
  File "/home/orion/.local/lib/python3.10/site-packages/gptq/gptq.py", line 5, in <module>
    from .quant import quantize
  File "/home/orion/.local/lib/python3.10/site-packages/gptq/quant.py", line 4, in <module>
    from quant_cuda import matvmul2, matvmul3, matvmul4, matvmul8, matvmul16
ModuleNotFoundError: No module named 'quant_cuda'

I can't find anything about it online; no idea what's going on:

$ nvidia-smi
Sat Mar 18 19:49:11 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 Ti      On | 00000000:09:00.0 Off |                  N/A |
|  0%   45C    P8               23W / 200W|     64MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1245      G   /usr/lib/xorg/Xorg                           56MiB |
|    0   N/A  N/A      1437      G   /usr/bin/gnome-shell                          6MiB |
+---------------------------------------------------------------------------------------+
AceBeaker2 commented 1 year ago

It might have something to do with https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/main/quant_cuda.cpp

shadowwalker2718 commented 1 year ago

Try pip install gptq -U or python3.10 -m pip install gptq -U?

shadowwalker2718 commented 1 year ago

I found the doc here: https://pypi.org/project/gptq/

AceBeaker2 commented 1 year ago

I'm running in an Ubuntu 22 environment through SSH, and gptq is installed perfectly fine.

dewes commented 1 year ago

Upgrading from gptq 0.0.2 to gptq 0.0.3 resolved this problem for me.

Using Python 3.10.

tarpeyd12 commented 1 year ago

From my inept sleuthing, it looks to me like this is caused by quant_cuda.cpp in gptq not being compiled/bound into a Python-importable file when installing gptq. I personally can't get past this error either, on Windows.

AceBeaker2 commented 1 year ago

From my inept sleuthing, it looks to me like this is caused by quant_cuda.cpp in gptq not being compiled/bound into a Python-importable file when installing gptq. I personally can't get past this error either, on Windows.

Exactly! Same problem I'm having, but on Linux.

tarpeyd12 commented 1 year ago

I'm wondering if using https://github.com/qwopqwop200/GPTQ-for-LLaMa might work instead of https://github.com/IST-DASLab/gptq?

Though looking at https://github.com/IST-DASLab/gptq/blob/main/setup_cuda.py, that file seems to be missing from the gptq installation I got from pip, and it appears to be what builds the quant_cuda part of the module. So I think installing from pip might be the problem?
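
For reference, that setup script is a standard torch C++/CUDA extension build. A minimal sketch of what it looks like (reconstructed, so check the repo for the exact source list and flags; I'm assuming the kernel file is named quant_cuda_kernel.cu):

from setuptools import setup
from torch.utils import cpp_extension

setup(
    name='quant_cuda',
    ext_modules=[cpp_extension.CUDAExtension(
        # compile the pybind11 bindings plus the CUDA kernels into one importable module
        'quant_cuda', ['quant_cuda.cpp', 'quant_cuda_kernel.cu']
    )],
    cmdclass={'build_ext': cpp_extension.BuildExtension},
)

Running python setup_cuda.py install against that would build and install the quant_cuda module that quant.py imports, which is exactly the step the pip package seems to skip.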

LaTournesol commented 1 year ago

Hi @tarpeyd12, have you found a way to resolve this issue? I have the same issue on Windows. I'm getting the error "Microsoft Visual C++ 14.0 or greater is required.", but I have all the C++ tooling installed and added to PATH. The weird thing is that the code worked for me 2 days ago, but I accidentally messed up my conda env and reinstalled it, after which I can never get it to work again. It's really frustrating.

mhyrzt commented 1 year ago

I think this might work; note that I am using Google Colab to download the weights. Here is what I did:

Make sure your runtime/machine has access to a CUDA GPU. Then, put these commands into a cell and run them in order to install pyllama and gptq:

!pip install pyllama
!pip install gptq

After that, simply run the following command:

!python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 4 --groupsize 128 --save pyllama-7B4b.pt

peace out ;)

mathppp commented 1 year ago

Hi all, has anyone been able to solve this issue?

I have tried installing pyllama and gptq, but it doesn't work. My Python 3 version is 3.10.

Chirag-Mphasis commented 1 year ago

Hey everyone, I've also thoroughly followed the gptq installation instructions from https://pypi.org/project/gptq/

I'm still facing the issue. Has anyone found a solution yet?

C25Ronaldo commented 1 year ago

Hello everyone, I've solved this problem recently.

The error occurs because "...site-packages/gptq/quant.py" tries to import quant_cuda as a Python module, but in the gptq folder quant_cuda exists only as a .cpp source file. quant_cuda.cpp exposes its functions to Python using the PYBIND11 library:

// quant_cuda.cpp
.....
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("matvmul2", &vecquant2matmul, "2-bit Quantized Matrix Vector Multiplication (CUDA)");
  m.def("matvmul3", &vecquant3matmul, "3-bit Quantized Matrix Vector Multiplication (CUDA)");
  m.def("matvmul4", &vecquant4matmul, "4-bit Quantized Matrix Vector Multiplication (CUDA)");
  m.def("matvmul8", &vecquant8matmul, "8-bit Quantized Matrix Vector Multiplication (CUDA)");
  m.def("matvmul16", &vecquant8matmul, "16-bit Quantized Matrix Vector Multiplication (CUDA)");
}

and

# quant.py
......
from quant_cuda import matvmul2, matvmul3, matvmul4, matvmul8
......

But because the extension was never compiled, we can't import or use those functions in quant.py.

So I modified the quant.py file as follows:

# quant.py
import numpy as np
import torch
# from quant_cuda import matvmul2, matvmul3, matvmul4, matvmul8
from torch.utils.cpp_extension import CppExtension

quant_cuda = CppExtension(name='quant_cuda', sources=['quant_cuda.cpp'])

.....
if self.bits == 2:
    # matvmul2(x, self.qweight, y, self.scales, self.zeros)
    quant_cuda.matvmul2(x, self.qweight, y, self.scales, self.zeros)
elif self.bits == 3:
    # matvmul3(x, self.qweight, y, self.scales, self.zeros)
    quant_cuda.matvmul3(x, self.qweight, y, self.scales, self.zeros)
.....

In this file, I commented out the existing calls to the matvmulX functions and routed them through quant_cuda instead.

And then the error disappeared!
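
One thing to double-check with the above: as far as I know, CppExtension only builds a setuptools description of the extension and doesn't compile anything by itself. If the modification doesn't take effect for you, torch also has a JIT helper, torch.utils.cpp_extension.load, which compiles the sources and imports the resulting module at runtime. A minimal sketch, assuming quant_cuda.cpp and quant_cuda_kernel.cu are in the working directory and nvcc is on PATH:

from torch.utils.cpp_extension import load

# Compiles the sources on the first call, caches the build, and returns the imported module
quant_cuda = load(name='quant_cuda',
                  sources=['quant_cuda.cpp', 'quant_cuda_kernel.cu'])
# After this, quant_cuda.matvmul2(...) etc. should be callable as before.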

CoderChang65535 commented 1 year ago

I tried this to fix the missing module error:

  1. Go to the anaconda3/lib/python3.11/site-packages/gptq folder
  2. Create the same setup_cuda.py file there (the one from the gptq repository linked above)
  3. Run python setup_cuda.py install

I am still trying to get further now, so I'm not sure this is the correct way.
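
If the build succeeds, a quick sanity check that the module is now importable (run from the same Python environment) is:

python -c "import quant_cuda"

If that exits without a traceback, the original import in quant.py should work again.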

Ndron commented 1 year ago

In addition to C25Ronaldo's fix above, you should do the same with "...site-packages/gptq/__init__.py":

#from quant_cuda import matvmul2, matvmul3, matvmul4, matvmul8
from torch.utils.cpp_extension import CppExtension
quant_cuda = CppExtension(name='quant_cuda', sources=['quant_cuda.cpp'])

Worked for me.