Open ENjoyBlue2021 opened 1 year ago
This sounds like __CUDA_ARCH__ is either undefined or defined incorrectly. Could you try changing the first line to just:
#if __CUDA_ARCH__ < 700
That should fail to compile if the symbol is missing. If it compiles it means you've got it incorrectly defined, somehow. Which I guess would suggest a (very strange) driver issue...?
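One caveat on the #if test: in standard C and C++ preprocessing, an identifier that is not a defined macro evaluates to 0 inside #if, so `#if __CUDA_ARCH__ < 700` compiles without complaint even when the macro is missing. nvcc defines `__CUDA_ARCH__` only while compiling device code, so the macro is absent in the host pass. The sketch below is plain host C++ and is not taken from exllama. The names `kBelow700Branch` and `kArchMacroDefined` are hypothetical and exist only to make the two outcomes visible.

```cpp
// Host-only sketch: __CUDA_ARCH__ is not defined when a plain C++ compiler
// (or the host pass of nvcc) processes this file.

// Inside #if, an identifier that is not a defined macro is replaced by 0,
// so this branch is taken silently instead of producing an error.
#if __CUDA_ARCH__ < 700
constexpr bool kBelow700Branch = true;
#else
constexpr bool kBelow700Branch = false;
#endif

// An explicit defined() test tells "macro missing" apart from "value < 700".
#if defined(__CUDA_ARCH__)
constexpr bool kArchMacroDefined = true;
#else
constexpr bool kArchMacroDefined = false;
#endif

// Small accessors so the results can be inspected from ordinary code.
constexpr bool below_700_branch_taken() { return kBelow700Branch; }
constexpr bool arch_macro_defined() { return kArchMacroDefined; }
```

With a host compiler the first branch is taken even though the macro does not exist, which is why a guard such as `#ifndef __CUDA_ARCH__` followed by `#error` is a more reliable way to detect a missing definition.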
Hmm, you are correct.
Putting in an
if (__CUDA_ARCH__ < 700) { }
gives me an error:
/media/w/PhoenixSSD/oobabooga/miniconda/envs/textgen/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -lineinfo -std=c++17 -c /media/w/PhoenixSSD/oobabooga/text-generation-webui/repositories/exllama/exllama_ext/cuda_func/q4_matmul.cu -o q4_matmul.cuda.o
/media/w/PhoenixSSD/oobabooga/text-generation-webui/repositories/exllama/exllama_ext/cuda_func/q4_matmul.cu(247): error: identifier "__CUDA_ARCH__" is undefined
I'm on Ubuntu and really don't want to mess too much with the NVIDIA drivers. It's very much possible that it's something on my end.
Here is my nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   59C    P0    66W / 210W |   1399MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:02:00.0 Off |                  N/A |
|  0%   41C    P8     9W / 200W |      9MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
Your driver is probably fine. It's the venv. I too have CUDA 12 on the system and CUDA 11.8 in the venv. I had to download all the CUDA toolkit stuff again for that. Conda was actually useful.
I think you are right. I spent a couple of hours trying to uninstall the old version. I reinstalled CUDA toolkit 11.8 in the venv, but that didn't fix anything.
I can't seem to properly remove the toolkit drivers from my system before installing another version. I suppose I need to install 11.8, like in the venv, but all my attempts to clean up and purge the current version failed. So I'm giving up; this is the only area that's causing problems for me anyway.
This is why I like conda. A fresh environment with new cu118 torch and reqs usually fixes things. Although I've yet to mess up a single conda env or venv, knock on wood.
I couldn't reopen my original issue, so I hope it's fine if I open another bug. The Pascal fix is broken again, at least for me. The following check does not work:
q4_matmul.cu:
Taking out the if and just setting SgemmEx works. This is on a dual-GPU setup: 1080 Ti + 1080.
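For context, one way to avoid depending on `__CUDA_ARCH__` in host code is to pick the path at runtime from the device's compute capability. The following is only a minimal sketch of that idea, not the fix used in exllama. The helpers `arch_code` and `use_sgemm_fallback` are hypothetical, and the sketch assumes the major/minor values would come from `cudaGetDeviceProperties()` in a real build.

```cpp
// Hypothetical helpers, not taken from exllama: decide on the host whether to
// take the SgemmEx path, based on the device's compute capability.
// In a real build, major/minor would be read from cudaGetDeviceProperties()
// (fields cudaDeviceProp::major and cudaDeviceProp::minor).

// Uses the same encoding as __CUDA_ARCH__: compute capability 6.1 -> 610.
constexpr int arch_code(int major, int minor) {
    return major * 100 + minor * 10;
}

// True for devices below compute capability 7.0, such as Pascal cards.
constexpr bool use_sgemm_fallback(int major, int minor) {
    return arch_code(major, minor) < 700;
}

// Compile-time checks of the two boundary cases.
static_assert(use_sgemm_fallback(6, 1), "compute 6.1 (Pascal) takes the fallback");
static_assert(!use_sgemm_fallback(7, 0), "compute 7.0 (Volta) does not");
```

Both the GTX 1080 Ti and the GTX 1080 report compute capability 6.1, which matches the `compute_61` flags in the build log, so a check of this kind would select the SgemmEx path on either card.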