turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

Slowdown again with pascal cards. #156

Open ENjoyBlue2021 opened 1 year ago

ENjoyBlue2021 commented 1 year ago

I couldn't reopen my original issue, so I hope it's fine if I open another one. The Pascal fix is broken again, at least for me. The following check does not work:

q4_matmul.cu:

#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 700

    const float alpha = 1.0f;
    const float beta = no_zero ? 1.0f : 0.0f;
    cublasSgemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, width, height, dim, &alpha, buffers->temp_dq, CUDA_R_16F, width,
                  x_mapped, CUDA_R_16F, dim, &beta, out, CUDA_R_16F, width);

#else

    const half alpha = __float2half(1.0f);
    const half beta = no_zero ? __float2half(1.0f) : __float2half(0.0f);
    cublasHgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, width, height, dim, &alpha, buffers->temp_dq, width, x_mapped, dim, &beta, out, width);

#endif

Taking out the #if and always using cublasSgemmEx works. This is on a dual-GPU setup: 1080 Ti + 1080.
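A possible explanation, stated here as an assumption rather than a confirmed diagnosis: nvcc defines `__CUDA_ARCH__` only while it compiles device code. If this block sits in host code, as the cuBLAS host API calls suggest, the `defined(__CUDA_ARCH__)` test is false there and the `#else` branch is always the one compiled. One way around that is to decide at run time from the device's compute capability. The sketch below is illustrative only and is not taken from exllama; `arch_code` and `use_sgemm_ex` are hypothetical helpers, and their `major`/`minor` arguments are assumed to come from the `cudaDeviceProp` structure filled in by `cudaGetDeviceProperties`.

```cpp
// Hypothetical helpers (not part of exllama): pick the GEMM path on the host,
// at run time, instead of relying on __CUDA_ARCH__ in host code.

// Encode a compute capability the same way __CUDA_ARCH__ does,
// e.g. 6.1 -> 610, 7.0 -> 700.
constexpr int arch_code(int major, int minor) {
    return major * 100 + minor * 10;
}

// True when the FP32-compute path (cublasSgemmEx in the snippet above)
// should be used, i.e. for devices older than compute capability 7.0.
// `major` and `minor` are assumed to be cudaDeviceProp::major / ::minor.
constexpr bool use_sgemm_ex(int major, int minor) {
    return arch_code(major, minor) < 700;
}
```

For a GTX 1080 Ti or GTX 1080 (compute capability 6.1, matching the `compute_61` flags in the build), `use_sgemm_ex(6, 1)` is true, so a caller would take the `cublasSgemmEx` branch; for a 7.0 or newer device it is false.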

turboderp commented 1 year ago

This sounds like __CUDA_ARCH__ is either undefined or defined incorrectly. Could you try changing the first line to just:

#if __CUDA_ARCH__ < 700

That should fail to compile if the symbol is missing. If it compiles it means you've got it incorrectly defined, somehow. Which I guess would suggest a (very strange) driver issue...?

ENjoyBlue2021 commented 1 year ago

Hmm, you are correct.

Putting in a

if (__CUDA_ARCH__ < 700)
{

}

gives me an error:

/media/w/PhoenixSSD/oobabooga/miniconda/envs/textgen/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -lineinfo -std=c++17 -c /media/w/PhoenixSSD/oobabooga/text-generation-webui/repositories/exllama/exllama_ext/cuda_func/q4_matmul.cu -o q4_matmul.cuda.o
/media/w/PhoenixSSD/oobabooga/text-generation-webui/repositories/exllama/exllama_ext/cuda_func/q4_matmul.cu(247): error: identifier "__CUDA_ARCH__" is undefined
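Note that the error above comes from a plain `if` statement, which the compiler proper sees. A preprocessor `#if` behaves differently: in standard C and C++, an identifier that is still not a macro after expansion is replaced by 0 inside the `#if` expression, so a bare comparison against an undefined name compiles without error and evaluates as `0 < 700`. The minimal sketch below illustrates this with `DEMO_UNDEFINED_ARCH`, a hypothetical name chosen for the example and deliberately never defined.

```cpp
// DEMO_UNDEFINED_ARCH is a hypothetical macro name that is never defined.

#if DEMO_UNDEFINED_ARCH < 700
// Taken: the undefined identifier is replaced by 0, and 0 < 700 holds.
constexpr bool bare_check_taken = true;
#else
constexpr bool bare_check_taken = false;
#endif

#if defined(DEMO_UNDEFINED_ARCH) && DEMO_UNDEFINED_ARCH < 700
constexpr bool guarded_check_taken = true;
#else
// Taken: defined(...) is false, so the whole condition is false.
constexpr bool guarded_check_taken = false;
#endif

// By contrast, an ordinary statement such as
//     if (DEMO_UNDEFINED_ARCH < 700) { }
// reaches the compiler proper and is rejected as an undeclared identifier.

static_assert(bare_check_taken, "bare #if treats the undefined name as 0");
static_assert(!guarded_check_taken, "defined() guard is false for an undefined name");
```

So a successful compile of the bare `#if` form does not by itself show that the symbol is defined; some compilers can warn about this case (for example GCC's `-Wundef`).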

I'm on Ubuntu and really don't want to mess too much with the NVIDIA drivers. It's very much possible that it's something on my end.

Here is my nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   59C    P0    66W / 210W |   1399MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:02:00.0 Off |                  N/A |
|  0%   41C    P8     9W / 200W |      9MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Ph0rk0z commented 1 year ago

Your driver is probably fine. It's the venv. I too have CUDA 12 on the system and CUDA 11.8 in the venv. I had to download all the CUDA toolkit stuff again for that. Conda was actually useful.

ENjoyBlue2021 commented 1 year ago

I think you are right. I played around for a couple of hours trying to uninstall the old version. I reinstalled the CUDA toolkit 11.8 in the venv, but that didn't fix anything.

I can't seem to properly remove the toolkit drivers from my system before installing another version. I suppose I need to install 11.8 system-wide, like in the venv, but all my attempts to clean up and purge the current version failed. So I'm giving up; this is the only area that's causing problems for me anyway.

Ph0rk0z commented 1 year ago

This is why I like conda. A fresh environment with new cu118 torch and reqs usually fixes things. Although I've yet to mess up a single conda env or venv, knock on wood.