qwopqwop200 / GPTQ-for-LLaMa

4 bits quantization of LLaMA using GPTQ
Apache License 2.0

Add -O3 flag to nvcc #235

Closed: Noir-Lime closed this 1 year ago

Noir-Lime commented 1 year ago

I noticed that nvcc isn't being passed any optimization options when the CUDA extension is compiled. I've added -O3 as a start.
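For readers who want to see roughly what such a change looks like, here is a minimal sketch, not the actual diff from this pull request. It assumes the extension is built with PyTorch's `CUDAExtension`, and the extension name `quant_cuda` and the two source file names are assumptions about the repository layout. The helper `nvcc_compile_args` is hypothetical and exists only for illustration.

```python
# Sketch of a setup script that forwards an optimization flag to nvcc.
# Assumption: the extension name and source file names below are illustrative.

def nvcc_compile_args(opt_level="-O3", extra=()):
    """Build an extra_compile_args dict with separate host (cxx) and nvcc flag lists."""
    return {"cxx": [], "nvcc": [opt_level, *extra]}


def build():
    # Imported lazily so the helper above can be used without torch installed.
    from setuptools import setup
    from torch.utils import cpp_extension

    setup(
        name="quant_cuda",
        ext_modules=[
            cpp_extension.CUDAExtension(
                "quant_cuda",
                ["quant_cuda.cpp", "quant_cuda_kernel.cu"],  # assumed file names
                extra_compile_args=nvcc_compile_args(),  # passes -O3 to nvcc
            )
        ],
        cmdclass={"build_ext": cpp_extension.BuildExtension},
    )
```

To use it, call `build()` at the bottom of the setup script and run the script with the usual `install` command. Other flags can be tried by passing them through `extra`, for example `nvcc_compile_args("-O3", ["--use_fast_math"])`, bearing in mind that fast-math trades numerical accuracy for speed.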

I measured a gain from ~42 tokens/s to ~47 tokens/s running this model on an RTX 3090: https://huggingface.co/TheBloke/wizard-vicuna-13B-GPTQ

I invite others to test different nvcc optimization options to see if better performance can be achieved.
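Anyone comparing flags may want a repeatable way to measure throughput. The following is a rough sketch, not the script used for the numbers above. `generate` is a hypothetical callable that produces `n_tokens` tokens and returns only once the work is finished (for GPU code that would mean synchronizing the device before returning).

```python
import time


def tokens_per_second(generate, n_tokens, warmup=1, runs=3):
    """Return the best observed throughput of generate(n_tokens) in tokens/s."""
    for _ in range(warmup):
        generate(n_tokens)  # warm-up runs are executed but not timed
    best = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        generate(n_tokens)
        # Guard against a zero reading from a coarse timer.
        elapsed = max(time.perf_counter() - start, 1e-9)
        best = max(best, n_tokens / elapsed)
    return best


def relative_gain(before, after):
    """Fractional speedup when going from `before` to `after` tokens/s."""
    return (after - before) / before
```

With the figures reported above, `relative_gain(42, 47)` comes to about 0.12, roughly a 12% improvement. Several runs per build are worth taking before trusting a difference of that size, since it can be within run-to-run noise.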

Dessix commented 1 year ago

Supposedly this is a no-op, as -O3 is the default.