qwopqwop200 / GPTQ-for-LLaMa

4-bit quantization of LLaMA using GPTQ
Apache License 2.0

The inference speed of the GPTQ 4-bit quantized model #252

Open · pineking opened this issue 1 year ago

pineking commented 1 year ago

Has anyone compared the inference speed of the 4-bit quantized model against the original FP16 model? Is it faster than FP16?
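
A minimal timing sketch for running this comparison yourself, assuming a CUDA GPU, the `transformers` library, and an FP16 LLaMA checkpoint. The model ID and the commented-out `load_quant` call are illustrative placeholders, not verified APIs of this repo:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "decapoda-research/llama-7b-hf"  # placeholder: any FP16 LLaMA checkpoint

def time_generation(model, tokenizer, prompt, new_tokens=128):
    """Return greedy-decoding throughput in tokens per second."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    torch.cuda.synchronize()
    start = time.time()
    with torch.no_grad():
        model.generate(**inputs,
                       max_new_tokens=new_tokens,
                       min_new_tokens=new_tokens,  # fix the token count so runs are comparable
                       do_sample=False)
    torch.cuda.synchronize()
    return new_tokens / (time.time() - start)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP16 baseline.
fp16 = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16).cuda()
print(f"FP16: {time_generation(fp16, tokenizer, 'Hello, my name is'):.1f} tok/s")
del fp16
torch.cuda.empty_cache()  # free VRAM before loading the second model

# 4-bit model: load with whatever entry point your checkpoint was built
# for (e.g. this repo's llama_inference.py). `load_quant` below is a
# placeholder, not a verified function signature.
# int4 = load_quant(MODEL_ID, "llama7b-4bit.pt", wbits=4)
# print(f"int4: {time_generation(int4, tokenizer, 'Hello, my name is'):.1f} tok/s")
```

Freeing the FP16 model before loading the 4-bit one keeps the two runs from contending for GPU memory, and pinning the generated length with `min_new_tokens` makes the two timings directly comparable.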

ftgreat commented 1 year ago

I tested it, but int4 takes about 2x as long as FP16. Is anything wrong?

CSEEduanyu commented 10 months ago

> I tested it, but int4 takes about 2x as long as FP16. Is anything wrong?

Same here. Do you know why?
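
One general way to investigate a slowdown like this is to profile a single forward pass and check whether the quantized-matmul/dequantization kernels dominate the runtime. A minimal sketch using `torch.profiler`, assuming a loaded `model` and tokenized `inputs` on a CUDA GPU as in the timing sketch above; nothing here is specific to this repo:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Profile one forward pass and record both CPU and CUDA activity.
with torch.no_grad():
    with profile(activities=[ProfilerActivity.CPU,
                             ProfilerActivity.CUDA]) as prof:
        model(**inputs)

# If the 4-bit model is the slow one, the quantized-matmul /
# dequantization kernels usually appear near the top of this table.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```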