juncongmoo / pyllama

LLaMA: Open and Efficient Foundation Language Models

Does this include the GPTQ quantization tricks? #94

Open · vedantroy opened 1 year ago

vedantroy commented 1 year ago

The GPTQ README has the following:

> …which demonstrates two new tricks: `--act-order` (quantizing columns in order of decreasing activation size) and `--true-sequential` (performing sequential quantization even within a single Transformer block). Those fix GPTQ's strangely bad performance on the 7B model (from 7.15 to 6.09 Wiki2 PPL) and lead to slight improvements on most models/settings in general.
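For context, here is my rough understanding of what the two flags do, as a minimal PyTorch sketch. This is not code from GPTQ-for-LLaMa or from this repo; `rtn_column` is a toy round-to-nearest stand-in for GPTQ's actual error-compensating column update, and the "block" below is a plain linear chain rather than a real Transformer block:

```python
import torch
import torch.nn as nn

def act_order_perm(H: torch.Tensor) -> torch.Tensor:
    # --act-order: visit columns in order of decreasing activation size,
    # read off the diagonal of the layer Hessian H ~ 2 * X^T X.
    return torch.argsort(torch.diag(H), descending=True)

def rtn_column(col: torch.Tensor) -> torch.Tensor:
    # Toy 4-bit round-to-nearest stand-in for GPTQ's per-column update
    # (the real update also propagates quantization error via H).
    scale = col.abs().max() / 7 + 1e-8
    return torch.clamp((col / scale).round(), -8, 7) * scale

def quantize_weight_act_order(W, H, quantize_column=rtn_column):
    perm = act_order_perm(H)           # real GPTQ also permutes H itself
    W = W[:, perm].clone()             # reorder columns before quantizing
    for j in range(W.shape[1]):        # GPTQ's column-by-column loop
        W[:, j] = quantize_column(W[:, j])
    return W[:, torch.argsort(perm)]   # restore the original column order

def quantize_block_true_sequential(sublayers, x,
                                   quantize_weight=quantize_weight_act_order):
    # --true-sequential: handle a block's sublayers in execution order, so
    # each Hessian is built from the outputs of already-quantized layers
    # instead of from full-precision activations.
    for layer in sublayers:
        X = x.reshape(-1, layer.in_features)
        H = 2.0 * (X.T @ X)            # per-layer Hessian from current inputs
        layer.weight.data = quantize_weight(layer.weight.data, H)
        x = layer(x)                   # propagate through the quantized layer
    return x

# Example: quantize two chained projections with calibration data.
block = [nn.Linear(16, 64), nn.Linear(64, 16)]
calib = torch.randn(32, 16)
quantize_block_true_sequential(block, calib)
```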

Does this repository use these tricks?