Problem Description
Hi,
PR #654 tried to enable GPTQ quantization with fine-tuned LLaMA2 models, but was closed.

I tried following a similar approach as that PR, but for fine-tuned LLaMA3 models. It did not succeed because LLaMA3 uses the TikToken tokenizer, whose `encode()` takes two more arguments than this call to `encode()` in `torchao`. This wasn't an issue with LLaMA2, which used the sentencepiece tokenizer.

Thanks in advance for your help!
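For context, the mismatch could in principle be papered over with a small adapter that gives the TikToken-style tokenizer a sentencepiece-style `encode(text)`. The sketch below is purely illustrative: the stand-in tokenizer classes and the `bos`/`eos` keyword arguments are my assumptions about the two extra arguments, not torchao's or Meta's actual API.

```python
class SentencePieceStyleTokenizer:
    """Stand-in for a sentencepiece-style tokenizer: encode(text) -> token ids."""
    def encode(self, text):
        return [ord(c) for c in text]  # dummy ids, for illustration only


class TikTokenStyleTokenizer:
    """Stand-in for a LLaMA3/TikToken-style tokenizer whose encode()
    requires extra keyword arguments (assumed here to be bos/eos)."""
    def encode(self, text, *, bos, eos):
        ids = [ord(c) for c in text]  # dummy ids, for illustration only
        if bos:
            ids = [1] + ids  # hypothetical BOS token id
        if eos:
            ids = ids + [2]  # hypothetical EOS token id
        return ids


class EncodeAdapter:
    """Adapter exposing a sentencepiece-style encode(text) on top of a
    TikToken-style tokenizer, fixing the extra arguments up front."""
    def __init__(self, tokenizer, bos=True, eos=False):
        self.tokenizer = tokenizer
        self.bos = bos
        self.eos = eos

    def encode(self, text):
        return self.tokenizer.encode(text, bos=self.bos, eos=self.eos)


tok = EncodeAdapter(TikTokenStyleTokenizer())
print(tok.encode("hi"))  # [1, 104, 105]
```

Whether torchao should accept such a wrapper, or instead branch on the tokenizer type at the call site, is exactly the design question this issue is asking about.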