pytorch / torchtune

A Native-PyTorch Library for LLM Fine-tuning

GPTQ quantization not working with fine-tuned LLaMA3 models #1033

Open sanchitintel opened 1 month ago

sanchitintel commented 1 month ago

Problem Description

Hi,

PR #654 tried to enable GPTQ quantization with fine-tuned LLaMA2 models, but it was closed.

I tried following an approach similar to that PR's, but for fine-tuned LLaMA3 models. It did not succeed because LLaMA3 uses the TikToken tokenizer, whose encode() requires two more arguments than this call to encode() in torchao passes. This wasn't an issue with LLaMA2, which used the SentencePiece tokenizer.
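For reference, here is a minimal sketch of a workaround on the torchtune side, assuming torchao's GPTQ input recording calls `tokenizer.encode(text)` with only the text, while the TikToken tokenizer additionally requires `add_bos` and `add_eos`. The adapter name and the choice of `add_bos=True, add_eos=False` are my own, not anything from either library:

```python
from typing import List


class Llama3TokenizerAdapter:
    """Wraps a torchtune TikToken tokenizer so that encode() matches the
    single-argument call assumed to be made by torchao's GPTQ flow."""

    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    def encode(self, text: str) -> List[int]:
        # Fill in the two extra arguments that TikToken's encode() requires.
        # add_eos=False mirrors typical calibration-prompt tokenization;
        # adjust for your setup.
        return self._tokenizer.encode(text, add_bos=True, add_eos=False)

    def __getattr__(self, name):
        # Delegate everything else (bos_id, eos_id, decode, ...) to the
        # wrapped tokenizer unchanged.
        return getattr(self._tokenizer, name)
```

A Llama3 tokenizer built in torchtune could then be wrapped as `Llama3TokenizerAdapter(tokenizer)` before being handed to the quantization flow, so torchao never sees the incompatible signature.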

Thanks in advance for your help!

sanchitintel commented 1 month ago

@HDCharles will help fix this in torchao by making GPTQ compatible with TikToken. Thanks!
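As a sketch of what such a fix might look like on the torchao side, GPTQ's calibration-input recording could be decoupled from any particular tokenizer by accepting an encode callable instead of calling `tokenizer.encode()` directly. This is purely an illustrative assumption about a possible design, not torchao's actual API:

```python
from typing import Callable, List

import torch


def record_calibration_inputs(
    prompts: List[str],
    encode_fn: Callable[[str], List[int]],
    device: str = "cpu",
) -> List[torch.Tensor]:
    """Tokenize calibration prompts without assuming any encode() signature;
    the caller adapts whatever tokenizer is in use to a plain str -> ids fn."""
    return [
        torch.tensor(encode_fn(p), dtype=torch.int64, device=device)
        for p in prompts
    ]


# Either tokenizer can then be adapted at the call site, e.g.:
#   SentencePiece: encode_fn = lambda s: sp_tokenizer.encode(s)
#   TikToken:      encode_fn = lambda s: tt_tokenizer.encode(
#                      s, add_bos=True, add_eos=False)
```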