pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
BSD 3-Clause "New" or "Revised" License

int4 gptq shape fix #142

Closed HDCharles closed 3 months ago

HDCharles commented 3 months ago

Stack from ghstack (oldest at bottom):

Summary: Redoes https://github.com/pytorch-labs/gpt-fast/commit/5bf70c114088a5133299609694a8c17b37de69c4 in a way that doesn't get reverted. Note: a device issue also needed to be fixed.

Test Plan:

export MODEL_REPO=meta-llama/Llama-2-7b-chat-hf
python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int4-gptq --calibration_tasks wikitext --calibration_limit 5
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_int4-gptq.g32.cuda.pth --tasks wikitext --limit 5
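
For anyone adapting the test plan to another model, mode, or group size, here is a minimal sketch of how the eval checkpoint path relates to the quantize input path. The naming pattern `model_<mode>.g<groupsize>.<device>.pth` is inferred only from the two paths in the test plan above, not from quantize.py itself, and the helper name `quantized_checkpoint_path` and its defaults are hypothetical.

```python
from pathlib import PurePosixPath


def quantized_checkpoint_path(checkpoint_path, mode="int4-gptq", groupsize=32, device="cuda"):
    """Guess the quantized checkpoint path for a given input checkpoint.

    Assumption: the output sits next to the input and is named
    <stem>_<mode>.g<groupsize>.<device><suffix>, as the test plan's paths suggest.
    """
    src = PurePosixPath(checkpoint_path)
    return src.with_name(f"{src.stem}_{mode}.g{groupsize}.{device}{src.suffix}")


if __name__ == "__main__":
    model_repo = "meta-llama/Llama-2-7b-chat-hf"
    # Same input path the quantize.py command in the test plan uses.
    print(quantized_checkpoint_path(f"checkpoints/{model_repo}/model.pth"))
```

If the file that quantize.py actually writes differs from this guess, trust the path it writes and pass that to eval.py.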

Reviewers:

Subscribers:

Tasks:

Tags: