int4 gptq shape fix

Stack from ghstack (oldest at bottom):

* #147
* -> #142

Summary: Redoes https://github.com/pytorch-labs/gpt-fast/commit/5bf70c114088a5133299609694a8c17b37de69c4 in a way that won't get reverted. Note: this also required fixing a device issue.

Test Plan:

export MODEL_REPO=meta-llama/Llama-2-7b-chat-hf
python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int4-gptq --calibration_tasks wikitext --calibration_limit 5
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_int4-gptq.g32.cuda.pth --tasks wikitext --limit 5


pytorch-labs / gpt-fast

int4 gptq shape fix #142