Closed HDCharles closed 3 months ago
Stack from ghstack (oldest at bottom):
Summary: Redoes https://github.com/pytorch-labs/gpt-fast/commit/5bf70c114088a5133299609694a8c17b37de69c4 in a way that doesn't get reverted. Note: this also required fixing a device issue.
Test Plan:
export MODEL_REPO=meta-llama/Llama-2-7b-chat-hf
python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int4-gptq --calibration_tasks wikitext --calibration_limit 5
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_int4-gptq.g32.cuda.pth --tasks wikitext --limit 5
Reviewers:
Subscribers:
Tasks:
Tags: