Closed jerryzh168 closed 2 months ago
Stack from ghstack (oldest at bottom):
Summary: as titled.
Adding this for accuracy evaluation. We also added this in the executorch repo; we'll dedup later.
Test Plan:
quantization:
python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode 8da4w-gptq --calibration_tasks wikitext --calibration_limit 5
This finished in 20+ minutes on my machine. If you change calibration_limit to 1, it finishes in 10+ minutes, but expect worse quality since we do less calibration (use this for debugging a new quantization experiment).
evaluation:
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_8da4w-gptq.g32.pth --tasks wikitext
This should be fast; the result I'm getting is:
wikitext: {'word_perplexity,none': 10.15655335078972, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.5726497149737177, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.6531973670369153, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
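As a sanity check on the numbers above: assuming lm-eval's standard wikitext metric definitions, byte_perplexity and bits_per_byte encode the same quantity (bits_per_byte = log2(byte_perplexity)), and word_perplexity is byte_perplexity raised to the average bytes-per-word of the test set. A minimal sketch verifying the reported values are internally consistent (values copied from the output above):

```python
import math

# Metrics copied verbatim from the eval output above
word_ppl = 10.15655335078972
byte_ppl = 1.5726497149737177
bits_per_byte = 0.6531973670369153

# bits_per_byte = log2(byte_perplexity), so 2**bits_per_byte
# should reproduce byte_perplexity up to float rounding
assert abs(2 ** bits_per_byte - byte_ppl) < 1e-6

# word_perplexity = byte_perplexity ** (bytes / words); the implied
# bytes-per-word ratio should land near ~5, typical for English text
bytes_per_word = math.log(word_ppl) / math.log(byte_ppl)
print(f"implied bytes per word: {bytes_per_word:.2f}")
```

This is a quick way to catch copy-paste or unit mistakes when comparing perplexity numbers across quantization runs.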
Reviewers:
Subscribers:
Tasks:
Tags:
we're going to add this to torchao instead