pytorch / ao

PyTorch native quantization and sparsity for training and inference
BSD 3-Clause "New" or "Revised" License

Create a quant_utils file to reduce code duplication in eval.py and generate.py #992

Open jerryzh168 opened 1 month ago

jerryzh168 commented 1 month ago

There is some duplicated code between https://github.com/pytorch/ao/blob/378e6a8d6854d77efba45fcb1a4091724e9cfaa9/torchao/_models/llama/generate.py#L215-L267 and https://github.com/pytorch/ao/blob/378e6a8d6854d77efba45fcb1a4091724e9cfaa9/torchao/_models/llama/eval.py#L72-L180

Note: the lists of methods in the two files are not exactly the same, so the shared function could probably take a list of enabled methods as an argument.
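A rough sketch of what such a shared helper could look like, just to make the idea concrete. Everything here is hypothetical: the names `register_quant_method`, `apply_quantization`, and `DummyModel` do not exist in the repo, the method strings are illustrative, and the handlers are stubs that only record which method was requested. Real handlers would call the actual quantization code that currently lives in the two scripts.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical registry: method name -> function that applies it to a model.
QuantFn = Callable[[object], object]
_QUANT_METHODS: Dict[str, QuantFn] = {}


def register_quant_method(name: str) -> Callable[[QuantFn], QuantFn]:
    """Register a handler under `name` (hypothetical helper, not a torchao API)."""
    def decorator(fn: QuantFn) -> QuantFn:
        _QUANT_METHODS[name] = fn
        return fn
    return decorator


class DummyModel:
    """Stand-in for a real model; it only records which methods were applied."""
    def __init__(self) -> None:
        self.applied = []


@register_quant_method("int8wo")
def _apply_int8wo(model):
    # Stub: a real handler would run the int8 weight-only path here.
    model.applied.append("int8wo")
    return model


@register_quant_method("int4wo")
def _apply_int4wo(model):
    # Stub: a real handler would run the int4 weight-only path here.
    model.applied.append("int4wo")
    return model


def apply_quantization(model, quantization: Optional[str],
                       enabled: Optional[Iterable[str]] = None):
    """Apply `quantization` to `model` if the caller has enabled that method.

    `enabled` lets each script pass its own subset of methods; None means
    every registered method is allowed.
    """
    if not quantization:
        return model  # nothing requested, leave the model untouched
    if quantization not in _QUANT_METHODS:
        raise ValueError(f"Unknown quantization method: {quantization!r}")
    allowed = set(_QUANT_METHODS) if enabled is None else set(enabled)
    if quantization not in allowed:
        raise ValueError(
            f"{quantization!r} is not enabled here; allowed: {sorted(allowed)}"
        )
    return _QUANT_METHODS[quantization](model)
```

Under this sketch, each script would keep its own list of enabled methods and call something like `apply_quantization(model, args.quantization, enabled=EVAL_METHODS)`, so the differences between the two lists stay visible at the call site while the dispatch logic lives in one place.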

jerryzh168 commented 1 month ago

This may not always be a net positive since we only have 2 call sites, but it could be useful if other libraries use it as well, like in https://github.com/sgl-project/sglang/blob/f202ed97121a42fbc960572fa953101f584f17d4/python/sglang/srt/layers/torchao_utils.py#L10