intel / auto-round

Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs":
https://arxiv.org/abs/2309.05516
Apache License 2.0

[Low priority] auto_round: Triton backend has a bug at inference #216

Closed. wenhuach21 closed this issue 2 months ago.

wenhuach21 commented 2 months ago

File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/triton/runtime/jit.py", line 167, in return lambda *args, *kwargs: self.run(grid=grid, warmup=False, args, kwargs) File "/home/wenhuach/auto-round/auto_round_extension/cuda/triton_utils/custom_autotune.py", line 131, in run key = tuple([2 int(math.log2(x) + 0.5) for x in key]) File "/home/wenhuach/auto-round/auto_round_extension/cuda/triton_utils/custom_autotune.py", line 131, in key = tuple([2 ** int(math.log2(x) + 0.5) for x in key]) ValueError: math domain error

wenhuach21 commented 2 months ago

Changed to v2; will reopen this if the issue still remains.